what is softmax activation function


The softmax activation function is a mathematical function that transforms the raw outputs (logits) of a neural network into a probability distribution over predicted output classes. It is often used as the last activation function of a network, and its interpretation as a probability distribution can be motivated by Luce's choice axiom. The softmax function is a generalization of the logistic function to multiple dimensions and is used in multinomial logistic regression. It is applied in multi-class classification problems, where an input may belong to one of more than two class labels. The key features of the softmax activation function are:

  • It converts a vector of K real numbers into a probability distribution of K possible outcomes.
  • It returns an output vector that is K entries long, with the entry at index i corresponding to the probability that a particular input belongs to class i.
  • It is a softened version of the argmax function, which returns the index of the largest value in a list.
  • It scales numbers/logits into probabilities.
  • It assigns decimal probabilities to each class in a multi-class problem, and those decimal probabilities must add up to 1.0.
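The properties above can be sketched in plain Python (the function name and example logits are illustrative, not from any particular library):

```python
import math

def softmax(logits):
    # Exponentiate each logit, then normalize so the outputs sum to 1.0.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three-class example: raw scores from a hypothetical network.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)       # largest logit gets the largest probability
print(sum(probs))  # probabilities sum to 1.0 (up to floating-point error)
```

Note that the output preserves the ordering of the logits (the largest logit maps to the largest probability), which is why softmax can be read as a "soft" argmax.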

The softmax activation function is defined mathematically as follows:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

where $z$ is the vector of raw outputs from the neural network, $e$ is Euler's number, and $\sigma(z)_j$ is the predicted probability that the input belongs to class $j$.
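A direct translation of this formula can overflow when logits are large, since $e^{z_j}$ grows quickly. A common trick, shown in the sketch below (names are illustrative), is to subtract the maximum logit first; this scales numerator and denominator by the same factor, so the result is unchanged:

```python
import math

def softmax_stable(z):
    # Subtracting max(z) before exponentiating leaves the result unchanged
    # (numerator and denominator are both scaled by e^{-max(z)}) but
    # prevents overflow for large logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_stable([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
```

A naive `math.exp(1000.0)` would raise an OverflowError, so most practical implementations use this shifted form.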