What is a softmax?
A function that determines probabilities for each possible class in a multi-class classification model.
Softmax explained in plain English
In a multi-class classification model, softmax assigns a probability to each possible class, and those probabilities add up to exactly 1.0. For example, the following table shows how softmax distributes various probabilities:
Softmax is also called full softmax. Contrast with candidate sampling.
The softmax equation is as follows:

$$\sigma_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

where:
- $\sigma$ is the output vector, and $\sigma_i$ is its $i$-th element. Each element of the output vector specifies the probability of the corresponding class. The sum of all the elements in the output vector is 1.0. The output vector contains the same number of elements as the input vector, $z$.
- $z$ is the input vector. Each element of the input vector contains a floating-point value.
- $K$ is the number of elements in the input vector (and the output vector).

For example, suppose the input vector is:
Example
Therefore, softmax calculates the denominator as follows:
The softmax probability of each element is therefore:
So, the output vector is:
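The steps described above (exponentiate each input, sum the exponentials to get the denominator, then divide) can be sketched in Python. This is a minimal illustration, not a production implementation: the input vector `[1.0, 2.0, 3.0]` and the function name `softmax` are assumptions chosen only for this sketch.

```python
import math


def softmax(z):
    """Return the softmax of a list of floats (illustrative sketch)."""
    # Exponentiate each element of the input vector.
    exps = [math.exp(value) for value in z]
    # The denominator is the sum of all the exponentials.
    denominator = sum(exps)
    # Each output element is its exponential divided by the denominator.
    return [e / denominator for e in exps]


z = [1.0, 2.0, 3.0]  # hypothetical input vector, for illustration only
sigma = softmax(z)

print([round(p, 3) for p in sigma])  # approximately 0.090, 0.245, 0.665
print(sum(sigma))                    # very close to 1.0
```

The output vector has the same number of elements as the input vector, and larger inputs receive larger probabilities. For very large input values `math.exp` can overflow, so real libraries typically rearrange the computation; this sketch follows the plain definition instead.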
People also read
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower dimension embedding vector.
- full softmax
Synonym for softmax.
- generative model
Practically speaking, a model that does either of the following: - Creates (generates) new examples from the training dataset.
- gradient boosting
A training algorithm where weak models are trained to iteratively improve the quality (reduce the loss) of a strong model.
- Gradient Descent
The method by which a model gradually improves by making small adjustments after each mistake, moving toward better performance.
- input layer
The layer of a neural network that holds the feature vector.
- logistic regression
A type of regression model that predicts a probability.
- minimax loss
A loss function for generative adversarial networks, based on the cross-entropy between the distribution of generated data and real data.