We can learn it by likening it to logistic regression: 😋
Recall that logistic regression produces a decimal between 0 and 1.0. For example, a logistic regression output of 0.8 from an email classifier suggests an 80% chance of an email being spam and a 20% chance of it being not spam. Clearly, the sum of the probabilities of an email being either spam or not spam is 1.0.
Softmax extends this idea into the MULTI-CLASS world. That is, Softmax assigns decimal probabilities to each class in a multi-class problem. Those decimal probabilities must add up to 1.0.
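To make the "probabilities add up to 1.0" idea concrete, here is a minimal sketch of the softmax function in plain Python. The `scores` values are made up for illustration, and subtracting the maximum is a common numerical-stability trick that I am assuming here; it is not something these notes prescribe.

```python
import math

def softmax(z):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Shifting by the max avoids overflow in exp() and does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a 4-class problem
scores = [1.0, 2.0, 0.5, -1.0]
probs = softmax(scores)

print(probs)       # one probability per class
print(sum(probs))  # adds up to 1.0 (up to floating-point rounding)
```

The class with the largest raw score also gets the largest probability, so the ordering of the classes is preserved.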
Its other name is Maximum Entropy (MaxEnt) Classifier
We can say that softmax regression generalizes logistic regression
Logistic regression is a special case of softmax where C = 2 🤔
C = number of classes = number of units of the output layer. So, ŷ is a (C, 1) dimensional vector.
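The C = 2 case can be checked numerically. The sketch below assumes the usual convention that the second class has its score fixed at 0; the logit value `z` is arbitrary and only there for illustration.

```python
import math

def sigmoid(z):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    """Softmax over a list of raw scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.386  # hypothetical raw score for the "spam" class

p_logistic = sigmoid(z)           # logistic regression output
p_softmax = softmax([z, 0.0])[0]  # softmax with C = 2, second score fixed at 0

print(p_logistic, p_softmax)  # the two values agree
```

This works because e^z / (e^z + e^0) simplifies to 1 / (1 + e^(-z)), which is exactly the logistic function.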
Softmax is implemented through a neural network layer just before the output layer. The Softmax layer must have the same number of nodes as the output layer.
Takes the output of the softmax layer and converts it into a 1 vs 0 vector (as I called it 🤭), which will be our ŷ
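$$
t = \begin{bmatrix} 0.13 \\ 0.75 \\ 0.01 \\ 0.11 \end{bmatrix} \Longrightarrow \hat{y} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}
$$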
And so on 🐾
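The conversion in the example above can be sketched in a few lines of plain Python. The helper name `to_one_hot` is my own, and breaking ties by taking the first maximum is an assumption; the notes do not say what to do with ties.

```python
def to_one_hot(t):
    """Put a 1 at the position of the largest probability and 0 everywhere else."""
    winner = t.index(max(t))  # on ties, the first maximum wins (assumption)
    return [1 if i == winner else 0 for i in range(len(t))]

t = [0.13, 0.75, 0.01, 0.11]  # softmax output from the example above
y_hat = to_one_hot(t)

print(y_hat)  # [0, 1, 0, 0]
```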
Y and ŷ are (C, m) dimensional matrices 👩‍🔧
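As a rough illustration of the (C, m) shape, the sketch below builds Y from a list of class labels, assuming m is the number of training examples and that each column holds the 1 vs 0 vector of one example. The `labels` values are hypothetical.

```python
C = 4               # number of classes
labels = [1, 0, 3]  # hypothetical class index of each example, so m = 3
m = len(labels)

# Y has C rows and m columns; column j is the 1 vs 0 vector of example j.
Y = [[1 if labels[j] == c else 0 for j in range(m)] for c in range(C)]

for row in Y:
    print(row)

print((len(Y), len(Y[0])))  # (C, m)
```

Each column contains exactly one 1, at the row of that example's class.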