A classification layer computes the cross entropy loss for
multi-class classification problems with mutually exclusive classes.
For typical classification networks, the classification layer must follow
the softmax layer. In the classification layer, trainNetwork takes the
values from the softmax function and assigns each input to one of the K
mutually exclusive classes using the cross entropy function for a 1-of-K
coding scheme [1]:
$$\text{loss} = -\sum_{i=1}^{N} \sum_{j=1}^{K} t_{ij} \ln y_{ij},$$

where N is the number of samples, K is the number of classes, $t_{ij}$ is
the indicator that the ith sample belongs to the jth class, and $y_{ij}$ is
the output for sample i for class j, which in this case is the value from
the softmax function. That is, $y_{ij}$ is the probability that the network
associates the ith input with class j.
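
As a rough illustration of the quantities described above, the following minimal sketch computes softmax outputs and the summed cross entropy loss for a 1-of-K coding scheme. It is written in plain Python using only the standard library and is not the toolbox implementation. The function names `softmax` and `cross_entropy_loss`, the sample scores, and the one-hot targets are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Convert one sample's raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(y, t):
    """Summed cross entropy: -sum_i sum_j t[i][j] * ln(y[i][j]).

    y: N rows of K softmax outputs (probabilities).
    t: N rows of K indicators (1 for the true class, 0 otherwise).
    """
    loss = 0.0
    for y_i, t_i in zip(y, t):
        for y_ij, t_ij in zip(y_i, t_i):
            if t_ij:  # terms with t_ij == 0 contribute nothing
                loss -= t_ij * math.log(y_ij)
    return loss


# Hypothetical data: N = 2 samples, K = 3 classes.
scores = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
targets = [[1, 0, 0], [0, 1, 0]]  # 1-of-K coding of the true classes

probs = [softmax(row) for row in scores]
loss = cross_entropy_loss(probs, targets)
print(loss)
```

Because each row of `targets` contains a single 1, only the log-probability of the true class of each sample contributes to the loss. The loss therefore shrinks toward zero as the network assigns a probability close to 1 to the correct class.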