Cross-entropy loss is used in classification problems, especially when multiple classes are involved. It measures the difference between the predicted probability distribution and the actual class labels (ground truth); the goal is to minimize this difference so that the predicted probabilities match the true labels as closely as possible.
Cross-entropy essentially penalizes predictions that deviate from the true label's probability (which should ideally be 1 for the correct class and 0 for the others).
For a single sample, the cross-entropy loss is:

$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

where:
- $C$ is the number of classes
- $y_i$ is a binary label (1 for the correct class, 0 for the others)
- $\hat{y}_i$ is the predicted probability for class $i$ (the output of the Softmax Activation Function)
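
As a minimal sketch of the single-sample formula above (the helper name `cross_entropy_single`, the `eps` clipping, and the example probabilities are illustrative assumptions, not from the text), assuming NumPy:

```python
import numpy as np

def cross_entropy_single(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one sample.
    y_true: one-hot vector of length C; y_pred: softmax probabilities of length C."""
    y_pred = np.clip(y_pred, eps, 1.0)       # guard against log(0)
    return -np.sum(y_true * np.log(y_pred))

# Example: 3 classes, correct class is index 1
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.7, 0.2])           # softmax output
print(cross_entropy_single(y_true, y_pred))  # -log(0.7) ~= 0.357
```

Because $y_i$ is zero for all but the correct class, the sum reduces to $-\log(\hat{y}_{\text{correct}})$, as the example shows.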
Similarly, for $N$ samples, the average loss is:

$$L = -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{C} y_{n,i} \log(\hat{y}_{n,i})$$
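
A batched version is a direct extension of the sketch above (again, the function name and the example arrays are illustrative assumptions):

```python
import numpy as np

def cross_entropy_batch(Y_true, Y_pred, eps=1e-12):
    """Average cross-entropy over N samples.
    Y_true: (N, C) one-hot labels; Y_pred: (N, C) softmax probabilities."""
    Y_pred = np.clip(Y_pred, eps, 1.0)
    per_sample = -np.sum(Y_true * np.log(Y_pred), axis=1)  # loss for each sample
    return per_sample.mean()                                # average over N

Y_true = np.array([[0, 1, 0],
                   [1, 0, 0]], dtype=float)
Y_pred = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1]])
print(cross_entropy_batch(Y_true, Y_pred))  # mean of -log(0.7) and -log(0.8) ~= 0.290
```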
Cross-entropy is useful because it is probabilistic in nature: it operates directly on predicted probabilities, which suits classification. It also strongly penalizes incorrect predictions made with high confidence, and its gradients push the correct class's predicted probability upward during training.
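
A small numerical illustration of these two properties (the specific probability values are assumptions chosen for illustration; the gradient identity below assumes softmax outputs combined with cross-entropy):

```python
import numpy as np

# Confident and correct vs. confident and wrong
print(-np.log(0.95))   # ~0.05: small loss when the correct class gets high probability
print(-np.log(0.01))   # ~4.61: large loss for a confident mistake

# When the loss is applied to softmax outputs, the gradient with respect to the
# logits simplifies to (y_pred - y_true):
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.5, 0.3])
print(y_pred - y_true)  # [ 0.2 -0.5  0.3]: negative for the correct class,
                        # so gradient descent raises its probability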