TLDR: Researchers have developed a new ‘Linearly Adaptive Cross Entropy Loss function’ that improves machine learning classification performance. Unlike standard cross-entropy, this new function includes an additional term based on the true class’s predicted probability, enhancing the optimization process. Tested on a ResNet model with the CIFAR-100 dataset, it consistently achieved higher accuracy and lower error rates with minimal additional computational cost, offering a promising alternative for classification tasks.
In classification tasks, the loss function defines what a model is optimized for during training, so the choice directly shapes how well the model learns to predict. One of the most widely used loss functions is Cross Entropy, which has its roots in information theory and measures the dissimilarity between a model’s predicted probability distribution and the true distribution.
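To make the definition concrete, here is a minimal sketch of standard Cross Entropy for a single example, written in plain Python. The logits and the three-class setup are hypothetical values chosen for illustration; they do not come from the paper. The sketch shows that with a one-hot label the sum over classes collapses to the negative log of the true class's predicted probability.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs):
    """Cross entropy -sum_i y_i * log(p_i); terms with y_i = 0 contribute nothing."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

logits = [2.0, 0.5, -1.0]   # hypothetical scores for 3 classes
probs = softmax(logits)
one_hot = [1, 0, 0]         # the true class is index 0

full_sum = cross_entropy(one_hot, probs)
true_class_only = -math.log(probs[0])  # the only term that survives
```

Both `full_sum` and `true_class_only` hold the same value, which is why the predicted probabilities of the other classes enter the standard loss only indirectly, through the softmax normalization.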
While effective, the standard Cross Entropy loss focuses on increasing the predicted probability of the correct class. With one-hot labels, only the true-class term of the sum is nonzero, so the loss reduces to the negative logarithm of that single probability. Because the predicted probabilities sum to one, raising it implicitly lowers the probabilities of the incorrect classes, but the loss does not directly use information from these “false” classes during learning. This is where a new approach, the Linearly Adaptive Cross Entropy Loss function, steps in.
Proposed by Jae Wan Shim, this novel loss function introduces an additional term that specifically depends on the predicted probability of the true class. This unique feature is designed to enhance the optimization process, especially when dealing with one-hot encoded class labels, a common representation where only the true class is marked with a ‘1’ and all others with a ‘0’. The theoretical foundation for this new function is derived from fundamental concepts in information theory, specifically building upon the symmetric Kullback-Leibler divergence, also known as Jeffreys divergence.
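For reference, the standard textbook definitions of the two divergences named above are given below, for discrete distributions P = (p_i) and Q = (q_i). This is only the general starting point; the steps that lead from Jeffreys divergence to the Linearly Adaptive loss are not reproduced in this summary and should be taken from the paper itself.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i},
\qquad
J(P, Q) = D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P)
        = \sum_i (p_i - q_i) \log \frac{p_i}{q_i}.
```

Unlike the Kullback-Leibler divergence, J is symmetric in its arguments: swapping P and Q leaves the value unchanged.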
To evaluate its effectiveness, the Linearly Adaptive Cross Entropy Loss function was compared against the conventional Cross Entropy loss. The experiments used a deep learning model based on the ResNet (Residual Network) architecture, a popular choice for image classification whose skip connections make very deep networks trainable and mitigate issues like vanishing gradients. The model, consisting of 18 layers, was trained on the CIFAR-100 dataset, which comprises 100 classes of 32×32 color images.
The training process involved several key hyperparameters and techniques to ensure robust evaluation. Stochastic Gradient Descent (SGD) was used as the optimizer with a learning rate of 0.1, momentum of 0.9, and weight decay of 5e-4. A StepLR scheduler adjusted the learning rate, decaying it by a factor of 0.1 every 50 epochs. The models were trained for 200 epochs with a batch size of 100. Data augmentation, including random horizontal flips and random cropping, was applied to the training images to improve generalization, and per-pixel mean subtraction was used for preprocessing.
The results favored the new loss. Across multiple training runs, the Linearly Adaptive Loss function consistently outperformed the standard Cross Entropy Loss function in classification accuracy. For instance, on the top-5 error rate (the percentage of test images for which the true class was not among the 5 highest-ranked predictions), the Linearly Adaptive Loss achieved a mean of 6.2% compared to Cross Entropy’s 6.7%, a reduction of 0.5 percentage points.
Crucially, this improvement comes with minimal additional computational cost. Compared to the standard Cross Entropy loss, the Linearly Adaptive Loss function requires only two extra operations: one subtraction and one multiplication. Its efficiency is therefore practically unchanged, which makes it a practical alternative for real-world applications.
The findings suggest that this linearly adaptive approach opens further directions for research into loss function design. Future work could analyze its convergence properties theoretically, explore its effect on model robustness against adversarial attacks (small, intentional perturbations to input data that can trick models), and investigate its potential for multi-label classification tasks, where an input can belong to multiple classes simultaneously.
For more detailed information, you can refer to the full research paper available at arXiv:2507.10574.


