TLDR: A new research paper introduces ‘multiclass local calibration’ and a method called LoCal Nets (LCNs) to improve the trustworthiness of Machine Learning models. LCNs address ‘proximity bias,’ where predictions in sparse data regions are often miscalibrated. By learning new feature representations and using the Jensen-Shannon distance, LCNs align predicted probabilities with local class frequencies. Experiments show LCNs significantly outperform existing methods in local calibration, maintain competitive global calibration, and even enhance predictive performance, offering a more reliable approach for high-stakes applications.
In the rapidly evolving world of Machine Learning (ML), building models that are not only accurate but also trustworthy is paramount. A key aspect of trustworthiness is ‘calibration,’ which means that a model’s predicted probabilities should accurately reflect the true likelihood of an event. For instance, if a model predicts a 70% chance of rain, it should indeed rain on roughly 70% of the occasions when such a prediction is made. This is especially critical in high-stakes fields like healthcare, where miscalibrated predictions can lead to biased or incorrect decisions.
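To make the idea concrete, calibration is typically checked by binning predictions by confidence and comparing each bin’s average confidence to its empirical accuracy. Below is a minimal sketch of the standard binned Expected Calibration Error (ECE); the binning scheme and bin count are common defaults, not taken from the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for the top predicted class:
    weighted average of |mean confidence - accuracy| per confidence bin."""
    conf = probs.max(axis=1)                      # confidence in the predicted class
    pred = probs.argmax(axis=1)                   # predicted class
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap              # weight bin by its share of samples
    return ece

# Three always-correct but overconfident/underconfident predictions:
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
labels = np.array([0, 1, 0])
print(expected_calibration_error(probs, labels))
```

A perfectly calibrated model would score 0; here every prediction is correct but the confidences average 0.8, so the gap shows up as a nonzero ECE.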
While traditional calibration methods have focused on overall model performance, they often overlook a crucial issue known as ‘proximity bias.’ This bias occurs when predictions for instances in sparsely populated or less common regions of the data are systematically miscalibrated. Imagine a medical diagnosis model that performs well for common patient profiles but struggles with rare conditions, leading to unreliable predictions for those most at risk. Existing multiclass calibration techniques haven’t adequately addressed this spatial aspect, leaving a significant gap in model trustworthiness.
A new research paper, “Multiclass Local Calibration With the Jensen-Shannon Distance,” introduces a novel approach to tackle this challenge by focusing on ‘multiclass local calibration.’ This concept emphasizes that predictions for nearby data points should have similar label distributions, ensuring that calibration holds true not just globally, but also within specific neighborhoods of the data space.
Introducing LoCal Nets for Enhanced Calibration
The researchers propose a practical method called LoCal Nets (LCNs) designed to improve local calibration in Neural Networks. Unlike many existing post-hoc calibration techniques that merely rescale a model’s fixed outputs, LCNs take a more fundamental approach: they simultaneously learn new, lower-dimensional feature representations and generate calibrated predictions. This allows LCNs to reshape the underlying geometry of the data’s representation space, making it more aligned with true local class frequencies.
At the heart of LCNs is the use of the Jensen-Shannon distance. This mathematical tool helps to measure the similarity between the model’s predicted probabilities and local estimates of class frequencies. By minimizing this distance during training, LCNs ensure that the model’s predictions are consistent with what’s observed in the immediate vicinity of each data point. The method also includes a ‘similarity term’ that encourages data points with the same label to cluster together, further refining the feature space.
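The two ingredients above can be sketched in a few lines: the Jensen-Shannon distance between two probability vectors, and a kernel-weighted estimate of the class frequencies near a point. This is an illustrative sketch only; the Gaussian kernel, bandwidth, and base-2 logarithm are my assumptions, not necessarily the paper’s exact choices.

```python
import numpy as np

def js_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance: square root of the JS divergence (base 2).
    A symmetric, bounded metric in [0, 1]."""
    p = np.asarray(p, float) + eps                # smooth to avoid log(0)
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))  # KL divergence in bits
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def local_label_frequencies(x, X, Y_onehot, bandwidth=1.0):
    """Kernel-weighted class frequencies around point x
    (illustrative Gaussian kernel, not the paper's exact estimator)."""
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * bandwidth ** 2))
    freq = w @ Y_onehot
    return freq / freq.sum()

p = np.array([0.6, 0.3, 0.1])
print(js_distance(p, p))            # identical distributions -> ~0
print(js_distance([1, 0], [0, 1]))  # disjoint distributions -> ~1
```

Minimizing such a distance between a model’s predicted distribution and the local frequency estimate is what pushes predictions toward agreement with their neighborhood.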
A significant advantage of LCNs is that while they use these sophisticated kernel-based estimates during training, they do not require them during inference. This means that once trained, LCNs maintain the efficiency of standard feed-forward neural networks, making them practical for real-world applications.
Empirical Validation and Performance
The research team rigorously evaluated LCNs against several established calibration techniques across various multiclass datasets, including CIFAR-10, CIFAR-100, and TissueMNIST. The results were compelling:
- Global Calibration: LCNs achieved competitive performance on global calibration metrics like Expected Calibration Error (ECE) and Expected Cumulative Calibration Error (ECCE), often ranking second only to the current state-of-the-art Dirichlet Calibration (DC).
- Local Calibration: Crucially, LCNs consistently demonstrated superior performance on local calibration metrics, such as Local Calibration Error (LCE) and Maximum Local Calibration Error (MLCE). This highlights their effectiveness in addressing proximity bias and ensuring reliable predictions in sparse data regions.
- Predictive Performance: Beyond calibration, LCNs also showed tangible gains in predictive performance. They achieved the largest reductions in Negative Log-Likelihood (NLL) and even improved model accuracy across datasets. This is a notable advantage, as other calibration methods typically work only on existing model outputs and do not enhance the underlying predictions.
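To give intuition for what a ‘local’ calibration metric measures, here is a simplified sketch: for each point, compare its predicted distribution with the empirical label frequencies of its nearest neighbors, then average (LCE-like) or take the worst case (MLCE-like). The L1 gap and k-nearest-neighbor scheme are illustrative assumptions; the paper’s exact LCE/MLCE definitions may differ.

```python
import numpy as np

def local_calibration_error(X, probs, labels, k=10):
    """Illustrative local calibration error (not the paper's exact definition):
    mean and max L1 gap between each point's predicted distribution and the
    empirical class frequencies of its k nearest neighbours."""
    n, c = probs.shape
    onehot = np.eye(c)[labels]
    gaps = np.empty(n)
    for i in range(n):
        d = np.sum((X - X[i]) ** 2, axis=1)       # squared distances to all points
        nbrs = np.argsort(d)[:k]                  # k nearest (includes the point itself)
        local_freq = onehot[nbrs].mean(axis=0)    # empirical class frequencies nearby
        gaps[i] = np.abs(probs[i] - local_freq).sum()
    return gaps.mean(), gaps.max()                # LCE-like average, MLCE-like worst case

# If predictions match local frequencies exactly, both quantities are zero:
X = np.zeros((4, 2))
labels = np.array([0, 1, 0, 1])
probs = np.full((4, 2), 0.5)
print(local_calibration_error(X, probs, labels, k=4))
```

A model can have low global ECE yet a large worst-case gap in some sparse neighborhood, which is exactly the proximity-bias failure mode these metrics expose.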
These findings underscore the importance of incorporating a ‘local’ perspective into calibration methods. By learning new feature representations, LCNs not only improve the trustworthiness of predictions in critical, underrepresented areas but also enhance the overall predictive quality of the model.
While LCNs represent a significant step forward, the authors acknowledge areas for future work, such as exploring adaptive kernel choices and extending the method to other types of machine learning models. For a deeper dive into the technical details, you can read the full research paper here: Multiclass Local Calibration With the Jensen-Shannon Distance.