TL;DR: BOOST is a method that uses out-of-distribution-informed adaptive sampling to mitigate bias in convolutional neural networks for painting classification. It dynamically adjusts sampling probabilities and applies temperature scaling to promote equitable representation of all classes, especially underrepresented ones. Evaluated on the KaoKore and PACS datasets, BOOST improves accuracy and F1 score while reducing class-wise bias, balancing performance and fairness. The work also introduces a new metric, SODC, for assessing bias.
Artificial intelligence is increasingly used to analyze and categorize paintings, offering new insights for art appraisers and enthusiasts. However, a significant challenge in these AI systems is bias, which often arises from imbalanced datasets. When certain artistic styles or subjects dominate the training data, AI models become less accurate in recognizing rarer or less common artworks. This issue can compromise the fairness and precision of predictions, especially when dealing with data the model hasn’t seen much of before, known as out-of-distribution (OOD) data.
Addressing this critical problem, researchers have introduced a novel method called BOOST, which stands for Bias-Oriented OOD Sampling and Tuning. This approach aims to create more robust AI models for art classification by actively mitigating biases present in the training data. BOOST dynamically adjusts how the model learns, ensuring a more balanced representation of all artistic styles and classes, even those that are underrepresented.
How BOOST Works
The core idea behind BOOST is to intelligently select and prioritize training examples. It uses an “out-of-distribution-informed” sampling method, meaning it pays special attention to data points that are either rare or difficult for the model to classify. By scaling prediction scores and adjusting sampling probabilities, BOOST helps to bring rare and ambiguous samples closer to common ones, promoting a more inclusive learning process.
The method works by identifying challenging or ambiguous samples based on how close they are to the model’s decision boundaries. It then uses a technique called “temperature-scaled softmax” to calibrate prediction confidences. Imagine a thermometer for confidence: a higher “temperature” makes the model’s predictions softer, allowing it to consider a broader range of artistic styles, especially those that are ambiguous or stylistically similar. This helps prevent the model from becoming overconfident in its predictions for dominant classes.
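To make the "thermometer" intuition concrete, here is a minimal sketch of temperature-scaled softmax in NumPy. The logit values and the temperature of 4.0 are illustrative choices, not taken from the paper; the point is only that a higher temperature flattens the predicted distribution.

```python
import numpy as np

def temperature_softmax(logits, temperature=1.0):
    """Softmax with temperature scaling: higher T yields softer probabilities."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for three style classes, one clearly dominant.
logits = [4.0, 1.0, 0.5]
sharp = temperature_softmax(logits, temperature=1.0)  # confident prediction
soft = temperature_softmax(logits, temperature=4.0)   # softened prediction
```

With the higher temperature, the top class keeps the largest probability but the model assigns noticeably more mass to the stylistically similar alternatives, which is exactly the overconfidence-damping effect described above.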
BOOST also introduces small, calculated “perturbations” to the input images. These tiny changes are designed to push the image’s features closer to its true class representation, making the model better at distinguishing between correctly classified (in-distribution) and misclassified (out-of-distribution) samples. Crucially, BOOST prioritizes these rarer and harder-to-classify examples by inverting their sampling probabilities, ensuring they contribute more to the learning process, particularly at the beginning of training.
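The inverted-sampling idea can be illustrated on a toy imbalanced label set. This is a simplified sketch (the class counts and the plain frequency inversion are assumptions for illustration, not the paper's exact schedule): each sample's draw probability is set inversely proportional to its class frequency, so rare classes are drawn as often as common ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced labels: class 0 dominates, class 2 is rare.
labels = np.array([0] * 80 + [1] * 15 + [2] * 5)

# Per-sample probability under plain frequency-proportional sampling.
class_counts = np.bincount(labels)
base_prob = class_counts[labels] / len(labels)

# Invert and renormalize, so rare samples are drawn more often.
inv_prob = 1.0 / base_prob
inv_prob /= inv_prob.sum()

# Draw a large batch; class counts come out roughly uniform.
draws = rng.choice(labels, size=10_000, p=inv_prob)
```

With this inversion, each class contributes equal total probability mass (1/3 here), so the rare class 2 appears in training batches about as often as the dominant class 0.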
Measuring Bias with SODC
To effectively evaluate how well BOOST reduces bias, the researchers proposed a new metric: the Same-Dataset OOD Detection Score (SODC). This score helps assess how well different classes are separated and how much per-class bias is reduced. Essentially, it redefines correctly classified samples as “in-distribution” and misclassified ones as “out-of-distribution” within the same dataset, providing a clear measure of the model’s ability to handle ambiguity and imbalance.
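The paper defines SODC's exact formula, which is not reproduced here; a common way to score this kind of in-distribution vs. out-of-distribution separation is an AUROC-style statistic over prediction confidences. The sketch below is that generic statistic, under the article's relabeling (correct predictions as "ID", misclassified ones as "OOD"), not the paper's own implementation.

```python
import numpy as np

def separation_score(confidences, correct_mask):
    """AUROC-style separability between correctly classified ("in-distribution")
    and misclassified ("out-of-distribution") samples.

    Returns the probability that a randomly chosen correct sample receives a
    higher confidence than a randomly chosen misclassified one (ties count 0.5).
    """
    id_conf = confidences[correct_mask]
    ood_conf = confidences[~correct_mask]
    diff = id_conf[:, None] - ood_conf[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Toy example: three correct, two misclassified samples.
conf = np.array([0.9, 0.8, 0.95, 0.5, 0.6])
correct = np.array([True, True, True, False, False])
score = separation_score(conf, correct)  # 1.0: perfectly separated
```

A score near 1.0 means the model's confidence cleanly distinguishes its correct predictions from its mistakes; a score near 0.5 means confidence carries no such signal, i.e. the classes are poorly separated.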
Experimental Success
The effectiveness of BOOST was tested on two datasets: KaoKore, which features cropped facial expressions from Japanese art with a class imbalance towards noble and warrior faces, and PACS, containing images of objects, humans, and animals across various artistic styles. The results were highly promising. BOOST consistently outperformed traditional sampling methods (like random or stratified sampling) across key performance metrics such as accuracy, F1 score, recall, and precision.
Notably, BOOST significantly reduced both the Mean Absolute Bias (MAB) and Standard Deviation of Bias (SDB). A lower MAB means the model performs more consistently across all classes, while a lower SDB indicates less variability in performance from one class to another. This demonstrates that BOOST not only improves overall classification performance but also ensures greater fairness by preventing the model from favoring or penalizing specific artistic styles or subjects.
Qualitative analyses, using visualizations like UMAP, further supported these findings. Models trained with BOOST showed tighter clustering within classes and less overlap between different classes, indicating clearer decision boundaries and fewer ambiguous samples. The visualizations also highlighted how BOOST effectively targets and learns from challenging, highly stylized, or visually atypical examples within individual class distributions, making the model more robust to diverse artistic expressions.
In conclusion, BOOST offers a robust solution for debiasing AI models in the art domain, balancing high performance with fairness. For more technical details, see the full research paper.


