Quantization and Fairness: A Deep Dive into Disparate Impacts and Solutions

TLDR: Post-Training Quantization (PTQ), a common method for compressing neural networks, can unintentionally worsen fairness issues, especially for minority groups. A new paper explains the underlying reasons, tracing the impact from changes in model weights and activations to altered logits, softmax probabilities, and a degraded optimization state. To counter these effects, the authors propose a combined approach using mixed-precision Quantization Aware Training (QAT) with dataset sampling and weighted loss functions, demonstrating improved fairness without significant accuracy loss.

In the rapidly evolving world of artificial intelligence, the demand for faster and lighter models, especially for devices at the ‘edge’ of networks, has led to the widespread adoption of compression techniques like quantization. One such method, Post-Training Quantization (PTQ), is valued for its ability to significantly reduce model size and speed up computation with minimal impact on overall accuracy. However, recent research has revealed a critical and often overlooked side effect: PTQ can exacerbate disparate impacts, particularly for minority groups.

A new paper, titled "Explaining How Quantization Disparately Skews a Model", by Abhimanyu Bellam and Jung-Eun Kim from North Carolina State University, examines the mechanisms behind this fairness degradation. Their work provides a comprehensive explanation of how quantization creates a chain of factors leading to unequal impacts across different groups during both the forward and backward passes of a neural network.

The Root of the Problem: How Quantization Skews Models

The researchers observed that as the precision of a model is reduced through quantization (e.g., from 32-bit floating point down to 2-bit integers), the disparity in accuracy between groups becomes increasingly pronounced. For instance, on datasets like UTKFace, minority groups showed extreme drops in accuracy when models were quantized to lower precisions.
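To make "disparity in accuracy between groups" concrete, the sketch below computes accuracy separately for each group and the gap between the best and worst group. The gap is a hypothetical, deliberately simple measure chosen for illustration and is not necessarily the fairness metric used in the paper; the toy labels and predictions are invented.

```python
from collections import defaultdict

def group_accuracies(preds, labels, groups):
    """Accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(preds, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

def accuracy_gap(preds, labels, groups):
    """Hypothetical disparity measure: best-group accuracy minus worst-group accuracy."""
    accs = group_accuracies(preds, labels, groups)
    return max(accs.values()) - min(accs.values())

# Invented toy data: four majority-group samples, two minority-group samples.
labels = [0, 1, 0, 1, 1, 0]
groups = ["majority"] * 4 + ["minority"] * 2
fp32_preds = [0, 1, 0, 1, 1, 0]  # hypothetical full-precision model: all correct
int2_preds = [0, 1, 0, 1, 0, 1]  # hypothetical low-bit model: minority samples flipped

print(group_accuracies(int2_preds, labels, groups))
print(accuracy_gap(fp32_preds, labels, groups), accuracy_gap(int2_preds, labels, groups))
```

Running the same comparison on one model's predictions before and after quantization shows whether the gap widens as the bit width drops, even when the overall accuracy barely moves.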

The study identifies several cascaded factors in the forward pass that contribute to this disparity:

Changes in Weights: The fundamental alteration occurs in the network’s weights. Quantization not only changes the numerical values of these weights but also induces sparsity, because many small weights are rounded to zero. This resembles pruning and discards information: the lower the precision, the larger the absolute deviation from the original weights and the higher the sparsity.
Impact on Logits and Probabilities: These weight changes ripple through the network, significantly affecting the ‘logits’ – the raw output values from the network before they are converted into probabilities. The numerical values of logits shift, potentially causing incorrect classifications. More critically, the variance among logits decreases, making it harder for the model to distinguish between different classes, especially for minority groups. This reduced variance then carries over to the ‘softmax probabilities,’ which represent the model’s confidence in its predictions. For minority groups, these probabilities tend to shift closer to the decision boundary, indicating lower confidence and increased uncertainty.
Increased Loss and Compromised Accuracy: The combined effect of these changes is a higher loss and significantly reduced accuracy for minority groups, directly reflecting the exacerbated disparity.
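The weight-level effect can be reproduced with a simple quantizer. The sketch below assumes per-tensor symmetric uniform quantization with round-to-nearest (one common PTQ scheme, not necessarily the one used in the paper), applied to synthetic Gaussian weights, and reports the mean absolute deviation from the original weights and the fraction of weights rounded to zero at 8, 4, and 2 bits.

```python
import random

def quantize_symmetric(weights, bits):
    """Per-tensor symmetric uniform quantization (bits >= 2); returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0.0:
        return list(weights)
    # Round to the nearest integer level, clamp to [-qmax, qmax], then map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def mean_abs_error(original, quantized):
    """Average absolute deviation between original and quantized weights."""
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)

def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic stand-in for one layer's weights
results = {}
for bits in (8, 4, 2):
    q = quantize_symmetric(weights, bits)
    results[bits] = (mean_abs_error(weights, q), sparsity(q))
    print(f"{bits}-bit: mean |dw| = {results[bits][0]:.4f}, sparsity = {results[bits][1]:.1%}")
```

Because the step size (`scale`) grows as the bit width shrinks, every weight whose magnitude falls below half a step is mapped to zero, so the zero bin at 2 bits is far wider than at 8 bits. This is the pruning-like sparsity described above.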

The Optimization Landscape: A Deeper Look at Unfairness

Beyond the forward pass, the paper also examines how quantization degrades the model’s state from an optimization perspective. Using gradient norms and eigenvalues of the Hessian matrix, the researchers provide insights into why quantized models struggle with fairness:

Gradient Norms: For minority classes, quantized models exhibit larger gradient norms. In simpler terms, this means the model is further away from an optimal solution for these groups, implying a greater need for updates to improve predictions. There’s an inverse relationship observed between gradient norm and group size, meaning smaller groups have larger gradient norms.
Hessian Eigenvalues: The largest eigenvalue of the Hessian matrix, which measures the sharpest curvature of the loss surface, is also higher for minority groups. This indicates a sharper loss landscape for these groups: updates could reduce their loss substantially, but the model currently sits in a less stable, less optimal region for them.
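Both diagnostics can be computed per group. The sketch below assumes a tiny logistic-regression model, for which the gradient of the mean loss is the average of (p - y)x and the Hessian is the average of p(1 - p)xxᵀ, and estimates the largest Hessian eigenvalue with power iteration. The model, weights, and data are hypothetical; for deep networks these quantities are usually obtained with automatic differentiation and Hessian-vector products rather than an explicit Hessian.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def group_grad_and_hessian(w, xs, ys):
    """Gradient and Hessian of the mean logistic loss over one group's samples."""
    d, n = len(w), len(xs)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(d):
            grad[i] += (p - y) * x[i] / n
            for j in range(d):
                hess[i][j] += p * (1.0 - p) * x[i] * x[j] / n
    return grad, hess

def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))

def top_eigenvalue(hess, iters=200, seed=0):
    """Largest eigenvalue of a symmetric positive semi-definite matrix via power iteration."""
    rng = random.Random(seed)
    d = len(hess)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    lam = 0.0
    for _ in range(iters):
        hv = [sum(hess[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = l2_norm(hv)
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in hv]
        # Rayleigh quotient v^T H v using the unit-norm iterate.
        lam = sum(v[i] * sum(hess[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam

# Hypothetical toy setup: the same weights fit one group well and another poorly.
w = [2.0, 0.0]
xs = [[1.0, 0.5], [1.0, -0.5]]
for name, ys in (("well_fit", [1, 1]), ("poorly_fit", [0, 0])):
    g, h = group_grad_and_hessian(w, xs, ys)
    print(name, round(l2_norm(g), 3), round(top_eigenvalue(h), 3))
```

The poorly fit group gets the larger gradient norm, matching the intuition that the model is farther from a good solution for it. In this toy example the Hessian depends only on the inputs and predictions, so both groups share the same curvature; in the paper's setting the minority groups also differ in their inputs.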


Towards Fair Quantization: Proposed Mitigation Strategies

To combat these adverse effects, Bellam and Kim propose a multi-pronged mitigation approach:

Fairer Base Model: Before quantization, the base model can be made fairer using dataset sampling methods (undersampling majority classes, oversampling minority classes) to address data imbalance. Additionally, a weighted cross-entropy loss function can be employed, assigning higher weights to ‘harder’ classes (often minority groups) to ensure the model doesn’t solely focus on easier samples.
Mixed-Precision Quantization Aware Training (QAT): Unlike PTQ, QAT simulates quantization during training, typically by fine-tuning the model with quantized weights, so the network can adapt to the reduced precision. The researchers specifically advocate mixed-precision QAT, in which critical layers (such as the first and last) are kept at higher precision (e.g., 8-bit) while the remaining layers use lower precision, minimizing information loss where it matters most.
FairQAT: The most effective solution combines all these elements: dataset sampling, weighted loss functions, and mixed-precision QAT. This integrated approach significantly reduces the disparate impact of quantization, achieving both higher overall accuracy and lower fairness violation, offering a balanced trade-off for practical deployment.
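The data-side and configuration-side ingredients of this recipe can be sketched in plain Python. The example below rests on several assumptions: random oversampling stands in for the paper's sampling methods (undersampling is not shown), inverse class frequency stands in for the "harder class" weights, and the 8-bit/4-bit split with invented layer names is a hypothetical layer plan. It does not implement QAT itself, which additionally requires simulated quantization in the forward pass during training.

```python
import math
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Random oversampling: duplicate samples of smaller classes until all match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((sample, label) for sample in items + extra)
    rng.shuffle(balanced)
    return balanced

def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1/count, rescaled so they average to 1."""
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: r / mean for c, r in raw.items()}

def weighted_cross_entropy(logits, label, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return class_weights[label] * (log_sum_exp - logits[label])

def mixed_precision_plan(layer_names, high_bits=8, low_bits=4):
    """Hypothetical bit-width plan: first and last layers high precision, the rest low."""
    last = len(layer_names) - 1
    return {name: (high_bits if i in (0, last) else low_bits) for i, name in enumerate(layer_names)}

# Invented example: 8 majority-class samples, 2 minority-class samples.
labels = [0] * 8 + [1] * 2
balanced = oversample(list(range(10)), labels)
weights = inverse_frequency_weights(labels)
print(Counter(label for _, label in balanced))  # both classes now have 8 samples
print(weights)                                  # the minority class gets the larger weight
print(weighted_cross_entropy([2.0, 0.5], 1, weights))
print(mixed_precision_plan(["conv1", "block1", "block2", "fc"]))
```

Oversampling rebalances what the model sees, while the class weights rebalance how much each mistake costs; the two are complementary, which is consistent with the paper combining them before applying mixed-precision QAT.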

This research sheds crucial light on the hidden fairness challenges posed by model compression techniques. By understanding the ‘how’ and ‘why’ of quantization’s disparate impact, and by implementing the proposed FairQAT strategies, developers can move towards deploying more equitable and high-performing AI models on edge devices and beyond.
