Unmasking Hidden Biases: How LLMs Infer Demographics from Disability Cues

TLDR: A new study reveals that Large Language Models (LLMs) frequently make biased demographic inferences based on disability-related language in user queries. The research, which audited eight state-of-the-art LLMs, found that overall response rates change little when disability cues are added, but the patterns of bias shift significantly. For instance, certain disabilities lead to stronger female or lower-income associations, even though the models’ default predictions skew male and higher-income. Larger models are more prone to making these definitive, potentially biased guesses, and the business domain of a query can often override the influence of disability. The study highlights the urgent need for disability-sensitive fairness audits and improved abstention mechanisms in LLMs to prevent the reinforcement of harmful stereotypes.

Large Language Models (LLMs) are increasingly used in applications that personalize experiences and provide support. However, these powerful AI systems can inadvertently infer sensitive demographic information about users, even when such details are not explicitly provided in a query. A recent study, titled “Who’s Asking? Investigating Bias Through the Lens of Disability-Framed Queries in LLMs,” examines a critical yet underexplored question: how disability-related language in user prompts influences these demographic inferences and potentially amplifies stereotypes.

The research, conducted by Srikant Panda, Vishnu Hari, Kalpana Panda, Amit Agarwal, and Hitesh Laxmichand Patel, highlights that disability-related biases have received significantly less attention than gender or racial biases. This gap leaves LLMs prone to reinforcing societal stigmas and ableist stereotypes, which can have tangible consequences for individuals with disabilities.

Unpacking the Study’s Approach

To systematically investigate this issue, the researchers performed the first large-scale audit of disability-conditioned demographic bias across eight state-of-the-art instruction-tuned LLMs, ranging from 3 billion to 72 billion parameters. They used a carefully constructed dataset called AccessEval, which includes both neutral queries and ‘disability-aware’ queries. These disability-aware queries incorporated placeholders for nine different disability categories across six real-world business domains (Education, Finance, Healthcare, Hospitality, Media, and Technology).

The LLMs were prompted to predict five demographic attributes: gender, socioeconomic status, education, cultural background, and locality. The goal was to see if the presence of disability cues would shift these predictions and reveal underlying biases.
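The query construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the AccessEval format. The template wording, the {domain} and {disability} placeholder syntax, the build_prompts function, and the attribute-question phrasing are all hypothetical. The disability list also contains only three of the categories mentioned in this article, not the study's full nine.

```python
from itertools import product

# HYPOTHETICAL subset of disability phrasings; the study uses nine categories,
# which are not all listed in this article.
DISABILITIES = ["a mobility impairment", "a speech impairment", "a learning disorder"]

# The six business domains and five demographic attributes named in the study.
DOMAINS = ["Education", "Finance", "Healthcare", "Hospitality", "Media", "Technology"]
ATTRIBUTES = ["gender", "socioeconomic status", "education",
              "cultural background", "locality"]

# HYPOTHETICAL templates: a neutral query and a disability-aware variant.
NEUTRAL = "How can I get started with a career in {domain}?"
AWARE = "As someone with {disability}, how can I get started with a career in {domain}?"


def build_prompts():
    """Return (condition, domain, attribute, prompt) tuples for every combination.

    `condition` is "neutral" for the baseline query, or the disability phrase
    that was substituted into the disability-aware template.
    """
    prompts = []
    for domain, attribute in product(DOMAINS, ATTRIBUTES):
        ask = f" Based on this query, infer the user's {attribute}."
        prompts.append(("neutral", domain, attribute,
                        NEUTRAL.format(domain=domain) + ask))
        for disability in DISABILITIES:
            query = AWARE.format(disability=disability, domain=domain)
            prompts.append((disability, domain, attribute, query + ask))
    return prompts


prompts = build_prompts()
# 6 domains x 5 attributes x (1 neutral + 3 disability variants) = 120 prompts
print(len(prompts))
print(prompts[1][3])
```

Each disability-aware prompt is paired with a neutral prompt that differs only in the disability phrase. This pairing lets an audit attribute any change in predictions to the disability cue rather than to the rest of the query.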

Key Findings: LLMs’ Propensity for Inference and Bias Shifts

The study revealed several significant findings:

High Response Rates: LLMs showed a strong tendency to make definitive demographic guesses, responding in up to 97% of cases, often without clear justification. In other words, the models rarely abstain, even when a query offers too little context to support an inference.
Model Size Matters: Larger models (70B+ parameters) were substantially more likely to respond to attribute inference prompts, exhibiting response rates near 90% even in disability contexts, compared to under 50% for smaller models. This suggests that scale alone does not mitigate the risk of biased outputs.
Disability Cues Shift Gender Predictions: While there was a general default skew towards predicting male gender in neutral queries, the introduction of disability context significantly altered this. For instance, sensory processing and cognitive disorders led to a notable drop in male predictions, shifting towards female associations. Conversely, learning disorders, neurological conditions, and speech impairments often prompted a shift back towards male predictions, mirroring real-world diagnostic patterns.
Income Stereotypes: Models predominantly associated users with higher socioeconomic status in neutral contexts. However, when disabilities like mobility impairments, speech conditions, or genetic/behavioral disorders were present, the bias shifted towards lower-income classifications. This suggests that LLMs internalize social stereotypes that differentiate between perceived ‘competent’ and ‘dependent’ disability types.
Fixed Cultural and Educational Assumptions: For cultural background, locality, and educational attainment, models consistently assumed Western, urban, and highly educated users across most query contexts, showing little sensitivity to disability cues. This points to an over-generalization rooted in training data that prioritizes dominant sociocultural profiles.
Domain Overrides Disability: The business domain often had a greater impact on demographic inferences than disability cues did. For example, the Technology, Healthcare, and Finance domains elicited strong male associations, which aligns with occupational gender stereotypes. Finance was also the only domain where low-income predictions meaningfully increased, suggesting that economic judgments are tied more closely to industry context than to disability.
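The response-rate and prediction-shift measurements above can be illustrated with a small Python sketch. It assumes each model answer has already been reduced to a short label such as "male" or "unknown". The ABSTAIN set, the function names, and the toy answers at the bottom are hypothetical and are not data from the study.

```python
from collections import Counter

# HYPOTHETICAL labels treated as abstentions; a real audit would need a more
# careful way to detect refusals and "cannot determine" answers.
ABSTAIN = {"unknown", "cannot determine", "prefer not to say"}


def _committed(answers):
    """Normalize answers and keep only those that commit to a definite value."""
    normalized = (a.strip().lower() for a in answers)
    return [a for a in normalized if a not in ABSTAIN]


def response_rate(answers):
    """Fraction of answers that commit to a value instead of abstaining."""
    return len(_committed(answers)) / len(answers) if answers else 0.0


def label_share(answers, label):
    """Share of committed answers equal to `label` (for example "male")."""
    committed = _committed(answers)
    return Counter(committed)[label] / len(committed) if committed else 0.0


def shift(neutral_answers, aware_answers, label):
    """Change in a label's share from the neutral to the disability-aware condition.

    A negative value means the label is predicted less often once a
    disability cue is present.
    """
    return label_share(aware_answers, label) - label_share(neutral_answers, label)


# Toy answers for illustration only (NOT results from the study).
neutral = ["male", "male", "male", "female", "unknown"]
aware = ["female", "female", "male", "female", "female"]

print(f"response rate: neutral={response_rate(neutral):.2f}, aware={response_rate(aware):.2f}")
print(f"male share shift: {shift(neutral, aware, 'male'):+.2f}")
```

This sketch measures the share among committed answers separately from the response rate. Keeping the two apart shows how a model can answer about as often in both conditions while still changing which answer it gives. That is the pattern the study reports.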

Implications and Recommendations

These findings underscore that even advanced, instruction-tuned LLMs can conflate disability with unrelated attributes, risking the reinforcement of harmful stereotypes and ableism. Such misattributions can have serious consequences in real-world accessibility settings, affecting automated triage, educational access, or decision support for disabled users.

The researchers advocate for several crucial steps to address these issues: implementing disability-sensitive fairness audits, training models with counterfactual examples that decorrelate disability from demographic assumptions, and equipping LLMs with robust abstention strategies for ambiguous identity inferences. This work serves as a vital call to action for developing more equitable and disability-inclusive AI systems.
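The Python sketch below shows one possible reading of the counterfactual-training recommendation. Each query is expanded into a neutral variant and several variants that mention a disability. All of them are paired with the same abstaining answer, so disability carries no signal about the demographic attribute in the resulting data. This is an assumption about how such data could be built, not the authors' method. The template, the {prefix} placeholder, the ABSTAIN_TARGET wording, and the counterfactual_examples function are all hypothetical.

```python
# HYPOTHETICAL abstaining answer shared by every counterfactual variant.
ABSTAIN_TARGET = ("I can't reliably infer your {attribute} from this request, "
                  "and it isn't needed to help you.")


def counterfactual_examples(template, disabilities, attribute):
    """Expand one templated query into prompt/response training pairs.

    `template` must contain a `{prefix}` placeholder at its start. The neutral
    variant uses an empty prefix; each other variant mentions one disability.
    Every variant gets the same target, so the disability mention does not
    predict the answer.
    """
    ask = f" Based on this query, infer the user's {attribute}."
    target = ABSTAIN_TARGET.format(attribute=attribute)
    prefixes = [""] + [f"As someone with {d}, " for d in disabilities]
    return [{"prompt": template.format(prefix=p) + ask, "response": target}
            for p in prefixes]


examples = counterfactual_examples(
    "{prefix}I need help opening a business bank account.",
    ["a speech impairment", "a mobility impairment"],
    "socioeconomic status",
)
for ex in examples:
    print(ex["prompt"])
```

The target is a refusal to guess rather than a different demographic label. Training on pairs like these is therefore intended to nudge a model toward declining identity inferences it cannot support. This connects the counterfactual recommendation to the call for better abstention.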