TLDR: A study reveals that large language models (LLMs) struggle to accurately detect and interpret ableism across cultures, particularly in India. Western LLMs tend to overestimate ableist harm, while Indic LLMs underestimate it, often misinterpreting cultural nuances and overlooking intersectional biases. The research highlights that LLMs are less sensitive to ableism expressed in Hindi and fail to understand the differing perceptions of people with disabilities in India regarding microaggressions, pity, and the intersection of disability with gender, caste, and class. The findings call for a human-centered, culturally grounded approach to developing AI systems for harm detection.
A new research paper delves into a critical issue in the world of artificial intelligence: how large language models (LLMs) understand and address ableism, particularly across different cultures. The study, titled “Disability Across Cultures: A Human-Centered Audit of Ableism in Western and Indic LLMs,” highlights a significant gap in how these powerful AI systems perceive harm against people with disabilities (PwD), especially in non-Western contexts like India.
People with disabilities globally, and particularly in India, face high levels of discrimination and hate online. While LLMs are increasingly used to combat online hate, most research has focused on Western audiences and Western AI models. This raises a crucial question: are these models truly equipped to recognize ableist harm in diverse cultural settings, and do localized models perform any better?
How the Study Was Conducted
To investigate these questions, researchers adopted and translated a publicly available dataset of ableist speech into Hindi, including both informal and formal registers. They then prompted eight different LLMs—four developed in the U.S. (GPT-4, Gemini, Claude, Llama) and four developed in India (Krutrim, Nanda, Gajendra, Airavata)—to score and explain the level of ableism and toxicity in these comments on a scale of 0 to 10. In parallel, 175 people with disabilities from both the U.S. and India performed the same task, providing a human-centered benchmark for comparison.
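To make the scoring task concrete, here is a minimal sketch of how a comment might be wrapped in a rating prompt and how a numeric score might be pulled out of a model's free-text reply. The prompt wording, the `build_prompt` and `parse_score` names, and the parsing rule are all illustrative assumptions; the study's actual prompts and tooling are not described in this article.

```python
import re
from typing import Optional

# Hypothetical prompt wording, written for illustration only.
PROMPT_TEMPLATE = (
    "Rate the following comment for {dimension} on a scale of 0 to 10, "
    "then briefly explain your rating.\n\nComment: {comment}"
)


def build_prompt(comment: str, dimension: str) -> str:
    """Fill the (assumed) template for one comment and one dimension,
    e.g. dimension='ableism' or dimension='toxicity'."""
    return PROMPT_TEMPLATE.format(dimension=dimension, comment=comment)


def parse_score(reply: str) -> Optional[float]:
    """Return the first number in the reply if it lies in [0, 10].

    Returns None when the reply contains no number or the first
    number falls outside the 0-10 scale.
    """
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    score = float(match.group())
    return score if 0.0 <= score <= 10.0 else None
```

Taking the first number is a deliberately simple rule. It would misread a reply that restates the scale before giving the rating, so a real pipeline would likely request a structured answer instead.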
Human Perceptions of Ableism: A Cultural Divide
The study revealed stark differences in how PwD in the U.S. and India interpreted ableism. Indian PwD generally rated toxicity and ableism higher than their U.S. counterparts. While U.S. participants often distinguished clearly between general toxicity and ableism, Indian PwD tended to focus more on the emotional harm inflicted by comments. Interestingly, microaggressive comments such as “IT’S AMAZING HOW POSITIVE YOU ARE!” were often perceived as highly ableist and patronizing by U.S. PwD but were interpreted positively, as encouragement, by Indian PwD. This highlights differing cultural expectations around support and motivation.
AI Models’ Performance: Overestimation and Underestimation
The research found a significant misalignment between LLMs and human perceptions, especially those of Indian PwD. Western LLMs consistently overestimated ableist harm, often flagging as highly offensive comments that Indian PwD considered benign or even positive. For example, a comment about attending a disability charity, seen as “inspiration porn” by Western LLMs, was viewed as “positive” and “motivating” by Indian PwD.
Conversely, Indic LLMs consistently underestimated ableist harm. They frequently failed to detect harmful stereotypes, misinterpreted ableist comments, or even dismissed invisible disabilities like depression and autism as not being “real” disabilities. This under-sensitivity means harmful content could remain unchecked on platforms.
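One simple way to express over- and underestimation is the mean signed gap between a model's scores and the human ratings for the same comments. The sketch below assumes this metric for illustration; the article does not say which statistics the paper used, and the function names and the 0.5 tolerance are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mean_signed_gap(model_scores: Sequence[float],
                    human_scores: Sequence[float]) -> float:
    """Average of (model - human) over paired ratings on the 0-10 scale.

    Positive values mean the model rates harm higher than people do;
    negative values mean it rates harm lower.
    """
    if not model_scores or len(model_scores) != len(human_scores):
        raise ValueError("need two non-empty score lists of equal length")
    return mean(m - h for m, h in zip(model_scores, human_scores))


def label_gap(gap: float, tolerance: float = 0.5) -> str:
    """Turn a signed gap into a coarse label (tolerance is an assumption)."""
    if gap > tolerance:
        return "overestimates"
    if gap < -tolerance:
        return "underestimates"
    return "roughly aligned"


# Made-up numbers for illustration only, not data from the study.
example_model = [8.0, 7.0, 9.0]
example_human = [3.0, 4.0, 5.0]
example_gap = mean_signed_gap(example_model, example_human)
print(example_gap, label_gap(example_gap))
```

A signed gap keeps the direction of the disagreement, which an absolute error would hide. Direction is the point here, since the two model families are reported to err in opposite directions.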
The study also explored the impact of demographic prompting, where models were explicitly told to consider the Indian context. While most Western LLMs showed little change, some Indic models, like Nanda, actually became less sensitive to ableism when the Indian context was introduced, contradicting how Indian PwD perceived such harm.
Ableism in Hindi: A Linguistic Blind Spot for AI
A crucial finding concerned how the LLMs handled Hindi. While Indian PwD rated harm consistently across English and Hindi, Western LLMs rated toxicity and ableism significantly lower in Hindi. This suggests that these models are more tolerant of ableist content when it’s expressed in Hindi, potentially leaving Hindi-speaking PwD more vulnerable to harm.
Furthermore, the nuances of Hindi formality registers (casual vs. formal language) posed a challenge. Indian PwD often interpreted casual Hindi as more intimate and caring, even for potentially intrusive questions. However, LLMs frequently interpreted casual Hindi as more harmful or disrespectful, revealing a deep disconnect from local social norms.
Beyond the West: Unique Cultural Nuances
The explanations provided by Indian PwD highlighted unique cultural attitudes that LLMs failed to capture. Indian PwD expressed a strong aversion to pity, often reframing deeply ableist remarks through a lens of strength and resilience. They also described intense social pressure to appear “normal” and were perplexed by stereotypes accusing disabled people of faking their conditions.
The study also revealed how ableism in India intersects with other systemic inequalities like gender, caste, and class. Comments about reproductive health were particularly harmful to women with PCOS, and remarks about veganism were layered with assumptions about religion and economic privilege. These intersectional biases were entirely missed by the LLMs.
The Path Forward: Culturally Grounded AI
The findings underscore a significant cultural misalignment in AI systems designed for content moderation. Western LLMs, often trained on U.S.-centric data, may over-censor legitimate disability advocacy in other cultures, while Indic LLMs’ under-sensitivity allows harmful content to persist. The paper argues against a universal standard for ableism recognition, asserting that harm must be assessed through the lens of local values and lived experiences. It calls for a shift towards culturally grounded harm detection, emphasizing the need for researchers to collaborate with diverse end-users to build truly inclusive AI systems.


