TLDR: A study found that large language models (LLMs) used in hiring evaluations consistently score Indian job interview transcripts lower than UK transcripts, even after anonymization. This disparity is linked to linguistic features like sentence complexity and lexical diversity, suggesting a bias towards Western communication styles. While name-based identity cues had minimal impact in controlled settings, the findings highlight the urgent need for culturally sensitive AI design and evaluation to prevent systematic disadvantages for non-Western candidates in global hiring.
As artificial intelligence (AI) increasingly shapes the modern hiring landscape, its potential to introduce and amplify biases has become a pressing concern. A recent study, “Invisible Filters: Cultural Bias in Hiring Evaluations Using Large Language Models,” delves into how large language models (LLMs) assess job interviews across different cultural contexts, specifically comparing job seekers from the UK and India.
The Growing Role of AI in Hiring
AI-powered tools are now widely adopted in recruitment, from initial screening to candidate evaluation. These systems, including those leveraging advanced LLMs like GPT-4o and Gemini, offer scalability and ease of use, making them attractive to companies. However, their growing influence raises critical questions about fairness, accountability, and cultural bias, especially since many LLMs are trained on data predominantly reflecting Western norms and values.
Uncovering Hidden Biases: The Study’s Approach
Researchers conducted a systematic analysis using two datasets of interview transcripts: 100 from UK job seekers and 100 from Indian job seekers. To isolate linguistic and semantic features, all identity-related entities (like names and cities) were anonymized. The LLMs were instructed to act as recruiters and score each transcript on four key job-relevant attributes: hireability, positive impression, self-promotion, and storytelling.
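The setup described above (anonymize identity entities, then ask the model to score four attributes as a recruiter) can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the placeholder tokens, the 1 to 10 scale, the prompt wording, and the function names `anonymize` and `build_prompt` are all assumptions, and the call to an actual LLM is deliberately left out.

```python
import re

# The four attributes named in the article.
ATTRIBUTES = ["hireability", "positive impression", "self-promotion", "storytelling"]


def anonymize(transcript: str, entities: dict[str, str]) -> str:
    """Replace each known identity entity with a neutral placeholder.

    `entities` maps a surface form (a name or city) to a placeholder such as
    "[NAME]". The placeholder format is an assumption for illustration.
    """
    # Replace longer entities first so "New Delhi" is handled before "Delhi".
    for entity in sorted(entities, key=len, reverse=True):
        pattern = r"\b" + re.escape(entity) + r"\b"
        placeholder = entities[entity]
        # A function replacement avoids any backslash processing in the placeholder.
        transcript = re.sub(pattern, lambda _m, p=placeholder: p, transcript,
                            flags=re.IGNORECASE)
    return transcript


def build_prompt(transcript: str, scale_max: int = 10) -> str:
    """Build a recruiter-style scoring prompt (wording and scale are assumed)."""
    lines = [
        "You are a recruiter evaluating a job interview transcript.",
        f"Rate the candidate from 1 to {scale_max} on each attribute:",
    ]
    lines += [f"- {attribute}" for attribute in ATTRIBUTES]
    lines += ["", "Transcript:", transcript]
    return "\n".join(lines)
```

Keeping anonymization separate from prompt construction means the same anonymized text can be reused later for controlled identity substitutions.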
Beyond initial scoring, the study also examined the influence of linguistic features such as lexical diversity, sentence complexity, and readability. Furthermore, to test for name-based bias, controlled identity substitutions were performed within the Indian dataset, varying names by gender, caste, and region while keeping the content identical.
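A controlled identity substitution of this kind can be sketched as below, assuming the transcript has already been anonymized with a `[NAME]` placeholder. The function name `make_variants`, the placeholder token, and the group labels are hypothetical; the actual name lists used in the study are not given in this article and would have to be supplied by the researcher.

```python
def make_variants(
    anonymized: str,
    names_by_group: dict[str, list[str]],
    placeholder: str = "[NAME]",
) -> list[dict[str, str]]:
    """Create one transcript per name, identical except for the inserted name.

    `names_by_group` maps a group label (for example a gender, caste, or
    region category) to the names chosen to signal that group.
    """
    if placeholder not in anonymized:
        raise ValueError(f"placeholder {placeholder!r} not found in transcript")

    variants = []
    for group, names in names_by_group.items():
        for name in names:
            variants.append({
                "group": group,
                "name": name,
                # Only the placeholder changes; all other content stays fixed.
                "text": anonymized.replace(placeholder, name),
            })
    return variants
```

Because every variant shares the same content, any score difference between groups can be attributed to the name cue rather than to what the candidate said.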
Key Findings: A Cultural Scoring Gap
The study revealed a consistent and significant scoring disparity: Indian transcripts received notably lower scores than UK transcripts across most evaluated dimensions, including hireability, positive impression, and storytelling. This gap persisted even after anonymization, suggesting that LLMs might inadvertently favor Western linguistic patterns and communication styles.
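One way to check whether a gap like this is larger than chance would produce is a permutation test on the difference in mean scores. The article does not say which statistical test the study used, so this is a generic stand-in rather than the authors' method; the function name `permutation_test` and its parameters are assumptions.

```python
import random
from statistics import mean


def permutation_test(
    scores_a: list[float],
    scores_b: list[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Two-sided permutation test for a difference in group means.

    Returns (observed mean difference a - b, approximate p-value).
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Shuffle group labels: under the null hypothesis they are exchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

Called with the UK scores as `scores_a` and the Indian scores as `scores_b` for a single attribute, a positive difference with a small p-value would be consistent with the kind of gap the study reports.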
Linguistic analysis further illuminated these differences. The LLMs showed a nuanced preference for concise, semantically dense language. Transcripts with simpler, more readable language (higher Flesch Reading Ease scores) were rated less favorably, and transcripts with longer sentences were penalized as well. Taken together, this suggests that LLMs might reward sophisticated vocabulary while penalizing excessive sentence length, potentially disadvantaging speakers whose cultural communication styles involve more formal or elaborate sentence constructions, which are common in some forms of Indian English.
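The three kinds of features mentioned here can be approximated with a few lines of standard-library Python. This sketch uses the type-token ratio for lexical diversity, average words per sentence for sentence length, and the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter is a rough vowel-group heuristic, and the article does not specify which exact measures or tools the study used, so treat every function name here as an assumption.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on runs of sentence-ending punctuation (a simple heuristic)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercased alphabetic tokens, allowing an internal apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups; not dictionary-accurate."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in "-le" endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def linguistic_features(text: str) -> dict[str, float]:
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return {
        # Lexical diversity: share of distinct words among all words.
        "type_token_ratio": len(set(words)) / len(words),
        "avg_sentence_length": words_per_sentence,
        # Higher values mean easier-to-read text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
    }
```

Note that the type-token ratio falls as texts get longer, so comparing transcripts of very different lengths with it would need some form of length normalization.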
Interestingly, when it came to identity-based name cues, the study found few statistically significant effects. Varying names by gender, caste, and region within the Indian dataset did not consistently alter LLM evaluations. This indicates that names alone, without additional contextual signals, might not be sufficient to trigger bias in these controlled textual settings.
Implications for a Fairer Future in Hiring
These findings underscore a critical concern: if organizations rely heavily on AI evaluations for early-stage screening, qualified candidates from non-Western backgrounds could be systematically filtered out, leading to less diverse workforces. This not only raises ethical and legal red flags but also risks perpetuating global inequalities and missing out on valuable talent.
The research also highlights the potential for AI-driven interview coaching tools to inadvertently pressure candidates to adopt Western communication styles, potentially suppressing culturally rooted expressions. To counteract this, the study advocates for culturally sensitive design and accountability in AI-assisted hiring.
Towards Culturally Aware AI
Mitigating cultural harms requires a rethinking of LLM development and deployment. This includes expanding training data to reflect diverse dialects and communication styles, co-creating evaluation criteria with input from non-Western recruiters, and routinely auditing model performance using region-specific benchmarks. The study also suggests incorporating “explainability” into AI evaluations, allowing recruiters to understand the reasoning behind scores and critically assess whether feedback reflects a genuine job-relevant issue or a cultural mismatch.
Ultimately, fostering culturally aware LLMs demands an expanded view of fairness that incorporates global voices and lived experiences. By involving diverse communities throughout the AI lifecycle, we can move towards more equitable and inclusive AI systems in high-stakes domains like hiring.


