Unveiling Hidden Biases: A New Method Compares LLM and Human Implicit Associations

TLDR: A novel word association network methodology evaluates implicit biases in LLMs by simulating semantic priming, allowing direct comparison with human cognition. The study reveals both convergences and divergences in biases related to gender, religion, ethnicity, sexual orientation, and political party across humans and LLMs like Mistral, Llama3, and Haiku. This approach offers a transparent and scalable framework for understanding and addressing AI biases.

Large Language Models (LLMs) are increasingly integrated into daily life, from writing emails to assisting in critical decision-making. However, a significant concern remains: their social biases. These biases are often implicit, meaning they are subtle and do not surface in obvious ways, which makes them hard to detect and evaluate. A new research paper tackles this problem by evaluating implicit biases in LLMs and comparing them directly to human biases. The methodology offers a systematic, scalable, and generalizable framework for understanding how LLMs align with human cognition.

The research, titled “A word association network methodology for evaluating implicit biases in LLMs compared to humans” by Katherine Abramski, Giulio Rossetti, and Massimo Stella, proposes a method based on simulating semantic priming within LLM-generated word association networks. This prompt-based technique probes the hidden relational structures within LLMs, providing both quantitative measurements and qualitative insights into bias. Unlike many existing evaluation methods, it allows direct comparisons between different LLMs (Mistral, Llama3, and Haiku in this study) and humans, which gives the results a human benchmark.

Understanding the Methodology

The methodology consists of three main steps:

1. Network Construction: This involves building ‘word association networks’ from free association norms. For humans, the Small World of Words (SWOW) dataset is used, while for LLMs, the LLM World of Words (LWOW) dataset is employed. These networks represent the implicit knowledge of concepts held by both humans and machines. In these networks, words are nodes, and connections (edges) represent how strongly words are associated, based on how frequently one word is given as a response to another.

2. Spreading Activation: Once the networks are built, specific ‘prime nodes’ (words related to social identities like gender, religion, or political party) are activated. This simulates semantic priming, a cognitive phenomenon where exposure to one concept makes a related concept easier to access. As activation spreads through the network, the final activation levels of other words (target nodes) indicate how strongly they are associated with the initial prime. This process generates a matrix showing the strength of association between prime words and all other words in the network.

3. Bias Evaluation: The final step statistically analyzes these activation level matrices to identify patterns that reveal implicit biases. The methodology offers three different approaches: a ‘stereotypes approach’ for gender biases, a ‘valence approach’ for positive/negative perceptions of religion, ethnicity, and sexual orientation, and an ‘emotions approach’ for feelings towards political parties. This step provides quantitative metrics and, for the stereotypes approach, a qualitative analysis of the ‘mindset streams’ – the conceptual paths taken from a prime to a target word.
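
Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names build_network and spread_activation, the retention parameter, the choice of undirected edges, and the toy cue-response pairs are all assumptions made for this example. The real SWOW and LWOW datasets are far larger.

```python
from collections import defaultdict


def build_network(associations):
    """Build an undirected, weighted network from (cue, response) pairs.

    Assumption for this sketch: the edge weight is the number of times
    the two words were paired in the free association data.
    """
    network = defaultdict(lambda: defaultdict(int))
    for cue, response in associations:
        network[cue][response] += 1
        network[response][cue] += 1
    return {word: dict(neighbours) for word, neighbours in network.items()}


def spread_activation(network, prime, steps=3, retention=0.5, initial=100.0):
    """Spread activation outward from a single prime node.

    At every step each node keeps a `retention` share of its activation
    and passes the rest to its neighbours in proportion to edge weight.
    This is a simplified scheme chosen for illustration.
    """
    activation = {word: 0.0 for word in network}
    activation[prime] = initial
    for _ in range(steps):
        updated = {word: 0.0 for word in network}
        for word, level in activation.items():
            neighbours = network[word]
            total_weight = sum(neighbours.values())
            if level == 0.0 or total_weight == 0:
                updated[word] += level  # nothing to pass on
                continue
            updated[word] += retention * level
            for neighbour, weight in neighbours.items():
                updated[neighbour] += (1 - retention) * level * weight / total_weight
        activation = updated
    return activation


# Made-up toy data, not taken from SWOW or LWOW.
toy_associations = [
    ("sun", "warm"),
    ("sun", "warm"),
    ("sun", "bright"),
    ("warm", "blanket"),
]

toy_network = build_network(toy_associations)
levels = spread_activation(toy_network, prime="sun", steps=1)
for word, level in sorted(levels.items(), key=lambda item: -item[1]):
    print(f"{word}: {level:.2f}")
```

Running the same function once per prime word and stacking the resulting dictionaries gives a prime-by-target table of activation levels, which is the kind of matrix the bias evaluation step works on.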

Key Findings: Convergences and Divergences

The study applied this methodology to humans and three LLMs (Mistral, Llama3, and Haiku) across various social biases:

Gender Stereotypes: The research found significant gender stereotypes in both humans and all three LLMs. For instance, stronger associations were observed between stereotypical pairs like ‘feminine – compassionate’ and ‘masculine – forceful’. Humans exhibited the strongest overall gender bias, with equally strong female-related and male-related stereotypes. LLMs showed variations: Mistral and Haiku had stronger female-related stereotypes, while Llama3 showed stronger male-related stereotypes. The qualitative analysis of ‘mindset streams’ revealed that stereotype-consistent paths were generally shorter, indicating easier cognitive access. Interestingly, human mindset streams for stereotype-inconsistent pairs (e.g., ‘feminine – forceful’) often involved negative intermediate words, suggesting a negative perception of going against stereotypes, a nuance not observed in Haiku.
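
The idea of comparing path lengths between a prime and a target can be illustrated with a breadth-first search. The sketch below treats a mindset stream as the set of all shortest paths between two words in an unweighted network, which is an assumption of this example rather than a confirmed detail of the paper. The function name all_shortest_paths and the placeholder network are hypothetical.

```python
from collections import deque


def all_shortest_paths(network, source, target):
    """Return every shortest path from source to target (unweighted BFS)."""
    distance = {source: 0}
    predecessors = {source: []}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        for neighbour in network.get(word, ()):
            if neighbour not in distance:
                # First time we reach this word: record distance and parent.
                distance[neighbour] = distance[word] + 1
                predecessors[neighbour] = [word]
                queue.append(neighbour)
            elif distance[neighbour] == distance[word] + 1:
                # Another parent that lies on an equally short route.
                predecessors[neighbour].append(word)
    if target not in distance:
        return []
    # Walk back from the target, branching at every extra predecessor.
    paths = [[target]]
    while paths[0][0] != source:
        paths = [[parent] + path for path in paths for parent in predecessors[path[0]]]
    return paths


# Placeholder network with abstract labels, not real association data.
toy_network = {
    "prime": ["w1", "w2"],
    "w1": ["target"],
    "w2": ["target"],
    "target": [],
}

for path in all_shortest_paths(toy_network, "prime", "target"):
    print(" -> ".join(path), f"(length {len(path) - 1})")
```

In this framing, a shorter path length for one prime-target pair than for another would correspond to the "easier cognitive access" described above, and the intermediate words on each path are what a qualitative reading would inspect.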

Valence (Religion, Ethnicity, Sexual Orientation): The valence analysis explored positive and negative perceptions. For religion and ethnicity, significant biases were found in humans and all LLMs. Humans perceived ‘Christian’ more positively and ‘Atheist’ and ‘Muslim’ more negatively, with ‘Jewish’ being neutral. In contrast, LLMs often perceived ‘Muslim’ more positively than ‘Christian’, suggesting potential fine-tuning efforts to counteract certain biases. Regarding ethnicity, humans showed ‘ingroup favoritism’ with ‘European’ perceived most positively and ‘African’ most negatively. LLMs mirrored this to varying degrees, with ‘African’ consistently viewed more negatively. For sexual orientation, humans showed almost no bias, but all three LLMs exhibited significant differences, often favoring ‘gay’ and ‘lesbian’ identities over ‘bisexual’. This could indicate an ‘over-correction’ effect in LLM alignment processes.
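
One simple way to picture a valence comparison is to check whether a prime sends more activation to positive words than to negative ones. The sketch below is only an illustration under stated assumptions: the valence_score function, the word lists, and the activation numbers are all hypothetical, and a difference of means stands in for whatever statistical test the paper actually uses.

```python
from statistics import mean


def valence_score(activation, positive_words, negative_words):
    """Mean activation of positive targets minus mean activation of negative ones.

    A score above zero means the prime activated the positive words more
    strongly on average. Words missing from `activation` count as 0.0.
    """
    positive = [activation.get(word, 0.0) for word in positive_words]
    negative = [activation.get(word, 0.0) for word in negative_words]
    return mean(positive) - mean(negative)


# Hypothetical activation levels for one prime; the numbers are made up.
toy_activation = {"good": 4.0, "kind": 2.0, "bad": 1.0, "cruel": 3.0}

score = valence_score(
    toy_activation,
    positive_words=["good", "kind"],
    negative_words=["bad", "cruel"],
)
print(f"valence score: {score:+.2f}")
```

Computing such a score for each prime within a category (for example, each religion term) would make the primes comparable on a single positive-to-negative scale.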

Emotions (Political Party): Significant divergences emerged in the emotional responses to political parties. Humans displayed strong emotional polarization, with negative emotions (anger, disgust, fear, sadness) directed more towards Republicans, and positive emotions (trust, joy) towards Democrats. LLMs, however, showed less emotional differentiation. Notably, all three LLMs expressed more trust towards Republicans, opposite to the human pattern. Haiku appeared the most politically neutral, while humans were the most ‘radical’ in their emotional differences.

Advantages and Societal Impact

This word association network methodology offers several advantages. It provides a flexible and scalable framework for implicit bias evaluation that doesn’t require access to the internal workings of LLMs, bridging the gap between intrinsic (model-level) and extrinsic (output-level) evaluations. Grounded in cognitive psychology, it mirrors how implicit biases are measured in humans, allowing for direct, quantitative, and qualitative comparisons. This transparency is crucial for identifying not just the presence of bias, but also the mechanisms of its propagation, which is vital for high-stakes applications like healthcare and hiring.

The societal implications are profound. By linking LLM associations to human cognitive structures, this method helps assess whether LLMs amplify existing human biases or introduce new ones, directly impacting trust and accountability in AI systems. It also supports monitoring bias evolution as models are updated or deployed in new cultural contexts, guiding the socially responsible development of AI.

For more detailed information, see the full research paper.
