TLDR: This research introduces CultureScope, a novel method for investigating how large language models (LLMs) internally process and represent cultural knowledge. It reveals that LLMs exhibit a Western-dominance bias and cultural flattening, in which less-documented cultures are represented through generalizations of more dominant ones. The study also finds that low-resource cultures are less prone to flattening, but largely because the models lack internal knowledge about them, suggesting that different mitigation strategies are needed.
Large Language Models, or LLMs, are becoming increasingly common in our daily lives, used across a wide array of cultural contexts. From answering questions about local customs to generating content for diverse audiences, these AI systems are expected to understand and respond appropriately to different cultures. However, a significant challenge arises because the knowledge LLMs acquire is largely shaped by the data they are trained on, which is often heavily skewed towards Western perspectives.
This imbalance leads to what researchers call “cultural biases” and “overgeneralization.” For instance, an LLM might give a plausible but generic answer about a leisure activity in a less-documented country, reflecting broad stereotypes rather than specific cultural nuances. Previous research has primarily focused on evaluating these biases by looking at the models’ outputs – what they say or generate. But this approach doesn’t reveal *how* these biases are formed internally within the AI’s complex mechanisms.
Introducing CultureScope: A Look Inside the AI’s Mind
To bridge this gap, a new research paper titled “Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models” introduces a groundbreaking method called CultureScope. This is the first approach to use “mechanistic interpretability” to probe the internal workings of LLMs, allowing researchers to understand the underlying cultural knowledge space that shapes the models’ responses. Instead of just observing what comes out, CultureScope helps us see what’s happening inside.
CultureScope operates in three main stages: inference, scoping-in, and filtering. First, the LLM processes an input and generates an answer. Then, CultureScope “scopes in” by examining the hidden representations – the internal activation vectors the model produced while generating that answer. Finally, a filtering stage keeps only the knowledge that is genuinely culture-specific. Together, these stages reveal a “cultural knowledge signature” for each country, showing how cultural information is encoded and organized within the model; a rough sketch of the pipeline follows.
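To make the idea concrete, here is a minimal sketch of what the inference and scoping-in stages could look like in practice. This is not the authors’ implementation: the model choice, the layer, the mean-pooling, and the subtraction-based filtering step are all illustrative assumptions.

```python
# Illustrative sketch only -- not the CultureScope implementation.
# Stage 1 (inference): run the model on a prompt.
# Stage 2 (scoping-in): capture hidden states and pool them into a "signature".
# Stage 3 (filtering): keep only what differs from a culture-neutral baseline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # any causal LM would do
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
model.eval()

def hidden_signature(prompt: str, layer: int = -1) -> torch.Tensor:
    """Mean hidden state of the prompt tokens at one layer (an illustrative pooling choice)."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple of [batch, seq_len, dim] tensors, one per layer
    return out.hidden_states[layer][0].mean(dim=0)

# Crude "filtering": subtract a culture-neutral signature so that what remains
# is (hopefully) the culture-specific component of the representation.
baseline = hidden_signature("Describe a common leisure activity.")
greece = hidden_signature("Describe a common leisure activity in Greece.") - baseline
```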
Quantifying Cultural Flattening and Western Dominance
One of the key concepts introduced by the researchers is the “Cultural Flattening (CF) score.” This score quantifies the degree to which an LLM’s representation of one country’s culture has been homogenized or blended to resemble another’s, particularly more dominant cultures. It’s an asymmetric score, meaning it can show how Country A’s knowledge might be flattened towards Country B’s, but not necessarily vice-versa.
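The paper defines its own CF score; as a rough illustration of how an asymmetric, representation-level measure can work, the sketch below treats each country as a set of item-level signatures (for example, one per cultural question) and asks how well one country’s representations are “covered” by another’s. The function names and the cosine-similarity choice are assumptions made for illustration, not the paper’s formula.

```python
# Illustrative sketch only -- not the paper's CF score.
# Each country is a set of item-level hidden representations; cf_score(A, B)
# measures how well B's representations cover A's. High values suggest A's
# cultural knowledge collapses onto B's. The score is directional by design.
import torch
import torch.nn.functional as F

def cf_score(reps_a: torch.Tensor, reps_b: torch.Tensor) -> float:
    """reps_a: [n_a, dim], reps_b: [n_b, dim] item-level signatures per country."""
    a = F.normalize(reps_a, dim=-1)
    b = F.normalize(reps_b, dim=-1)
    sims = a @ b.T                               # [n_a, n_b] cosine similarities
    return sims.max(dim=1).values.mean().item()  # average best match of A's items in B
```

Because each item from country A is matched against its best counterpart in B, `cf_score(A, B)` can be high while `cf_score(B, A)` stays low, which is the kind of one-directional flattening an asymmetric score is meant to capture.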
The study’s experimental results, using models like Llama-3.1, aya-expanse, and Qwen2.5, reveal significant findings. They show that LLMs indeed encode a “Western-dominance bias” and “cultural flattening” within their internal cultural knowledge space. This means that when an LLM struggles with a question about a less-documented culture, it often defaults to knowledge associated with more dominant, often Western, cultures.
Interestingly, the research found that low-resource cultures (those with less available training data) are less susceptible to cultural flattening. However, this isn’t necessarily good news for fairness. The reason for this reduced susceptibility appears to be a *lack* of cultural knowledge about these regions within the model’s parameters, rather than an improved ability to avoid bias. This suggests that LLMs simply don’t have enough information about these cultures to flatten them.
The Role of Attention and Future Directions
The researchers also delved into the LLMs’ “attention mechanisms,” which determine which parts of the input the model focuses on when generating a response. Their analysis showed that when LLMs make incorrect predictions, they tend to “over-attend” to tokens associated with Western and high-resource cultures. This indicates that the Western-dominance bias is deeply internalized within the models’ representations, even more so than cultural flattening.
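As a hedged illustration of what such an attention analysis can involve (not the paper’s exact procedure), the sketch below measures how much attention the final token position places on tokens belonging to a culture-associated keyword, averaged over layers and heads. The model name, the keyword-matching heuristic, and the averaging scheme are all assumptions.

```python
# Illustrative sketch only -- not the paper's attention analysis.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, attn_implementation="eager"  # "eager" so attentions are returned
)
model.eval()

def attention_mass(prompt: str, keyword: str) -> float:
    """Attention from the last position to `keyword`'s tokens, averaged over layers and heads."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    # out.attentions: one [batch, heads, seq, seq] tensor per layer
    att = torch.stack(out.attentions).mean(dim=(0, 2))[0, -1]  # [seq]: last token's attention
    key_ids = set(tok(keyword, add_special_tokens=False)["input_ids"])
    positions = [i for i, t in enumerate(inputs["input_ids"][0].tolist()) if t in key_ids]
    return att[positions].sum().item()  # crude token-matching heuristic, for illustration

# Compare how much attention each country name receives in the same prompt.
prompt = "A person from Ethiopia and a person from America describe a typical breakfast."
print(attention_mass(prompt, "Ethiopia"), attention_mass(prompt, "America"))
```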
This groundbreaking work provides a crucial foundation for future research. It highlights the need for tailored approaches to mitigate cultural biases in LLMs. For low-resource cultures, the focus might need to shift from just bias mitigation to actively acquiring and integrating more diverse cultural knowledge. For cultures that are frequently flattened, the challenge lies in disentangling these entangled representations to ensure LLMs can accurately reflect cultural nuances.
The code and data used for these experiments are publicly available, encouraging further exploration and development in this critical area of AI ethics and cultural understanding. You can find more details in the full research paper: Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models.