
Unlocking Deeper Meaning in AI: How Multilingual Averaging Enhances LLM Interpretability

TLDR: A new research paper introduces a method for better understanding the internal concepts of Large Language Models (LLMs) by averaging the model's internal concept activations for the same concept expressed in multiple languages. The authors report that this ‘conceptual averaging’ technique, built on Sparse Autoencoders, filters out language-specific noise and isolates the core semantic meaning, leading to more accurate interpretations of how LLMs represent knowledge.

The research paper “Disentangling concept semantics via multilingual averaging in Sparse Autoencoders” by Cliff O’Reilly, Ernesto Jiménez-Ruiz, and Tillman Weyde explores a novel approach to enhancing the interpretability of Large Language Models (LLMs). Despite their impressive capabilities, LLMs often suffer from problems such as hallucinations and reasoning errors. One promising way to address these shortcomings is to integrate LLMs with formal knowledge representation and reasoning systems, such as ontologies.

The core challenge in this integration lies in bridging the gap between textual information and formal semantics. Text embeddings, which are numerical representations of text, contain not only the meaning of concepts but also syntactic and language-specific details. This entanglement makes it difficult to isolate the pure semantic content.

The Multilingual Averaging Approach

The authors propose a method that aims to isolate concept semantics by averaging, across languages, the concept activations produced by Sparse Autoencoders (SAEs). SAEs are unsupervised models that decompose a neural network's dense internal representations into a larger set of sparse, more interpretable features, or “concepts.” These concepts can then be matched to ideas that humans understand.
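To make the idea of “sparse features” concrete, here is a minimal NumPy sketch of an SAE's encode and decode steps. It assumes the JumpReLU activation used by the Gemma Scope SAEs, where a feature keeps its pre-activation only if that value exceeds a learned per-feature threshold. The random weights, toy sizes, and threshold value are illustrative assumptions standing in for trained parameters.

```python
import numpy as np

def jumprelu_sae_encode(x, W_enc, b_enc, threshold):
    """Map a model activation x to sparse SAE feature activations.

    JumpReLU keeps a pre-activation only where it exceeds that feature's
    threshold; every other feature is set exactly to zero.
    """
    pre = x @ W_enc + b_enc
    return np.where(pre > threshold, pre, 0.0)

def sae_decode(f, W_dec, b_dec):
    """Reconstruct the original activation from the sparse features."""
    return f @ W_dec + b_dec

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32            # toy sizes; real SAEs are far wider
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
threshold = np.full(d_sae, 2.0)   # illustrative; real thresholds are learned
W_dec = rng.normal(size=(d_sae, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)      # stand-in for one LLM hidden state
f = jumprelu_sae_encode(x, W_enc, b_enc, threshold)
x_hat = sae_decode(f, W_dec, b_dec)
print(f"{np.count_nonzero(f)} of {d_sae} features active")
```

The SAE has more features than the model has dimensions, but only a handful fire for any given input. That sparsity is what makes each feature a candidate “concept.”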

Here’s how their method works. First, OWL ontology classes, which are formal representations of knowledge, are converted into English text. These English texts are then translated into French and Chinese. All three language versions (English, French, and Chinese) are fed as prompts into the Gemma 2 2B LLM. Using the open-source Gemma Scope suite of Sparse Autoencoders, the researchers obtain “concept activations” for each class in each language. These activations capture the LLM's internal state for each input, expressed in terms of SAE features.
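The first half of this pipeline, turning a class into one prompt per language, can be sketched in Python. This is a hypothetical illustration rather than the authors' code: the `OntologyClass` fields, the sentence templates in `verbalise`, and the `fake_translate` stub (which only tags text instead of translating it) are all assumptions. The optional verbose mode loosely mirrors the “summary” and “verbose” texts mentioned in the results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class OntologyClass:
    """Hypothetical minimal view of an OWL class."""
    label: str             # e.g. the class's rdfs:label
    parents: List[str]     # labels of its direct superclasses
    comment: str = ""      # e.g. an rdfs:comment, used only in verbose text

def verbalise(cls: OntologyClass, verbose: bool = False) -> str:
    """Render a class as English text using simple, hypothetical templates."""
    text = f"{cls.label} is a kind of {' and '.join(cls.parents)}."
    if verbose and cls.comment:
        text += " " + cls.comment
    return text

def build_prompts(
    cls: OntologyClass,
    translate: Callable[[str, str], str],
    langs: Tuple[str, ...] = ("fr", "zh"),
    verbose: bool = False,
) -> Dict[str, str]:
    """Return one prompt per language: the English text plus its translations."""
    english = verbalise(cls, verbose)
    prompts = {"en": english}
    for lang in langs:
        prompts[lang] = translate(english, lang)
    return prompts

# Placeholder translator: it only tags the text. A real pipeline would call
# a machine-translation system here.
def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"

paper = OntologyClass("Conference paper", ["Paper"], "It is presented at a conference.")
prompts = build_prompts(paper, fake_translate, verbose=True)
for lang, text in prompts.items():
    print(lang, text)
```

Each prompt would then be run through the LLM and its SAE to obtain one activation vector per language for the class.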

The crucial step is the “conceptual average.” For each class, the concept activations from the different language versions are averaged. Concepts that are not shared across the English and translated sets are removed, resulting in a much smaller, more focused set of concepts. The hypothesis is that this averaging process suppresses language-specific and syntactic noise, leaving behind a purer representation of the underlying concept semantics.
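A conceptual average can be sketched in a few lines of NumPy. The sketch assumes that a concept counts as “shared” when its activation is positive in every language, and that the average is a plain arithmetic mean over languages. Both are simplifying assumptions; the paper's exact filtering and weighting may differ. The activation values are toy numbers.

```python
import numpy as np

def conceptual_average(*activations: np.ndarray) -> np.ndarray:
    """Average one class's SAE activations across languages, keeping only the
    concepts (features) that are active in every language."""
    stacked = np.stack(activations)          # shape: (n_languages, n_features)
    shared = np.all(stacked > 0, axis=0)     # True where every language fires
    return np.where(shared, stacked.mean(axis=0), 0.0)

# Toy activations for one class over six SAE features.
en = np.array([0.0, 2.0, 1.0, 4.0, 0.5, 0.0])
fr = np.array([3.0, 2.0, 0.0, 2.0, 1.5, 0.0])

avg = conceptual_average(en, fr)
print(avg)                                          # [0. 2. 0. 3. 1. 0.]
print(np.count_nonzero(en), "->", np.count_nonzero(avg))   # 4 -> 3
```

Features that fire in only one language, such as the one active in French alone or the one active in English alone, are dropped. This is how the averaged set ends up smaller than the English-only set.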

Validating the Results

To validate their approach, the researchers correlated these conceptual averages with a “ground truth” mapping between ontology classes. They compared the similarity of concept activations, measured by cosine similarity, with predefined correct relationships between classes. With single-language (English) prompts, the concept activations correlated only weakly with the ground truth. The “conceptual average” derived from multilingual inputs, however, showed a considerably stronger correlation. This suggests that multilingual averaging helps isolate the semantic relationships between classes.
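The validation step can be sketched as follows. Assumptions: the ground truth is a binary matrix where `truth[i, j] = 1` if source class `i` maps to target class `j`, and agreement is scored with a Pearson correlation over all class pairs. The paper may use a different correlation statistic or pairing scheme. The `source` and `target` arrays are hypothetical conceptual-average vectors.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of a and every row of b."""
    def normalise(m):
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.where(norms == 0, 1.0, norms)  # all-zero rows stay zero
    return normalise(a) @ normalise(b).T

def correlation_with_ground_truth(a, b, truth):
    """Pearson correlation between pairwise cosine similarities and a 0/1
    ground-truth mapping matrix."""
    sims = cosine_matrix(a, b)
    return float(np.corrcoef(sims.ravel(), truth.ravel())[0, 1])

# Hypothetical conceptual averages for two source and two target classes.
source = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
target = np.array([[2.0, 0.0, 3.0], [0.0, 1.0, 0.5]])
truth = np.array([[1, 0], [0, 1]])   # source 0 maps to target 0, 1 to 1
print(round(correlation_with_ground_truth(source, target, truth), 3))
```

A value near 1 means that classes which truly correspond also have the most similar activation vectors. A value near 0 means the activations carry little information about the mapping.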

For instance, the average correlation for summary texts improved from 0.09 (English only) to 0.39 (English/French average) and 0.33 (English/Chinese average). For verbose texts, it rose from 0.18 (English only) to 0.20 (English/French average) and 0.35 (English/Chinese average). The results indicate that the conceptual average aligns more closely with the true relationships between classes than a single language does. Interestingly, the Chinese translations often showed a stronger correlation, most clearly on verbose texts (0.35 versus 0.20 for French). The authors suggest this might be because Chinese tokens contain fewer syntactic elements than English ones.
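Putting the reported averages side by side makes the size of each gain easier to read. The numbers below are exactly the ones quoted above; the snippet only subtracts the English-only baseline from each multilingual result.

```python
# Average correlations with the ground truth, as reported above.
reported = {
    "summary": {"en": 0.09, "en/fr": 0.39, "en/zh": 0.33},
    "verbose": {"en": 0.18, "en/fr": 0.20, "en/zh": 0.35},
}

def gains(results):
    """Improvement of each multilingual average over the English-only baseline."""
    baseline = results["en"]
    return {k: round(v - baseline, 2) for k, v in results.items() if k != "en"}

for text_type, results in reported.items():
    print(text_type, gains(results))
# summary {'en/fr': 0.3, 'en/zh': 0.24}
# verbose {'en/fr': 0.02, 'en/zh': 0.17}
```

Laid out this way, the summary texts gain substantially from both translations, while on verbose texts nearly all of the improvement comes from the English/Chinese average.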


Implications for AI Understanding

This research offers a promising new technique for “mechanistic interpretability,” which is the field of understanding the internal workings of neural networks. By enabling more accurate interpretation of internal network states, this method could lead to better integration of LLMs with formal semantic systems and potentially help address issues like hallucinations by grounding LLMs in more precise conceptual understanding. For more details, you can refer to the full research paper available at arXiv.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
