Unlocking Deeper Meaning in AI: How Multilingual Averaging Enhances LLM Interpretability

TLDR: A new research paper introduces a method to better understand the internal concepts of Large Language Models (LLMs) by averaging the model's internal concept activations for the same concept expressed in multiple languages. This ‘conceptual averaging’ technique, built on Sparse Autoencoders, filters out language-specific noise and isolates the core semantic meaning, leading to more accurate interpretations of how LLMs represent knowledge.

The research paper “Disentangling concept semantics via multilingual averaging in Sparse Autoencoders” by Cliff O’Reilly, Ernesto Jiménez-Ruiz, and Tillman Weyde explores a novel approach to enhance the interpretability of Large Language Models (LLMs). LLMs, despite their impressive capabilities, often suffer from issues like hallucinations and reasoning errors. A promising way to address these shortcomings is by integrating LLMs with formal knowledge representation and reasoning systems, such as ontologies.

The core challenge in this integration lies in bridging the gap between textual information and formal semantics. Text embeddings, which are numerical representations of text, contain not only the meaning of concepts but also syntactic and language-specific details. This entanglement makes it difficult to isolate the pure semantic content.

The Multilingual Averaging Approach

The authors propose a method that aims to isolate concept semantics by leveraging multilingual averaging of concept activations derived from Sparse Autoencoders (SAEs). SAEs are unsupervised algorithms that help in breaking down complex latent representations within neural networks into more interpretable, sparse features or “concepts.” These concepts can then be correlated with human-understandable ideas.

Here’s how their method works: First, OWL ontology classes, which are formal representations of knowledge, are converted into English text. These English texts are then translated into French and Chinese. All three language versions (English, French, and Chinese) are fed as prompts into the Gemma 2B LLM. Using the open-source Gemma Scope suite of Sparse Autoencoders, the researchers obtain “concept activations” for each class in each language. These activations represent the internal states of the LLM related to the input.

The crucial step is the “conceptual average.” For each class, the concept activations from the different language versions are averaged. Concepts that are not shared across the English and translated sets are removed, resulting in a much smaller, more focused set of concepts. The hypothesis is that this averaging process suppresses language-specific and syntactic noise, leaving behind a purer representation of the underlying concept semantics.
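The averaging step can be illustrated with a minimal sketch. It assumes each class's SAE output is available as a sparse mapping from feature index to activation value, and that "shared" means a feature is active in every language version; the function name `conceptual_average`, the data layout, and the toy numbers are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Assumed layout: sparse SAE activations for one ontology class,
# stored as feature index -> activation value.
Activations = Dict[int, float]


def conceptual_average(per_language: List[Activations]) -> Activations:
    """Average activations over languages, keeping only shared features."""
    if not per_language:
        return {}
    # Keep only features that are active in every language version.
    shared = set(per_language[0])
    for acts in per_language[1:]:
        shared &= set(acts)
    n = len(per_language)
    # Mean activation per shared feature across the language versions.
    return {f: sum(acts[f] for acts in per_language) / n for f in sorted(shared)}


# Toy values, invented for illustration only.
english = {3: 2.0, 17: 1.0, 42: 4.0}
french = {3: 4.0, 42: 2.0, 99: 5.0}
averaged = conceptual_average([english, french])
print(averaged)  # {3: 3.0, 42: 3.0}
```

Features 17 and 99 each appear in only one language, so they are dropped, which mirrors the reduction to a smaller, more focused concept set described above.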

Validating the Results

To validate their approach, the researchers correlated these conceptual averages with a “ground truth” mapping between ontology classes. They compared the similarity of concept activations (using Cosine Similarity) with pre-defined correct relationships between classes. When using only single-language prompts (English), the concept activations showed a weak correlation to the ground truth. However, the “conceptual average” derived from multilingual inputs demonstrated a significantly stronger correlation. This suggests that the multilingual averaging indeed helps in isolating the true semantic relationships.
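A rough sketch of this kind of validation follows. It assumes the ground truth is a score per class pair (1 for related, 0 for unrelated) and uses the Pearson coefficient as the correlation measure; the article does not say which coefficient the authors used, so that choice, the helper names, and the toy classes are assumptions for illustration.

```python
import math
from typing import Dict, List

# Assumed layout: feature index -> activation value for one class.
Activations = Dict[int, float]


def cosine_similarity(a: Activations, b: Activations) -> float:
    """Cosine similarity between two sparse activation vectors."""
    dot = sum(v * b[f] for f, v in a.items() if f in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy conceptual averages and ground truth, invented for illustration only.
classes = {
    "Dog": {1: 1.0, 2: 1.0},
    "Canine": {1: 1.0, 2: 0.9},
    "Car": {5: 1.0},
}
pairs = [("Dog", "Canine"), ("Dog", "Car"), ("Canine", "Car")]
ground_truth = [1.0, 0.0, 0.0]

similarities = [cosine_similarity(classes[a], classes[b]) for a, b in pairs]
score = pearson(similarities, ground_truth)
print(round(score, 3))
```

A score near 1 means that pairs marked as related in the ground truth are also the pairs whose activations are most similar; the figures reported in the paper come from its own ontology data, not from this toy setup.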

For instance, the average correlation for summary texts improved from 0.09 (English only) to 0.39 (English/French average) and 0.33 (English/Chinese average). For verbose texts, the improvement was from 0.18 (English only) to 0.20 (English/French average) and 0.35 (English/Chinese average). The results indicate that the conceptual average aligns more closely with the true relationship between classes compared to using a single language. Interestingly, the Chinese translations often showed a slightly stronger correlation, which the authors suggest might be due to Chinese language tokens containing fewer syntactic elements compared to English.

Implications for AI Understanding

This research offers a promising new technique for “mechanistic interpretability,” which is the field of understanding the internal workings of neural networks. By enabling more accurate interpretation of internal network states, this method could lead to better integration of LLMs with formal semantic systems and potentially help address issues like hallucinations by grounding LLMs in more precise conceptual understanding. For more details, you can refer to the full research paper available at arXiv.
