TLDR: A new method, BridgeX-ICL, improves cross-lingual performance in Large Language Models (LLMs) for low-resource languages without costly fine-tuning. It identifies ‘overlap neurons’ shared between languages using specially constructed probe data and employs an HSIC-based metric to select an optimal ‘bridge’ language for Cross-lingual In-Context Learning (X-ICL). Experiments on 15 language pairs and two tasks show that BridgeX-ICL improves over zero-shot baselines, and reveal that LLMs learn their own linguistic relationships shaped by training data, often favoring English as a bridge.
Large Language Models (LLMs) have shown remarkable abilities across many languages, but they still face significant hurdles when it comes to performing well in low-resource languages. Traditional methods like fine-tuning are often too expensive and data-intensive. This challenge has led researchers to explore more data-efficient approaches, particularly Cross-lingual In-Context Learning (X-ICL), where LLMs learn from examples provided within the prompt itself.
A new study introduces a novel method called BridgeX-ICL, which aims to enhance zero-shot X-ICL for low-resource languages. This approach focuses on understanding how neurons within LLMs are shared across different languages and how this sharing can be leveraged to select the best ‘bridge’ language for cross-lingual transfer. Unlike previous work that often looked at language-specific neurons, BridgeX-ICL investigates the benefits of shared neurons.
The researchers identified two main limitations in existing neuron-based interpretations for low-resource languages: first, unreliable neuron activation, since an LLM that does not fully understand a low-resource input produces inaccurate activation patterns; and second, a lack of clear guidance on how to use internal neurons to improve cross-lingual transfer.
To overcome these issues, BridgeX-ICL employs a meticulous methodology. It starts by constructing specialized ‘neuron probe data’ using ground-truth bilingual dictionaries like MUSE. Instead of simply feeding word pairs, LLMs are prompted to generate translations in both directions (e.g., L1 to L2 and L2 to L1). This ensures that the neurons associated with both languages are fully and accurately activated, addressing the problem of inaccurate activation.
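To make this concrete, here is a minimal Python sketch of how such bidirectional probe prompts could be assembled from a bilingual dictionary; the file name, prompt template, and function names are illustrative assumptions rather than the paper's exact setup.
```python
# Minimal sketch of constructing bidirectional probe prompts from a
# MUSE-style bilingual dictionary (one "source_word target_word" pair per line).
# File name, prompt template, and function names are illustrative assumptions.

def load_dictionary(path):
    """Read word pairs from a whitespace-separated bilingual dictionary."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                pairs.append((parts[0], parts[1]))
    return pairs

def build_probe_prompts(pairs, lang1, lang2):
    """Prompt translation in both directions (L1->L2 and L2->L1) so that
    neurons of both languages are activated, not just those of the
    prompt's surface language."""
    prompts = []
    for w1, w2 in pairs:
        prompts.append(f"Translate the {lang1} word '{w1}' into {lang2}: {w2}")
        prompts.append(f"Translate the {lang2} word '{w2}' into {lang1}: {w1}")
    return prompts

pairs = load_dictionary("muse_en-he.txt")  # hypothetical dictionary file
probe_prompts = build_probe_prompts(pairs, "English", "Hebrew")
```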
Next, the method identifies ‘overlap neurons’ – those neurons that are activated by multiple languages. The study found that similar languages tend to share more neurons than very different ones. For instance, Arabic and Hebrew, from the same language family, showed more overlapping neurons than Arabic and Swahili, which belong to different families. This suggests that the pattern of shared neurons can be a good indicator of linguistic distance.
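As a rough illustration, the snippet below treats a neuron as active for a language if it fires on more than a small fraction of that language's probe prompts, and takes the intersection of the two active sets as the overlap; the threshold, tensor shapes, and Jaccard-style ratio are assumptions, not the paper's exact criterion.
```python
import torch

# Hedged sketch: a neuron counts as "active" for a language if it fires on
# more than `threshold` of that language's probe prompts; overlap neurons
# are those active for both languages.

def active_neuron_set(activations, threshold=0.1):
    """activations: (num_prompts, num_layers, num_neurons) post-nonlinearity
    FFN activations collected while running the probe prompts."""
    fire_rate = (activations > 0).float().mean(dim=0)   # (layers, neurons)
    idx = (fire_rate > threshold).nonzero(as_tuple=False)
    return {(int(l), int(n)) for l, n in idx}

# Toy activations standing in for real FFN traces (ReLU-like clamp).
acts_lang1 = torch.randn(200, 4, 64).clamp(min=0)
acts_lang2 = torch.randn(200, 4, 64).clamp(min=0)

neurons_l1 = active_neuron_set(acts_lang1)
neurons_l2 = active_neuron_set(acts_lang2)
overlap = neurons_l1 & neurons_l2                        # shared neurons

# Jaccard-style overlap ratio as a rough proxy for linguistic proximity.
overlap_ratio = len(overlap) / max(1, len(neurons_l1 | neurons_l2))
print(f"{len(overlap)} overlap neurons, ratio {overlap_ratio:.2f}")
```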
The research also revealed that these overlap neurons are primarily concentrated in the middle and final layers of the LLM’s architecture. Middle-layer neurons appear to be crucial for semantic understanding, while final-layer neurons are more involved in generating cross-lingual outputs. This insight led the researchers to prioritize middle-layer neurons when measuring language similarity for bridge selection.
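In code, this prioritization amounts to filtering the overlap set down to a middle band of layers before computing similarity; the layer band below (the middle third of a 32-layer model) is an illustrative assumption, not the paper's exact split.
```python
# Keep only middle-layer neurons before measuring language similarity.
# The middle-third band for a 32-layer model is an illustrative assumption.
num_layers = 32
overlap = {(5, 10), (14, 200), (15, 7), (30, 99)}    # toy (layer, neuron) pairs
middle_band = range(num_layers // 3, 2 * num_layers // 3)
middle_overlap = {(l, n) for (l, n) in overlap if l in middle_band}
print(middle_overlap)   # keeps (14, 200) and (15, 7)
```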
For selecting the optimal bridge language, BridgeX-ICL uses a metric called the Hilbert-Schmidt Independence Criterion (HSIC). This metric quantifies the non-linear dependency between the activation patterns of source-target overlap neurons and bridge-specific neurons. By calculating a selection probability for each candidate bridge language, the method identifies the most effective bridge to facilitate X-ICL.
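A hedged sketch of this scoring step is shown below: activation matrices (rows are probe prompts, columns are neurons) are compared with an empirical HSIC estimate using Gaussian kernels and a median-distance bandwidth, and a softmax turns the per-bridge scores into selection probabilities. The kernel choice, bandwidth heuristic, and softmax normalization are assumptions; the paper's exact formulation may differ.
```python
import numpy as np

# Hedged sketch of HSIC-based bridge scoring between activation matrices.

def gaussian_kernel(X):
    sq = np.sum(X ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * X @ X.T              # pairwise squared distances
    sigma2 = np.median(d2[d2 > 0])              # median heuristic bandwidth
    return np.exp(-d2 / (2.0 * sigma2))

def hsic(X, Y):
    """Empirical (biased) HSIC between paired samples X (n, dx), Y (n, dy)."""
    n = X.shape[0]
    K, L = gaussian_kernel(X), gaussian_kernel(Y)
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def bridge_probabilities(src_tgt_acts, bridge_acts):
    """Score each candidate bridge by its HSIC dependency with the
    source-target overlap activations, then softmax into probabilities."""
    langs = list(bridge_acts)
    scores = np.array([hsic(src_tgt_acts, bridge_acts[l]) for l in langs])
    probs = np.exp(scores - scores.max())
    return dict(zip(langs, probs / probs.sum()))

# Toy usage with random activations standing in for real neuron traces.
rng = np.random.default_rng(0)
src_tgt = rng.normal(size=(50, 128))
candidates = {"en": rng.normal(size=(50, 128)), "fr": rng.normal(size=(50, 128))}
print(bridge_probabilities(src_tgt, candidates))
```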
Extensive experiments were conducted on two popular open-source LLMs, LLaMA-3-8B and Mistral-7B-Instruct-v0.3, across two cross-lingual tasks, Bilingual Lexicon Induction (BLI) and Machine Reading Comprehension (MRC), and 15 language pairs spanning seven diverse language families. The focus was on improving performance for low-resource target languages such as Hebrew, Tagalog, Swahili, and Japanese.
The findings were significant. The linguistic spectrum learned by LLMs, based on neuron overlaps, showed strong alignment with human linguistic taxonomy within language families. However, this alignment was not always consistent across different families, suggesting that LLMs develop their own unique understanding of language relationships, often influenced by their training data. For example, Arabic sometimes showed stronger neural similarity with French (a Romance language) than with Hebrew (an Afro-Asiatic language), likely due to the prevalence of English as a pivot language in training data.
BridgeX-ICL demonstrated its effectiveness, improving performance on the BLI task across 15 language pairs by an average of 6.02% and 5.25% over zero-shot baselines for LLaMA 3 and Mistral respectively. It also performed well on the MRC task. Interestingly, English was frequently selected as the optimal bridge language, highlighting its central role in LLMs’ cross-lingual transfer mechanisms.
This work represents a step forward in making LLMs more accessible and effective for low-resource languages without the need for costly fine-tuning. By understanding and leveraging the internal neuron overlap patterns, BridgeX-ICL offers a data-efficient strategy for improving cross-lingual capabilities. For more details, you can read the full research paper here.


