TLDR: Code-Switching In-Context Learning (CSICL) is a novel prompting strategy that addresses the ‘translation barrier’ in large language models (LLMs). By gradually transitioning from a target non-English language to English within few-shot demonstrations and instructions, CSICL acts as a linguistic bridge, improving cross-lingual alignment. Experiments across 4 LLMs, 6 datasets, and 10 languages show CSICL consistently outperforms existing cross-lingual in-context learning baselines, with significant gains in both target and unseen languages, especially in low-resource settings and for translation and reasoning tasks, without requiring additional training.
Large Language Models (LLMs) have shown impressive capabilities across many languages, but a significant challenge remains: their tendency to rely on English as a foundational internal representation. This creates what researchers call a ‘translation barrier.’ When an LLM struggles to implicitly translate non-English input into English for reasoning, its performance in that non-English language can drop sharply. This limitation restricts how inclusive and effective LLM-based applications can be for diverse linguistic communities.
Existing methods for cross-lingual in-context learning (X-ICL) often use demonstrations in a single language, which can inadvertently reinforce this English-centric reliance rather than overcoming it. This is where a new approach, Code-Switching In-Context Learning (CSICL), comes into play.
Introducing Code-Switching In-Context Learning (CSICL)
CSICL is a straightforward yet powerful prompting strategy designed to help LLMs navigate this translation barrier. It works by progressively transitioning from a target language (the non-English language) to English within both the demonstrations provided to the model and the instructions given for the task. Essentially, CSICL explicitly guides the LLM’s reasoning process, acting as an ‘implicit linguistic bridge’ that improves how well different languages align within the model’s internal workings.
The core idea is to scaffold the reasoning process. Imagine an LLM being asked a question in Korean. Instead of just giving it Korean examples, CSICL provides examples that start in Korean, gradually introduce English words and phrases, and eventually transition to a fully English equivalent. This gradual shift encourages the LLM to align its cross-lingual representations directly, rather than solely depending on a hidden, often unreliable, internal translation step.
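To make this concrete, here is a minimal sketch of what one gradually code-switched demonstration could look like. The Korean wording and the number of mixing steps below are illustrative assumptions, not the paper’s actual template (the paper’s Figure 1 uses a similar pituitary-gland query).

```python
# Illustrative only: a single query rendered at increasing levels of English
# mixing, from fully Korean (0% English) to fully English (100% English).
# The wording and the number of steps are assumptions, not the paper's prompt.
gradual_demo = [
    "뇌하수체는 어떤 호르몬을 분비하나요?",              # 0% English
    "뇌하수체는 어떤 hormone을 분비하나요?",             # one English word embedded
    "The pituitary gland는 어떤 hormone을 분비하나요?",  # an English phrase embedded
    "Which hormones does the pituitary gland secrete?",  # 100% English
]

for step, text in enumerate(gradual_demo):
    print(f"step {step}: {text}")
```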
How CSICL Works in Practice
CSICL employs two main components:
- Gradual Code-Switching Few-Shot Demonstrations: These are examples in which a query in a target language (e.g., Korean) progressively incorporates English words and phrases, moving from 0% English to 100% English (see the sketch after this list). The mixing uses intra-sentential code-switching, where the matrix language (the dominant language) is the target language and the embedded language is English.
- Gradual Translation Instruction: The LLM is explicitly instructed to follow a similar progressive translation process: to ‘gradually translate this non-English query into English, then think in English, and finally answer the question.’ The model shows its step-by-step translation before giving the final answer in the target language.
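Putting the two components together, a CSICL prompt might be assembled roughly as sketched below. This is a hypothetical sketch: the helper name build_csicl_prompt, the demonstration text, and the overall template are assumptions, with the instruction wording paraphrased from the description above; the paper’s actual prompt format may differ.

```python
# Minimal sketch of CSICL prompt assembly (assumed template, not the paper's).

def build_csicl_prompt(demos: list[list[str]], query: str) -> str:
    """Join the gradual translation instruction, the code-switching
    demonstrations, and the target-language query into one prompt."""
    instruction = (
        "Gradually translate the query into English, think in English, "
        "and then answer in the original language, showing each "
        "translation step before the final answer."
    )
    # Each demonstration is a list of lines moving from the target language
    # to English, followed by a worked answer.
    demo_blocks = ["\n".join(demo) for demo in demos]
    return "\n\n".join([instruction, *demo_blocks, f"Query: {query}"])


if __name__ == "__main__":
    demo = [
        "Q: 뇌하수체는 어떤 호르몬을 분비하나요?",
        "Q: 뇌하수체는 which hormone을 분비하나요?",
        "Q: Which hormones does the pituitary gland secrete?",
        "A: 성장호르몬과 프로락틴 등을 분비합니다. (growth hormone, prolactin, ...)",
    ]
    print(build_csicl_prompt([demo], "갑상선은 어떤 호르몬을 분비하나요?"))
```

In a real run, the demonstrations would come from the task’s few-shot pool and the user’s query would be supplied in the target language, with the model itself producing the gradual translation as the instruction asks.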
For a visual representation of this process, you can refer to Figure 1 in the original research paper, which illustrates how a Korean query about the pituitary gland is gradually translated into English within the prompt. To learn more about the technical details and see the full paper, you can visit the research paper here: Code-Switching In-Context Learning for Cross-Lingual Transfer of Large Language Models.
Extensive Experiments and Promising Results
The researchers conducted a wide range of experiments to test CSICL’s effectiveness. They used four state-of-the-art multilingual LLMs (Qwen3-32B, deepseek-chat-v3.1, grok-4-fast, and Gemini 2.5 Flash), six diverse datasets, and ten languages. These experiments covered both knowledge-intensive tasks (like general knowledge and cultural knowledge) and reasoning-oriented tasks (like mathematical reasoning and medical question answering).
The results were consistently positive: CSICL significantly outperformed traditional X-ICL baselines. It achieved gains of 3.1 percentage points (p.p.) in target languages and 1.9 p.p. in unseen languages. The improvements were even more dramatic in low-resource language settings, with gains of 14.7% in target languages and 5.3% in unseen languages. This highlights CSICL’s practical value in scenarios where data is scarce.
An ablation study confirmed that both the gradual code-switching demonstrations and the gradual translation instruction are crucial for these improvements. Interestingly, transitioning from the target language to English was more effective than the reverse, supporting the idea that CSICL helps LLMs align with their English-centric latent space. The gains also cannot be explained simply by having more sentences in the demonstration: CSICL consistently outperformed paraphrased monolingual demonstrations.
CSICL showed particularly strong gains in machine translation tasks (6.8 p.p. in target languages) and reasoning-oriented tasks (average gains of 5.4 p.p.). This suggests that by encouraging LLMs to ‘think in English’ through gradual transitions, CSICL helps them leverage their strongest reasoning capabilities.
Moving Towards More Inclusive LLMs
The findings from this research establish code-switching as a principled and robust method for overcoming the translation barrier during LLM inference. By integrating code-switching into in-context learning, CSICL helps LLMs achieve more equitable and effective multilingual performance without requiring additional training or resources. This work opens up new avenues for research, framing language alternation not as a challenge, but as a valuable resource for bridging linguistic gaps and fostering truly inclusive multilingual AI systems.


