
Unveiling Language Model Learning: Tracking Linguistic Feature Development During Pretraining

TLDR: This research introduces a novel method using ‘crosscoders’ and a metric called ‘Relative Indirect Effects (RELIE)’ to track how specific linguistic abilities emerge, evolve, and consolidate within large language models (LLMs) during their pretraining. It reveals that monolingual LLMs progress from detecting simple tokens to understanding complex grammatical patterns, while multilingual LLMs consolidate language-specific features into universal cross-lingual representations, with varying success depending on linguistic similarity and data representation.

Large language models (LLMs) are incredibly powerful, capable of understanding and generating human-like text. They learn complex abstractions during their pretraining, such as correctly identifying irregular plural nouns. However, a significant challenge has been understanding *when* and *how* these specific linguistic abilities actually emerge within the model. Traditional evaluation methods, like benchmarking, often only tell us about a model’s final performance, not the intricate journey of how it acquires concepts and capabilities.

To shed light on this crucial gap, new research introduces an innovative approach using ‘sparse crosscoders’ combined with a novel metric called ‘Relative Indirect Effects (RELIE)’. This method allows researchers to discover and align features across different stages (checkpoints) of a model’s pretraining, effectively tracking the evolution of linguistic features over time.

Unlocking the Model’s Learning Journey

The core idea involves training ‘crosscoders’ between various open-source model checkpoints. Think of crosscoders as advanced tools that can compare and map the internal ‘features’ or concepts learned by different versions of a language model. By doing this, they create a shared space where features can be directly compared, revealing which concepts are maintained, emerge, or even disappear during training.
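To make the idea concrete, here is a minimal sketch of a crosscoder-style forward pass: activations from two checkpoints are encoded into one shared sparse feature space, and each checkpoint’s activations are reconstructed from those shared features. All dimensions and weights below are illustrative stand-ins (in practice the weights are learned jointly with a reconstruction-plus-sparsity objective); this is not the paper’s exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: d_model activation size, n_feat shared sparse features.
d_model, n_feat = 64, 256

# Encoder/decoder weights for checkpoints A and B. In a real crosscoder these
# are trained; here they are random placeholders just to show the data flow.
W_enc_a = rng.normal(0, 0.1, (n_feat, d_model))
W_enc_b = rng.normal(0, 0.1, (n_feat, d_model))
b_enc = np.zeros(n_feat)
W_dec_a = rng.normal(0, 0.1, (d_model, n_feat))
W_dec_b = rng.normal(0, 0.1, (d_model, n_feat))

def crosscoder_forward(act_a, act_b, l1_coeff=1e-3):
    """Encode activations from both checkpoints into one shared sparse code,
    then reconstruct each checkpoint's activations from that code."""
    f = np.maximum(0.0, W_enc_a @ act_a + W_enc_b @ act_b + b_enc)  # shared features
    recon_a = W_dec_a @ f
    recon_b = W_dec_b @ f
    loss = (np.sum((recon_a - act_a) ** 2)      # reconstruction error, checkpoint A
            + np.sum((recon_b - act_b) ** 2)    # reconstruction error, checkpoint B
            + l1_coeff * np.sum(np.abs(f)))     # L1 penalty encourages sparsity
    return f, recon_a, recon_b, loss

act_a = rng.normal(size=d_model)  # activation from an early checkpoint
act_b = rng.normal(size=d_model)  # activation from a later checkpoint
features, _, _, loss = crosscoder_forward(act_a, act_b)
```

Because both checkpoints write into the same feature vector `f`, a feature that reconstructs activations for one checkpoint but not the other is a candidate for a concept that emerged or disappeared during training.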

The new metric, RELIE, then quantifies the causal importance of individual features for a specific task at different training stages. This helps pinpoint exactly when a feature becomes crucial for the model’s performance. The researchers validated their approach through detailed studies, confirming that RELIE accurately traces how and when features gain or lose relevance for a task.
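The paper defines RELIE precisely; as a hedged illustration of the underlying idea, one can zero-ablate a feature, measure how much a downstream task score changes (its indirect effect), and then express each feature’s effect relative to the total. The `task_metric` readout below is a hypothetical stand-in, not the authors’ actual task:

```python
import numpy as np

rng = np.random.default_rng(1)

n_feat = 8
features = rng.uniform(0.0, 1.0, n_feat)  # feature activations at one checkpoint
readout = rng.normal(size=n_feat)         # hypothetical linear readout for the task

def task_metric(f):
    """Stand-in downstream score, e.g. a logit difference on the probed task."""
    return float(readout @ f)

baseline = task_metric(features)

# Indirect effect of each feature: how much the task score moves when that
# single feature is zero-ablated.
effects = np.empty(n_feat)
for i in range(n_feat):
    ablated = features.copy()
    ablated[i] = 0.0
    effects[i] = baseline - task_metric(ablated)

# Relative effects: each feature's share of the total absolute effect.
relative = np.abs(effects) / np.sum(np.abs(effects))
```

Computing such relative effects at every checkpoint yields a per-feature importance trajectory over training, which is what lets the method pinpoint when a feature becomes (or stops being) causally relevant.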

From Simple Tokens to Complex Grammar

The method, which is architecture-agnostic and scalable, was applied to popular LLM families like Pythia, BLOOM, and OLMo. A key finding for monolingual models (like Pythia and OLMo) is a clear progression from detecting surface-level, token-specific patterns to internalizing deeper, high-level grammatical abstractions. For instance, early in Pythia’s training, features might simply detect specific word parts like “-ans” or “-ists.” As training progresses, these evolve into more abstract detectors for plural nouns, prepositions, or even complex grammatical structures like ‘deverbal nominalizations’ (nouns formed from verbs).

Even after a model’s overall performance on a task plateaus, the internal features continue to refine and specialize. The RELIE analysis visually demonstrates this, showing how features cluster based on their importance at different checkpoints, with new, more abstract features emerging even in later stages of training.

Cross-Lingual Understanding in Multilingual Models

For multilingual models like BLOOM, the research uncovered another fascinating trend: the consolidation of language-specific features into universal cross-lingual ones. Initially, BLOOM might have separate features for detecting main-verb heads in English versus French. Over time, these often merge into a single, shared feature that can identify main-verb heads across multiple languages.

However, this cross-lingual alignment isn’t uniform. Languages with similar morphological systems and shared scripts (like English, French, Spanish, and Portuguese) show higher feature overlap. In contrast, languages with greater morphological complexity or lower representation in the training data (such as Hindi and Arabic) tend to retain more language-specific representations, even at later training stages. This suggests that while LLMs can learn joint feature spaces, the degree of sharing can be influenced by linguistic differences and data availability.
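One simple way to quantify that kind of overlap (not necessarily the paper’s exact measure) is the Jaccard similarity between the sets of crosscoder features that activate on matched inputs in two languages. The toy activations below are fabricated purely to mirror the trend described above:

```python
import numpy as np

def active_features(activations, threshold=0.0):
    """Indices of features whose mean activation exceeds a threshold."""
    return set(np.flatnonzero(activations.mean(axis=0) > threshold))

def feature_overlap(acts_lang1, acts_lang2, threshold=0.0):
    """Jaccard overlap between the active-feature sets of two languages."""
    s1 = active_features(acts_lang1, threshold)
    s2 = active_features(acts_lang2, threshold)
    if not (s1 | s2):
        return 0.0
    return len(s1 & s2) / len(s1 | s2)

rng = np.random.default_rng(2)
n_sent, n_feat = 10, 6
# Toy data: English and French share most of their active features, while
# Hindi uses a disjoint, language-specific set.
acts_en = rng.uniform(0, 1, (n_sent, n_feat)) * np.array([1, 1, 1, 1, 0, 0])
acts_fr = rng.uniform(0, 1, (n_sent, n_feat)) * np.array([1, 1, 1, 0, 0, 0])
acts_hi = rng.uniform(0, 1, (n_sent, n_feat)) * np.array([0, 0, 0, 0, 1, 1])

overlap_en_fr = feature_overlap(acts_en, acts_fr)  # shared features: 3 of 4
overlap_en_hi = feature_overlap(acts_en, acts_hi)  # disjoint feature sets
```

Tracking such an overlap score across checkpoints would show it rising for closely related language pairs as language-specific features consolidate into shared ones.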


A Path Towards More Interpretable AI

This research, conducted by Deniz Bayazit, Aaron Mueller, and Antoine Bosselut, offers a promising path toward a more interpretable and fine-grained analysis of how LLMs learn. By deploying crosscoders to understand joint-feature spaces between model checkpoints, they’ve provided unprecedented insights into the emergence, maintenance, and discontinuation of linguistic representations during pretraining. This work not only deepens our understanding of LLM capabilities but also opens doors for future research into the evolution of entire computational circuits within these complex models. You can read the full research paper here: Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
