TLDR: Large language models (LLMs) solve complex, multi-step tasks using two distinct internal mechanisms: a ‘compositional’ approach that explicitly computes intermediate steps, and a ‘direct’ approach that bypasses them. Which mechanism a model uses correlates strongly with how linearly the relationship between the task’s input and final output is represented in the model’s internal embedding space. The research also confirms that the ‘compositionality gap’ persists in modern LLMs and offers new insight into their internal processing strategies.
Large language models (LLMs) are becoming increasingly adept at handling complex tasks that require combining multiple steps, like answering a question that needs two pieces of information to be linked. However, a fundamental question remains: do these models truly break down and solve these tasks step-by-step, or do they find a more direct, less interpretable way to get to the answer?
A recent study by Apoorv Khandelwal and Ellie Pavlick from Brown University delves into this very question, investigating how LLMs tackle what are known as “two-hop factual recall tasks” – problems that can be formally expressed as a composition of two functions, g(f(x)).
The Enduring Compositionality Gap
The researchers first confirmed a persistent challenge in LLMs, often called the “compositionality gap.” This refers to the observation that even when an LLM can correctly compute the first step (z = f(x)) and the second step (y = g(z)) independently, it does not automatically follow that it can correctly compute the full composition (y = g(f(x))). The study found that this gap persists in modern LLMs across a broader range of tasks than previously tested, and while larger models and those with advanced reasoning capabilities narrowed it, the gap never disappeared entirely.
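To make the gap concrete, here is a minimal sketch of how it can be measured, assuming a HuggingFace causal LM. The model name, the spouse/birthplace fact, and the substring-matching check are illustrative stand-ins, not the paper’s actual evaluation protocol:

```python
# Minimal sketch of a compositionality-gap check with a HuggingFace causal LM.
# Model, prompts, and the string-matching heuristic are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper studies larger modern LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def answers_correctly(prompt: str, answer: str, max_new_tokens: int = 8) -> bool:
    """Greedy-decode a short continuation and check it mentions the answer."""
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    completion = tok.decode(out[0][inputs["input_ids"].shape[1]:])
    return answer.lower() in completion.lower()

# Hypothetical two-hop fact: f(x) = spouse of x, g(z) = birthplace of z.
hop1 = answers_correctly("The spouse of Barack Obama is", "Michelle")
hop2 = answers_correctly("Michelle Obama was born in", "Chicago")
both = answers_correctly("The spouse of Barack Obama was born in", "Chicago")

# The compositionality gap: both hops succeed individually,
# yet the composed query g(f(x)) still fails.
if hop1 and hop2 and not both:
    print("Compositionality gap observed for this fact.")
```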
Two Ways LLMs Solve Problems
Using interpretability techniques, specifically the “logit lens” applied to the models’ internal activations, the study identified two distinct ways LLMs process these compositional tasks (a minimal sketch of the technique appears below):
1. Compositional Processing: In this mechanism, the model genuinely computes the intermediate variable, f(x), as a detectable step on its way to calculating g(f(x)). This is akin to a human solving a multi-step math problem by writing down the result of each step.
2. Direct Processing (Idiomatic): Here, the model appears to map the initial input, x, directly to the final output, g(f(x)), without any clear, detectable representation of the intermediate variable, f(x). This is more like instantly knowing the answer to a complex problem without consciously going through all the steps.
Interestingly, the intermediate variable was much more clearly detectable when the model produced a correct answer, though the study cautions that this correlation does not, on its own, establish a causal link.
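The logit-lens idea itself is simple: decode each layer’s hidden state through the model’s output (unembedding) matrix and see which tokens it favors. The sketch below applies it to a GPT-2-style HuggingFace model; the prompt and the probe token for the intermediate variable f(x) are hypothetical examples, and the paper’s models and prompts differ:

```python
# Minimal logit-lens sketch: project each layer's hidden state at the last
# token position through the unembedding matrix, and check whether the
# intermediate answer f(x) surfaces in the middle layers.
# Model, prompt, and probe token are illustrative, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The spouse of Barack Obama was born in"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

intermediate = tok.encode(" Michelle")[0]  # first token of the hypothetical f(x)
for layer, hidden in enumerate(out.hidden_states):
    h = model.transformer.ln_f(hidden[0, -1])  # apply final layer norm first
    logits = model.lm_head(h)                  # the "logit lens" projection
    rank = (logits > logits[intermediate]).sum().item()  # 0 = top prediction
    print(f"layer {layer:2d}: rank of intermediate token = {rank}")
```

A low rank at some middle layer would indicate compositional processing (the model detectably represents f(x) en route to the answer); a uniformly high rank would be consistent with the direct, idiomatic route.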
What Influences the Choice of Mechanism?
The research uncovered a fascinating insight into why an LLM might choose one mechanism over the other: the geometry of its internal embedding space. The study found a strong inverse correlation between the “linearity” of a task’s representation and the “compositionality” of its processing. In simpler terms, if the relationship between the initial input (x) and the final output (g(f(x))) can be captured by a linear transformation in the model’s embedding space, the LLM is more likely to use the direct, idiomatic computation. Conversely, if the relationship is less linear, the model tends to resort to the more explicit, compositional processing.
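One way to make “linearity” concrete is to fit a linear map from embeddings of the inputs x to embeddings of the corresponding answers g(f(x)) across many examples, and use the held-out fit as a linearity score. The sketch below does this with synthetic stand-in vectors; the paper’s actual linearity metric may be defined differently:

```python
# Generic sketch of a "linearity" score: fit a linear map from input
# embeddings to answer embeddings and measure held-out R^2.
# Synthetic vectors stand in for real model embeddings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d, n = 256, 1000

# Stand-ins: X holds embeddings of inputs x, Y holds embeddings of g(f(x)).
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, d)) / np.sqrt(d)
Y = X @ W + 0.1 * rng.normal(size=(n, d))  # a mostly linear relation

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, Y_tr)
linearity = probe.score(X_te, Y_te)  # held-out R^2 as a linearity score
print(f"linearity score (R^2): {linearity:.3f}")
# The study's finding, restated in these terms: the higher this score for a
# task, the more likely the model computes g(f(x)) directly, with no
# detectable intermediate f(x).
```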
This suggests that the model’s pre-training plays a significant role. Tasks or relationships that are well-represented and easily mapped in the embedding space might be solved directly, almost like a learned shortcut. For less familiar or less linearly represented relationships, the model might fall back on a more step-by-step, compositional approach.
Broader Implications for AI Understanding
These findings offer a novel perspective on the long-standing debate about whether compositional behavior in AI necessarily requires compositional internal mechanisms. The study suggests that LLMs employ a flexible mix of both. This “bottom-up” approach to understanding LLM cognition could lead to new hypotheses about how AI, and potentially even human cognition, handles tasks that sometimes require systematic, logical steps and other times rely on more intuitive, heuristic shortcuts.
While this research focuses on specific types of tasks and models, it opens the door for future work to explore how these mechanisms relate to an LLM’s ability to generalize to new, unseen compositional problems. For a deeper dive into the methodology and findings, see the full paper.