Understanding How Language Models Remember: The Influence of Time and Position

TLDR: A new study investigates how Large Language Models (LLMs) retrieve information based on its temporal position, rather than just its meaning. Through experiments with repeated tokens and overlapping ‘episodes’, researchers found that both transformer and state-space models exhibit strong temporal biases, favoring information at the beginning or end of a prompt (primacy and recency effects). An ablation study in transformers linked these biases to ‘induction heads’, crucial components for sequential recall. The findings suggest that temporal biases are fundamental to LLM processing, impacting how they learn and retrieve context, and offer insights into the ‘lost in the middle’ phenomenon.

Large Language Models (LLMs) have shown an incredible ability to learn from the information provided directly within their input, a process known as in-context learning. While much attention has been paid to how these models understand meaning, a new study delves into a less explored but equally crucial aspect: how the timing and position of information within a prompt influence what an LLM remembers and retrieves.

This research, titled Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models, draws a parallel between LLMs and human episodic memory. Just as humans recall events based on when they happened, the study investigates whether LLMs can differentiate and retrieve information based on its temporal separation. The authors, Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, and Zoran Tiganj from Indiana University Bloomington, designed experiments to isolate these temporal effects, removing semantic distractions to get a clearer picture.

Unpacking Temporal Positional Biases

The first experiment aimed to understand the inherent temporal biases in LLM retrieval, independent of any meaning. Researchers created prompts where a specific token (like ‘A’) was repeated multiple times, separated by sequences of random, unique tokens. A final instance of the fixed token acted as a probe, and the models were tasked with predicting the next token. By shuffling the random tokens, the team ensured that any observed patterns were due to temporal position, not semantic content.
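To make the prompt design concrete, here is a minimal sketch of how such a prompt could be assembled. It is not the authors' code: the function name `build_repeat_prompt`, the two-character filler vocabulary, and the default values for the number of repetitions and the gap length are all illustrative assumptions. Drawing fillers without replacement keeps every random token unique, and using a fresh seed for each trial plays the role of shuffling.

```python
import random
import string


def build_repeat_prompt(fixed="A", n_repeats=5, gap=4, seed=0):
    """Hypothetical sketch: repeat `fixed`, separating each copy with `gap`
    unique random tokens, then end with one more copy of `fixed` as the probe."""
    rng = random.Random(seed)
    # Assumed filler vocabulary: two-character strings such as "Qx",
    # excluding the fixed token itself.
    pool = [a + b for a in string.ascii_uppercase for b in string.ascii_lowercase]
    pool = [t for t in pool if t != fixed]
    # Sampling without replacement keeps every filler unique within one prompt;
    # a different seed per trial gives a freshly shuffled set of fillers.
    fillers = rng.sample(pool, n_repeats * gap)
    tokens = []
    for i in range(n_repeats):
        tokens.append(fixed)
        tokens.extend(fillers[i * gap:(i + 1) * gap])
    tokens.append(fixed)  # final probe: the model must predict what comes next
    return tokens


print(build_repeat_prompt(n_repeats=3, gap=2, seed=1))
```

With `n_repeats=5` and `gap=4` the prompt is 26 tokens long: five fixed-token/filler blocks plus the probe.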

The findings were striking: all seven tested models, including both transformer-based (like Llama, Mistral, Qwen, Gemma) and state-space models (like Mamba, Falcon-Mamba, Recurrent-Gemma), consistently showed a preference for predicting a token that had immediately followed an earlier occurrence of the repeated token. This indicates a tendency for 'serial recall': remembering sequences in the order they were presented. More importantly, the strength of this recall varied significantly with the token's position in the prompt. Models often favored information presented at the very beginning (primacy effect) or the very end (recency effect) of the input, leaving information in the middle comparatively neglected, a pattern often described as being 'lost in the middle'. Different models exhibited distinct biases; for instance, Mistral leaned towards recency, while Falcon-Mamba showed a primacy bias.
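One way to turn a model's predictions into the positional curves described above is to read off, at the probe, the probability assigned to the token that followed each earlier occurrence of the fixed token, and then average over shuffled trials. The sketch below assumes the next-token distribution has already been extracted from a model as a plain `{token: probability}` dictionary. The helper names are hypothetical, and the analysis in the paper may differ.

```python
def successors_of(tokens, fixed):
    """Token that followed each earlier occurrence of `fixed`.
    The final probe is excluded because nothing follows it."""
    return [tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == fixed]


def positional_recall_curve(trials, fixed="A"):
    """Average probability given to each occurrence's successor, by position.

    trials: list of (tokens, probs) pairs, where probs maps token -> probability
    at the probe (assumed to come from a model's softmax output).
    Assumes every trial uses the same number of repetitions.
    """
    if not trials:
        return []
    sums = None
    for tokens, probs in trials:
        vals = [probs.get(t, 0.0) for t in successors_of(tokens, fixed)]
        if sums is None:
            sums = [0.0] * len(vals)
        sums = [s + v for s, v in zip(sums, vals)]
    return [s / len(trials) for s in sums]


prompt = ["A", "x", "y", "A", "z", "w", "A"]
curve = positional_recall_curve([(prompt, {"x": 0.6, "z": 0.3}),
                                 (prompt, {"x": 0.2, "z": 0.4})])
print(curve)
```

A curve that is high at the first position and low later would indicate primacy. A curve that rises toward the last position would indicate recency.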

Testing Episodic Retrieval with Interference

The second experiment pushed the models further, evaluating their ability to retrieve specific temporal sequences, or 'episodes', when presented alongside other similar, partially overlapping sequences. Each prompt contained five distinct episodes, every one made up of a unique context token, the shared fixed token, and a unique target token (e.g., 'BAH', 'CAF', 'XAM'). The models were then probed with a context and fixed token pair (e.g., 'XA') and had to predict the correct target token ('M').
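The episode prompts can be sketched in the same spirit. This version follows the article's single-letter example ('BAH', 'CAF', 'XAM'). The letter vocabulary, the function name, and the random choice of which episode to probe are assumptions made for illustration.

```python
import random
import string


def build_episode_prompt(n_episodes=5, fixed="A", probe_index=None, seed=0):
    """Hypothetical sketch: n_episodes triples (context, fixed, target),
    followed by a probe made of one episode's context token and the fixed token."""
    rng = random.Random(seed)
    letters = [c for c in string.ascii_uppercase if c != fixed]
    # One context and one target per episode, all distinct letters.
    picks = rng.sample(letters, 2 * n_episodes)
    contexts, targets = picks[:n_episodes], picks[n_episodes:]
    episodes = [(c, fixed, t) for c, t in zip(contexts, targets)]
    if probe_index is None:
        probe_index = rng.randrange(n_episodes)
    tokens = [tok for ep in episodes for tok in ep]
    tokens += [contexts[probe_index], fixed]  # probe, e.g. "X", "A"
    return tokens, targets[probe_index], episodes


tokens, answer, episodes = build_episode_prompt(probe_index=2, seed=3)
print("".join(tokens), "->", answer)
```

Because every episode shares the fixed token, the fixed token alone cannot identify the correct episode. The context token is the only cue that separates the probed episode from its competitors.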

Most models successfully retrieved the correct target token, demonstrating a capacity for temporal separation. However, this retrieval wasn’t perfect. Smaller peaks corresponding to non-probed episodes were often visible, indicating interference from competing memories. Retrieval was generally strongest for episodes located nearer the end of the prompt, reinforcing the recency bias observed in the first experiment. Mamba and Falcon-Mamba models, in particular, showed less robust retrieval, especially for episodes closer to the end.
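Retrieval and interference can be quantified by splitting the probe's next-token probability across the targets of all episodes. The "smaller peaks" mentioned above correspond to probability on non-probed targets. This sketch assumes the model output is available as a `{token: probability}` dictionary. The function and its returned fields are illustrative, not the paper's metric.

```python
def episode_retrieval_profile(episodes, probe_index, probs):
    """Probability assigned to each episode's target at the probe.

    episodes: list of (context, fixed, target) triples in prompt order.
    probs: token -> probability at the probe (assumed model output).
    """
    profile = [probs.get(target, 0.0) for _, _, target in episodes]
    correct = profile[probe_index]
    # Mass on the targets of other episodes = interference from competing memories.
    interference = sum(p for k, p in enumerate(profile) if k != probe_index)
    hit = max(range(len(profile)), key=profile.__getitem__) == probe_index
    return {"profile": profile, "correct": correct,
            "interference": interference, "hit": hit}


eps = [("B", "A", "H"), ("C", "A", "F"), ("X", "A", "M")]
print(episode_retrieval_profile(eps, 2, {"M": 0.7, "F": 0.1, "H": 0.05}))
```

Averaging `correct` separately for each `probe_index` across many prompts gives a per-position retrieval curve, which is where a recency advantage would appear.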

The Role of Induction Heads in Transformers

To understand the underlying mechanisms in transformer models, an ablation study was conducted. Researchers focused on 'induction heads', specific components within transformer architectures known to be crucial for in-context learning and temporal processing. These heads find earlier occurrences of the current token, attend to the token that followed each one, and copy it forward as the prediction, which lets the model reproduce sequences based on temporal association.
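The lookup that induction heads perform can be caricatured with a rule-based toy. This is not a neural network and not the paper's model. It matches the last `n` tokens against earlier positions and returns a distribution over the tokens that followed each match, giving every match equal weight where a real head would use learned attention weights. The function name and the `n` parameter are assumptions for illustration.

```python
from collections import Counter


def induction_toy(tokens, n=1):
    """Toy induction rule: find earlier matches of the last n tokens and
    'copy' the token that followed each match. Equal weight per match."""
    if len(tokens) <= n:
        return {}
    query = tuple(tokens[-n:])
    successors = []
    # i is where a candidate match starts; the final query itself is excluded.
    for i in range(len(tokens) - n):
        if tuple(tokens[i:i + n]) == query:
            successors.append(tokens[i + n])
    counts = Counter(successors)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


prompt = list("BAHCAFXAMXA")  # three episodes, then the probe "XA"
print(induction_toy(prompt, n=1))
print(induction_toy(prompt, n=2))
```

Matching on the fixed token alone (`n=1`) spreads probability evenly over H, F and M, which is pure interference. Matching on the context and fixed token together (`n=2`) singles out M. This shows, in a simplified setting, why separating overlapping episodes requires using the context token.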

By progressively disabling the top-ranked induction heads, the study found a significant degradation in the models' ability to perform serial recall and selectively retrieve the correct episode amidst interference. Ablating randomly selected heads had a much weaker impact, confirming the critical role of induction heads in these temporal processing behaviors. This suggests that these heads are key to how transformers manage and separate temporal context.
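The ablation logic can be sketched with NumPy under common assumptions from the interpretability literature, not the paper's exact procedure. Heads are ranked by an "induction score", which is the average attention from each token in the second copy of a repeated random sequence to the token just after its earlier occurrence. The chosen heads' outputs are then zeroed before the output projection. Zero-ablation is one choice; mean-ablation is another. The array shapes and function names are hypothetical. In practice the attention patterns and per-head outputs would be captured from a real model with hooks.

```python
import numpy as np


def induction_scores(attn, period):
    """attn: (n_heads, T, T) attention weights on a sequence that repeats
    with the given period. Score = mean weight from query t to key t - period + 1."""
    T = attn.shape[1]
    q = np.arange(period, T)   # queries in the repeated part
    k = q - period + 1         # token after the earlier occurrence
    return attn[:, q, k].mean(axis=1)


def top_k_heads(scores, k):
    """Indices of the k highest-scoring heads."""
    return [int(h) for h in np.argsort(scores)[::-1][:k]]


def random_heads(n_heads, k, seed=0):
    """Control condition: k distinct heads chosen at random."""
    rng = np.random.default_rng(seed)
    return [int(h) for h in rng.choice(n_heads, size=k, replace=False)]


def ablate_heads(head_outputs, heads):
    """head_outputs: (n_heads, T, d_head). Zero the selected heads' outputs
    before they are concatenated and projected; the input is left unchanged."""
    out = head_outputs.copy()
    out[list(heads)] = 0.0
    return out


# Synthetic check: head 1 behaves like a perfect induction head.
attn = np.zeros((3, 8, 8))
for t in range(4, 8):
    attn[1, t, t - 3] = 1.0
scores = induction_scores(attn, period=4)
print(scores, top_k_heads(scores, 1))
```

A progressive ablation would loop over `k = 1, 2, ...`, compare `top_k_heads(scores, k)` against `random_heads(n_heads, k)`, and re-run the recall and episode tasks after each ablation.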


Broader Implications

This research deepens our understanding of how LLMs process and retrieve information based on its temporal structure. The consistent temporal biases, including primacy and recency effects, suggest that these are fundamental properties of sequential processing in LLMs, not just artifacts of semantic content. Interestingly, state-space models, despite their different architecture, exhibited comparable temporal biases, hinting that these limitations might arise from more fundamental aspects of how context history is maintained and accessed over time.

For the development of future LLMs, these findings highlight that addressing the ‘lost in the middle’ problem requires tackling these fundamental temporal processing limitations. Simple architectural changes alone might not be sufficient. From a cognitive science perspective, this methodology offers a controlled way to compare how different computational architectures handle temporal context and interference, providing insights into memory-like phenomena in artificial intelligence.
