Enhancing LLM Agents for Extended Multi-Turn Tasks Through Adaptive Summarization

TLDR: This research introduces SUPO, a new reinforcement learning framework that enables large language models (LLMs) to handle complex, multi-turn tasks beyond their fixed context limits. It achieves this by teaching LLMs to generate intelligent summaries of past interactions, keeping the context compact and relevant. Experiments show SUPO significantly improves task success rates on function calling and searching tasks, demonstrating a scalable approach for long-horizon agent training.

Large Language Models (LLMs) have shown strong potential as problem-solvers: they can understand natural language, generate structured outputs, and interact with external tools. However, when LLM agents are given complex, multi-turn problems that require many steps or interactions, they often hit a fundamental roadblock: their limited context window.

Imagine an LLM agent trying to solve a long-running puzzle. As it makes more moves and gathers more information, the “history” of its actions and observations grows. This ever-expanding history quickly fills up its working memory, leading to several challenges. Firstly, the LLM’s ability to follow instructions and reason effectively can degrade when dealing with very long contexts. Secondly, processing these extensive histories becomes computationally expensive, slowing down the learning process. Most critically, the fixed size of an LLM’s context window fundamentally limits how far into a task it can go, preventing it from tackling problems that require more interactions than can fit into its memory.

To overcome this scalability barrier, researchers have introduced a novel approach called summarization-based context management. This method allows LLM agents to scale their operations beyond a fixed context length by periodically compressing their past interactions into concise, LLM-generated summaries. Instead of letting the context grow indefinitely, the agent’s working memory is regularly refreshed with a compact, yet informative, summary of what has happened so far. Crucially, these summaries are not pre-defined or based on rigid rules; instead, the LLM agent learns how to generate them as part of its training, optimizing what information to keep, how to abstract it, and what details to discard as irrelevant.
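The periodic compression described above can be sketched as a rollout loop. This is a minimal illustration, not SUPO's implementation: the function names (`run_with_summaries`, `count_tokens`), the word-count stand-in for a tokenizer, and the rule of summarizing whenever the context exceeds a fixed budget are all assumptions made for this sketch. In the approach described here the summary would be written by the same LLM that is being trained; in the sketch it is a plain callable.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def run_with_summaries(
    task: str,
    act: Callable[[str], str],        # hypothetical: context -> next action
    observe: Callable[[str], str],    # hypothetical: action -> tool observation
    summarize: Callable[[str], str],  # hypothetical: context -> summary text
    max_context_tokens: int,
    max_turns: int,
) -> List[str]:
    """Roll out an agent, compressing the history whenever it grows too long.

    Returns the working context the agent saw at each turn.
    """
    context = task
    seen_contexts = []
    for _ in range(max_turns):
        if count_tokens(context) > max_context_tokens:
            # Discard the raw history; keep only the task and a fresh summary.
            context = task + "\n[summary] " + summarize(context)
        seen_contexts.append(context)
        action = act(context)
        observation = observe(action)
        # The history grows by one action/observation pair per turn.
        context += "\n" + action + "\n" + observation
    return seen_contexts
```

With stub callables in place of a model and a tool, the working context stays bounded however many turns are run, because each summarization step resets it to the task plus a short summary.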

The idea is formalized as a “summarization-augmented Markov Decision Process” (MDP), which integrates summarization steps directly into the agent’s decision-making process. The framework admits a policy gradient representation, which means existing reinforcement learning (RL) systems can be adapted to train these agents without being rebuilt. The result is an end-to-end optimization process that improves both the agent’s ability to use tools and its strategy for summarizing information.
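One way to write such a policy gradient is the standard REINFORCE identity with the sum over steps regrouped by context segment. The notation is introduced here for illustration and is not taken from the paper: I is the number of segments in a rollout τ (one more than the number of summaries), T_i is the number of steps in segment i, s is the working context at a step, a is the agent's output at that step (a tool call, or the summary that closes the segment), and R(τ) is the final task reward.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      R(\tau) \sum_{i=1}^{I} \sum_{t=1}^{T_i}
      \nabla_\theta \log \pi_\theta\!\left(a^{(i)}_t \,\middle|\, s^{(i)}_t\right)
    \right]
```

In this form each inner sum involves only the compact context of its own segment, so every segment can be handled as an ordinary training sequence by an existing RL pipeline.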

The researchers instantiated this framework with an algorithm named SUmmarization augmented Policy Optimization (SUPO). SUPO jointly optimizes the agent’s tool-use behaviors and its summarization strategies. Its key design elements are a scheme for managing trajectories (the sequences of actions and observations), a method for estimating advantages that helps stabilize learning, and a mechanism to mask “overlong” trajectories, which prevents the model from being penalized for attempting longer, more complex tasks.
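The article does not spell out how advantages are estimated or how masking is applied, so the following is only a sketch under stated assumptions: rewards are normalized within a group of rollouts sampled for the same task (a group-relative baseline), every segment of a rollout shares that rollout's advantage, and overlong rollouts receive an advantage of zero so they contribute no gradient. The `Rollout` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Rollout:
    reward: float      # final task reward for the whole rollout
    num_segments: int  # context segments in the rollout (summaries + 1)
    overlong: bool     # True if the budget ran out before a final answer


def segment_advantages(group: List[Rollout], eps: float = 1e-6) -> List[List[float]]:
    """Return one advantage per segment for each rollout in a group.

    Assumptions of this sketch: the baseline is the group mean reward,
    overlong rollouts still count toward the group statistics, and their
    own advantage is masked to 0.0.
    """
    rewards = [r.reward for r in group]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    result = []
    for r in group:
        # Masked rollouts are neither rewarded nor penalized.
        adv = 0.0 if r.overlong else (r.reward - mu) / (sigma + eps)
        # Every segment of the rollout shares the rollout-level advantage.
        result.append([adv] * r.num_segments)
    return result
```

Whether masked rollouts should also be excluded from the group mean and standard deviation is a design choice this sketch leaves open. Setting their advantage to zero is what keeps an unfinished long attempt from being treated as a failure.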

Experiments were conducted on two challenging multi-turn tool-use tasks: CodeGym, a synthetic environment for interactive function calling, and BrowseComp-Plus, a complex searching task. SUPO significantly improved the success rates on both tasks, achieving gains of +3.2% on CodeGym and +14.0% on BrowseComp-Plus, while maintaining the same or even lower working context lengths compared to traditional baselines. Furthermore, SUPO’s performance continued to scale when the number of summarization rounds at test time exceeded the number used during training, which suggests that the learned summarization strategies are robust and generalize.

This work establishes summarization-based context management as a principled and scalable approach for training RL agents to operate effectively beyond the limitations of a fixed context window. It allows LLM agents to take on more complex, longer-horizon tasks, potentially leading to more reliable and autonomous AI systems. For more technical details, refer to the full research paper.
