
Agent Foundation Models: A Unified Approach to AI Problem Solving

TLDR: A new AI paradigm called Chain-of-Agents (CoA) allows large language models to perform complex multi-agent, multi-tool problem-solving within a single model. This approach, used to create Agent Foundation Models (AFMs), is trained through multi-agent distillation and reinforcement learning, leading to state-of-the-art performance in web and code tasks, while significantly improving computational efficiency and generalization compared to traditional multi-agent systems.

The world of artificial intelligence is constantly evolving, with recent advancements in large language models (LLMs) and multi-agent systems showcasing impressive capabilities in tackling complex challenges, from in-depth research to intricate coding and mathematical reasoning. However, many existing multi-agent systems face significant hurdles: they often rely on manual prompt engineering, leading to computational inefficiencies, limited adaptability, and an inability to benefit from continuous data-driven learning.

Addressing these limitations, the OPPO AI Agent Team has introduced a groundbreaking new paradigm called Chain-of-Agents (CoA). This innovative approach enables LLMs to perform complex, multi-turn problem-solving, much like a traditional multi-agent system, but entirely within a single model. Imagine a single AI brain that can dynamically activate different specialized ‘tool agents’ and ‘role-playing agents’ to simulate a collaborative team, all working together seamlessly and end-to-end.

In the CoA framework, the model intelligently orchestrates various agents. These include ‘Role-playing Agents’ such as a Thinking Agent to manage the reasoning flow, a Plan Agent to break down tasks, a Reflection Agent for self-critique, and a Verification Agent to ensure correctness. Alongside these are ‘Tool Agents’ like a Search Agent for optimized queries, a Crawl Agent for content extraction, and a Code Generate Agent for code execution in a sandbox environment. This dynamic coordination within one model eliminates the need for complex prompt and workflow engineering, significantly reducing the computational overhead typically associated with inter-agent communication in conventional multi-agent systems.
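To make the orchestration concrete, here is a minimal sketch of how a single model might emit tagged steps that a lightweight dispatcher routes to role-playing or tool agents. The agent names, step format, and stub model are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of Chain-of-Agents-style dispatch inside one model.
# Agent names and the step format are illustrative, not the paper's exact scheme.

def mock_model(history):
    """Stand-in for the single LLM; emits one (agent, content) step per call."""
    script = [
        ("plan", "1) search the topic 2) reflect 3) answer"),
        ("search", "agent foundation models"),
        ("reflection", "results look relevant; proceed"),
        ("answer", "CoA solves the task in one model"),
    ]
    return script[len(history)]

# Tool agents execute externally; role-playing agents are pure text steps.
TOOL_AGENTS = {
    "search": lambda q: f"top results for '{q}'",  # stub Search Agent
}

def run_chain_of_agents(model, max_steps=8):
    history = []
    for _ in range(max_steps):
        agent, content = model(history)
        if agent in TOOL_AGENTS:                    # tool agent: run and observe
            observation = TOOL_AGENTS[agent](content)
            history.append((agent, content, observation))
        elif agent == "answer":                     # terminal step
            history.append((agent, content, None))
            return content, history
        else:                                       # role-playing agent (plan, etc.)
            history.append((agent, content, None))
    return None, history

answer, trace = run_chain_of_agents(mock_model)
```

The key design point is that one model produces the entire trajectory; the "agents" are activations within a single generation loop, so no inter-agent messages need to be serialized and re-prompted.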

To instill these end-to-end Chain-of-Agents problem-solving abilities into LLMs, the researchers developed a multi-agent distillation framework. This process involves distilling the capabilities of state-of-the-art multi-agent systems into CoA-compatible trajectories, which are then used for ‘agentic supervised fine-tuning’. Following this, ‘agentic reinforcement learning’ is applied to further refine the models’ performance on verifiable agentic tasks. The resulting models are termed Agent Foundation Models (AFMs).
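The recipe above can be sketched in two pieces: filter successful teacher trajectories for supervised fine-tuning, then apply a verifiable outcome reward during reinforcement learning. The function names, the success filter, and the exact-match reward rule are assumptions for illustration, not the team's implementation.

```python
# Illustrative distill -> SFT -> RL recipe; details are assumed, not from the paper.

def distill_trajectories(teacher_runs, is_successful):
    """Multi-agent distillation: keep only successful teacher runs,
    which would then be rewritten as single-model CoA trajectories for SFT."""
    return [run for run in teacher_runs if is_successful(run)]

def outcome_reward(prediction, gold):
    """Binary verifiable reward of the kind used in agentic RL."""
    return 1.0 if prediction.strip() == gold.strip() else 0.0

# Toy teacher runs from a state-of-the-art multi-agent system.
teacher_runs = [
    {"traj": "...", "answer": "42"},
    {"traj": "...", "answer": "wrong"},
]
sft_data = distill_trajectories(teacher_runs, lambda r: r["answer"] == "42")
```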

Empirical studies have demonstrated that AFMs achieve new state-of-the-art performance across a wide array of benchmarks in both web agent and code agent settings. For instance, AFMs have shown superior success rates on challenging web agent benchmarks like GAIA, BrowseComp, and HLE, and impressive results in code generation and mathematical reasoning on LiveCodeBench and AIME2025. Beyond performance, AFMs also boast remarkable computational efficiency, reducing inference costs (in terms of token consumption) by a substantial 84.6% compared to traditional multi-agent systems, while maintaining competitive performance.

Furthermore, the research highlights AFM’s strong generalization capabilities, particularly its ability to handle unseen agents. For example, a code agent model trained only on code and math tasks could successfully orchestrate unseen web search and visual inspector tools when their descriptions were provided. This indicates a robust understanding of tool invocation formats and dynamic adaptation.
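Generalization of this kind typically rests on the model learning a uniform tool-invocation format, so an unseen tool can be exposed at inference time purely through its description. The prompt layout and tool names below are illustrative assumptions:

```python
# Hedged sketch: surfacing unseen tools to the model via descriptions only.
# The tag syntax and tool names are assumptions, not the paper's format.

def build_system_prompt(tools):
    lines = ["You may call tools using <tool name='...'>args</tool>:"]
    for name, desc in tools.items():
        lines.append(f"- {name}: {desc}")
    return "\n".join(lines)

known_tools = {"code": "run Python in a sandbox"}
# Tools the model never saw during training, described at inference time.
unseen_tools = {"web_search": "query the web and return snippets",
                "visual_inspector": "describe the contents of an image"}

prompt = build_system_prompt({**known_tools, **unseen_tools})
```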


The OPPO AI Agent Team has made their entire research, including model weights, training and evaluation code, and training data, fully open-sourced. This significant contribution provides a solid foundation for future research and development in agent models and agentic reinforcement learning. For more detailed information, you can refer to the full research paper here.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
