TLDR: A2FM is a new AI model that unifies reasoning and tool-using capabilities of large language models. It introduces an ‘instant’ mode for simple queries, alongside agentic (tool-aware) and reasoning modes. Through Adaptive Policy Optimization, A2FM intelligently routes tasks to the most appropriate mode, achieving state-of-the-art accuracy across various benchmarks while significantly cutting computational costs by adapting its ‘thinking’ process to task complexity.
Large language models (LLMs) have shown incredible capabilities, but they often fall into two distinct categories: those excellent at deep, internal reasoning (like chain-of-thought models) and those skilled at interacting with external tools and environments (known as agentic models). This division means that a single LLM often struggles to be both deeply thoughtful and highly practical, leading to inefficiencies, especially on simple tasks where models might “overthink” or unnecessarily call tools.
Introducing A2FM: The Adaptive Agent Foundation Model
A new framework called A2FM, or Adaptive Agent Foundation Model, aims to bridge this gap. Developed by the OPPO AI Agent Team, A2FM unifies these different strengths by following a “route-then-align” principle. This means the model first learns to understand the nature of a task and then aligns its approach based on that understanding, all while operating under a shared core system.
To tackle the problem of inefficiency, A2FM introduces a clever third mode: the “instant” mode. This mode is designed to handle simple queries directly, preventing the model from engaging in unnecessary complex reasoning or tool interactions. This complements the existing agentic (tool-using) and reasoning (deep thinking) modes, creating a more balanced and efficient system.
How A2FM Learns to Adapt
A2FM’s ability to jointly enhance accuracy and efficiency comes from a novel training method called Adaptive Policy Optimization (APO). APO uses a cost-regularized reward system and adaptive sampling across its three modes. This allows the model to learn when to use which mode, favoring quick, instant solutions for easy questions and escalating to more complex reasoning or tool-use when a task truly demands it.
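The paper's exact reward formula isn't reproduced here, but the idea of a cost-regularized reward can be sketched in a few lines. In this toy version, a correct answer earns a fixed reward and every token consumed subtracts a small penalty; the names `lambda_cost` and the specific values are illustrative assumptions, not the paper's actual hyperparameters:

```python
def cost_regularized_reward(correct: bool, tokens_used: int,
                            lambda_cost: float = 1e-4) -> float:
    """Hypothetical cost-regularized reward: reward correctness,
    penalize the compute spent producing the answer."""
    base = 1.0 if correct else 0.0
    return base - lambda_cost * tokens_used

# A correct instant answer (200 tokens) scores higher than a correct
# reasoning answer (4000 tokens), nudging the policy toward the
# cheapest mode that still succeeds.
instant = cost_regularized_reward(True, 200)     # 0.98
reasoning = cost_regularized_reward(True, 4000)  # 0.60
```

Under a reward of this shape, the model only "pays" for extra reasoning or tool calls when they actually flip an answer from wrong to right, which is what drives the escalation behavior described above.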
The model’s architecture includes a self-adaptive router that decides “what to do” for each query. For tasks requiring external information or code execution, it can activate the agentic mode, which uses tools like web search (via SerpAPI), web crawling (via Jina API and summarized by gpt-5-mini), and code execution (in an isolated environment using nsjail). For complex logical problems, it switches to the reasoning mode, generating detailed step-by-step thoughts. And for straightforward questions, the instant mode provides direct answers.
Impressive Performance and Cost Savings
Evaluated at the 32B scale, A2FM has achieved state-of-the-art results across a wide range of benchmarks: 13.4% on BrowseComp (agentic), 70.4% on AIME25 (reasoning), and 16.7% on HLE (general). These scores set new records among comparably sized models while remaining competitive with leading LLMs across agentic, reasoning, and general benchmarks.
One of A2FM’s most notable achievements is its significant cost efficiency. On the SuperGPQA benchmark, the adaptive execution achieved a “cost of pass” of only $0.00487 per correct answer. This represents a substantial reduction in cost—45.2% less than using only the reasoning mode and 33.5% less than using only the agentic mode—while maintaining comparable accuracy. This means A2FM delivers correct answers at roughly half the cost of traditional reasoning-based execution.
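"Cost of pass" is total spend divided by the number of correct answers. Working backward from the reported percentage savings gives a back-of-the-envelope check on the single-mode costs (these derived figures are my arithmetic, not numbers from the paper):

```python
adaptive = 0.00487  # reported cost per correct answer on SuperGPQA

# Reported savings: 45.2% vs reasoning-only, 33.5% vs agentic-only.
# If adaptive = single_mode * (1 - saving), then:
reasoning_only = adaptive / (1 - 0.452)  # ~$0.00889 per correct answer
agentic_only = adaptive / (1 - 0.335)    # ~$0.00732 per correct answer

print(f"reasoning-only ≈ ${reasoning_only:.5f}")
print(f"agentic-only  ≈ ${agentic_only:.5f}")
```

The implied reasoning-only cost of roughly $0.0089 per correct answer is consistent with the article's claim that adaptive execution delivers correct answers at about half the cost of reasoning-based execution.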
The model's efficiency is further highlighted by its adaptive routing. For instance, on easy questions in SuperGPQA, A2FM used the instant mode for 61.1% of queries, but this dropped to just 5.3% for difficult ones, demonstrating that it allocates resources according to task complexity. The accuracy of instant responses remained stable at around 55% across all difficulty levels, indicating that the routing is robust rather than simply defaulting to the cheapest mode.
In conclusion, A2FM represents a significant step forward in developing more versatile and efficient AI agents. By integrating instant, reasoning, and agentic modes under a single, adaptively routed backbone, it offers a scalable path towards LLMs that are both highly accurate and remarkably cost-effective. You can read the full research paper here.


