TL;DR: This research paper surveys the paradigm shift in Agentic AI, moving from ‘pipeline-based’ systems, where planning, tool use, and memory are externally orchestrated, to ‘model-native’ agents, where these capabilities are internalized within the model’s parameters. It highlights Reinforcement Learning (RL) as the key algorithmic engine enabling this transition, allowing models to learn through outcome-driven exploration. The paper details how core capabilities like planning, tool use, and memory have evolved to become intrinsic, and examines the impact on applications such as Deep Research agents and GUI agents, concluding with future directions for autonomous, self-evolving AI.
The world of Artificial Intelligence is constantly evolving, and a new phase is upon us: Agentic AI. This development moves beyond traditional AI systems that merely respond to commands, ushering in an era where Large Language Models (LLMs) can act, reason, and adapt to their environments. A recent survey, titled “Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI”, examines this transformative shift, highlighting how AI agents are moving from being externally controlled to becoming intrinsically intelligent.
Traditionally, building AI agents involved a “pipeline-based” approach. Imagine an assembly line where different parts of the agent’s intelligence – like planning, using tools, and remembering information – were handled by separate, external components or pre-written instructions. For instance, early planning systems might have used external symbolic planners or relied on specific prompts like “Chain-of-Thought” to guide the model step-by-step. Tool use often meant simple, single-turn calls to external APIs, and memory was managed by summarizing conversations or retrieving information from external databases. While these systems offered modularity, they were often rigid, struggled to adapt to unexpected situations, and treated the LLM as a passive tool rather than an active decision-maker.
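To make the contrast concrete, here is a minimal sketch of a pipeline-based agent in Python. Everything in it is hypothetical glue (the `llm` stand-in, the tool registry, the summarizer); the point is that the planning prompt, the tool routing, and the memory management all live in orchestration code outside the model:

```python
# A minimal, hypothetical pipeline-based agent: the model only fills in text,
# while planning, tool routing, and memory live in external orchestration code.

def llm(prompt: str) -> str:
    """Stand-in for any text-completion model call."""
    return "..."  # model output would go here

TOOLS = {"search": lambda query: f"results for {query!r}"}  # external tool registry

def summarize(history: list[str]) -> str:
    """External memory module: compress the interaction into a short summary."""
    return llm("Summarize this conversation:\n" + "\n".join(history))

def run_pipeline(task: str) -> str:
    memory = []
    # Step 1: prompt engineering forces step-by-step "planning" (Chain-of-Thought).
    plan = llm(f"Let's think step by step. Task: {task}\nPlan:")
    memory.append(plan)
    # Step 2: the orchestrator, not the model, decides to call a tool.
    observation = TOOLS["search"](task)
    memory.append(observation)
    # Step 3: external memory management before producing the final answer.
    context = summarize(memory)
    return llm(f"Context: {context}\nTask: {task}\nAnswer:")
```

Notice that the model never makes a single decision here: the order of steps, the choice of tool, and the handling of memory are all hard-coded by the developer.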
However, the field is now experiencing a significant “model-native” paradigm shift. This means that the core capabilities of an agent are no longer external scripts but are being internalized directly within the model’s parameters. Planning, tool use, and memory management are becoming inherent behaviors of the AI model itself, learned through extensive training. This shift transforms the LLM from a reactive tool into an autonomous decision-maker that learns to generate plans, invoke tools, and manage its own memory.
A key driver behind this paradigm shift is Reinforcement Learning (RL). Unlike traditional supervised learning, which teaches models by having them imitate static examples, RL allows models to learn through outcome-driven exploration. The model interacts with an environment, takes actions, and receives feedback (rewards) based on the success of those actions. This process enables the model to discover novel and more effective strategies that might not exist in human-curated data, essentially turning it into an active explorer rather than a passive imitator. This approach, often summarized as “LLM + RL + Task,” is becoming a unified solution across various AI domains.
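To illustrate what outcome-driven learning means mechanically, here is a toy REINFORCE-style loop in PyTorch. It is not the algorithm from the survey, just the core idea in miniature: the policy samples a complete trajectory, a scalar reward is computed only from the final outcome, and the log-probability of the whole trajectory is pushed up or down accordingly:

```python
import torch

# Toy outcome-driven learning: a 3-step "trajectory" over 4 discrete actions,
# rewarded only if the final outcome (here, the sum of actions) hits a target.
# There is no step-by-step supervision, only a scalar reward on the end result.

torch.manual_seed(0)
logits = torch.zeros(3, 4, requires_grad=True)    # the "policy" parameters
opt = torch.optim.Adam([logits], lr=0.1)

def rollout():
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                        # one action per step
    log_prob = dist.log_prob(actions).sum()        # trajectory log-probability
    reward = 1.0 if actions.sum().item() == 6 else 0.0  # outcome-only reward
    return log_prob, reward

baseline = 0.0
for _ in range(500):
    log_prob, reward = rollout()
    baseline = 0.9 * baseline + 0.1 * reward       # running-average baseline
    loss = -(reward - baseline) * log_prob         # REINFORCE gradient estimator
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Replace the toy policy with an LLM, the three-step trajectory with a multi-step agent rollout, and the target check with a task-level success signal, and you have the “LLM + RL + Task” recipe in outline.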
Internalizing Core Agent Capabilities
Let’s look at how the three core agentic capabilities – planning, tool use, and memory – are evolving:
Planning: This has moved from relying on external symbolic planners or prompt engineering techniques like Chain-of-Thought (CoT) to being internalized. Through supervised learning (using synthesized or distilled reasoning data) and especially reinforcement learning, models are learning to “think” and plan autonomously. This means the model doesn’t just follow a script; it learns the underlying logic of planning, making it more flexible and robust.
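One ingredient of that recipe, supervised fine-tuning on distilled reasoning traces, can be sketched in a few lines with the Hugging Face `transformers` API. The trace below is invented and GPT-2 stands in for a real agent model; the key point is that the plan itself becomes training data rather than an inference-time prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A hypothetical distilled planning trace: the reasoning is absorbed into the
# weights via a standard next-token loss, instead of being injected at
# inference time through a Chain-of-Thought prompt.
trace = (
    "Task: book the cheapest flight to Berlin.\n"
    "Plan: 1) query flight search 2) compare prices 3) pick cheapest 4) confirm.\n"
    "Answer: booked the selected flight."
)

tok = AutoTokenizer.from_pretrained("gpt2")        # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
batch = tok(trace, return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])    # next-token loss over the full trace
out.loss.backward()                                # one SFT gradient step (optimizer omitted)
```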
Tool Use: Similarly, tool use has transitioned from external system workflows or prompt-based methods (like ReAct, which interleaves thought and action) to being integrated within the model. This involves modular training, where a planner selects actions and a separate executor handles tool calls, or even end-to-end training, where a single model learns both the strategic planning of when to use tools and the precise execution of those calls. This allows agents to decide autonomously when and how to invoke diverse tools as part of their internal policy.
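A rough picture of the model-native version: the model emits a tool call as part of its own output, and the runtime merely executes whatever the policy produces. The `<tool>` tag format and the `search` tool below are invented for illustration; real systems use their own tool-call schemas:

```python
import json
import re

TOOLS = {"search": lambda args: f"top results for {args!r}"}  # illustrative tool

def run_agent(llm, task: str, max_turns: int = 5) -> str:
    """The model decides when and whether to call tools; the runtime only executes."""
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        output = llm(transcript)                  # the policy emits text, maybe a tool call
        call = re.search(r"<tool>(.*?)</tool>", output, re.S)
        if call is None:
            return output                         # the model chose to answer directly
        request = json.loads(call.group(1))       # e.g. {"name": "search", "args": "..."}
        result = TOOLS[request["name"]](request["args"])
        transcript += output + f"\n<result>{result}</result>\n"
    return transcript
```

Compare this with the pipeline sketch earlier: the decision about whether to use a tool has moved from the developer’s code into the model’s learned policy.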
Memory: Memory management is also shifting from external modules (like conversation summarization or Retrieval-Augmented Generation, RAG) to model-native mechanisms. For short-term memory, advancements in position encoding, long-sequence data synthesis, and attention optimization allow models to directly process and utilize vast amounts of information within a single session. For long-term memory, while storage might still be external, the model is learning to internalize the strategies for retrieving and effectively using that information, making memory an active, policy-driven behavior.
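That division of labor (external storage, internalized retrieval policy) can be pictured as the model issuing memory reads and writes as ordinary actions. A speculative sketch, with an invented `<mem>` action format:

```python
import re

def memory_agent(llm, store: dict, task: str, max_turns: int = 6) -> str:
    """Storage stays external; *when* to read or write it is the model's policy."""
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        output = llm(transcript)
        op = re.search(r"<mem (read|write) key=(\w+)>(.*?)</mem>", output, re.S)
        if op is None:
            return output                         # the model answers without touching memory
        action, key, value = op.groups()
        if action == "write":
            store[key] = value                    # a write the model chose to make
            observation = "ok"
        else:
            observation = store.get(key, "")      # a read the model chose to make
        transcript += output + f"\n<obs>{observation}</obs>\n"
    return transcript
```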
Real-World Applications
This paradigm shift is profoundly reshaping how AI agents are applied in real-world scenarios:
Deep Research Agents: These agents act as a “brain,” excelling at complex reasoning and analysis. Early versions, like AI search engines, relied on engineered pipelines for query expansion and answer generation. Now, model-native Deep Research agents, such as those from OpenAI and Tongyi Lab, are trained to strategize the entire research process end to end, leading to more consistent, in-depth information discovery. They can tackle rigorous professional tasks, though challenges remain in handling information noise and in defining rewards for open-ended, subjective research questions.
GUI Agents: These agents act as “eyes and hands,” simulating human interaction with graphical environments. Initially, they used pipeline-based approaches like record-and-replay or prompt-based orchestration. The model-native paradigm has led to solutions that internalize perception, planning, and action execution into a unified policy. This allows GUI agents to perform complex tasks like automated software testing or workflow automation with high precision and adaptability, moving beyond brittle external scripts. However, they face challenges with fine-grained visual cues and dynamic interface states.
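Schematically, a model-native GUI agent collapses perception, planning, and execution into a single policy call: a goal and raw pixels in, one action out. The `policy` model and the action schema below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class GuiAction:
    kind: str          # "click", "type", or "done"
    x: int = 0         # screen coordinates for clicks
    y: int = 0
    text: str = ""     # text to type

def run_gui_agent(policy, screen_capture, execute, goal: str, max_steps: int = 20) -> bool:
    """One unified policy maps (goal, screenshot) -> action; no external planner or script."""
    for _ in range(max_steps):
        screenshot = screen_capture()             # raw pixels, not a parsed DOM or replay script
        action: GuiAction = policy(goal, screenshot)
        if action.kind == "done":
            return True
        execute(action)                           # click or type on the real interface
    return False
```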
The Road Ahead
The journey towards model-native agentic AI is ongoing. Future research will likely focus on internalizing even more capabilities, such as multi-agent collaboration (where agents learn to work together autonomously) and reflection (where agents learn to self-assess and correct their own errors). The role of system engineering is also evolving; instead of compensating for model limitations, it will focus on providing foundational infrastructure for a robust and scalable agent ecosystem, including identity management, resource allocation, and standardized protocols.
This evolution signifies a profound change: from building systems that merely apply intelligence to developing models that actively grow intelligence through continuous experience and interaction. To learn more about this fascinating area, you can read the full research paper here: Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI.


