Routine: A New Framework for Stable AI Agents in Business Operations

TLDR: Routine is a novel framework that provides structured plans for AI agents, significantly improving their ability to perform multi-step tasks and use tools reliably in enterprise settings. It enhances execution accuracy for models like GPT-4o and Qwen3-14B and allows smaller models to achieve high performance through specialized training and knowledge distillation, making AI agent deployment more practical and stable in real-world business scenarios.

Deploying AI agent systems in a business environment often comes with significant hurdles. General-purpose AI models frequently lack the knowledge needed for domain-specific processes, which leads to disorganized plans, overlooked tools, and unstable performance. This makes it difficult for companies to use AI to automate complex tasks.

To tackle these challenges, a new framework called Routine has been introduced. Routine is designed as a multi-step agent planning framework that brings much-needed structure, clear instructions, and smooth parameter passing to guide an AI agent’s execution. This allows agents to perform multi-step tasks involving various tools with high stability and accuracy.
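To make the idea of a structured plan with parameter passing more concrete, here is a minimal Python sketch. It is an illustration only, not the format defined in the research paper: the `RoutineStep` fields, the `$variable` reference syntax, and the tool names `get_employee` and `get_leave_balance` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoutineStep:
    """One step of a plan. All field names are hypothetical, not from the paper."""
    number: int
    description: str
    tool: str
    inputs: dict = field(default_factory=dict)  # parameter -> literal or "$variable"
    output: str = ""                            # variable that receives the result


def render_routine(steps):
    """Render the steps as numbered plain-text instructions for an executing model."""
    lines = []
    for step in steps:
        args = ", ".join(f"{key}={value}" for key, value in step.inputs.items())
        line = f"Step {step.number}. {step.description} Tool: {step.tool}({args})"
        if step.output:
            line += f" -> {step.output}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical two-step plan: step 2 reuses the value produced by step 1.
routine = [
    RoutineStep(1, "Look up the employee record.", "get_employee",
                {"name": "Alice"}, "$employee_id"),
    RoutineStep(2, "Fetch the remaining leave balance.", "get_leave_balance",
                {"employee_id": "$employee_id"}, "$balance"),
]
routine_text = render_routine(routine)
print(routine_text)
```

The point of the sketch is that each step names exactly one tool and states where its inputs come from, so the executing model does not have to infer the order of calls or guess which earlier result to pass along.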

Routine has produced large gains in real-world enterprise scenarios. The execution accuracy of GPT-4o rose from 41.1% to 96.3% when guided by Routine, and that of Qwen3-14B, another large language model, rose from 32.6% to 83.3%. These results highlight Routine’s effectiveness in making AI agents more reliable for business operations.

Beyond just guiding execution, Routine also plays a crucial role in training AI models. Researchers created a training dataset that follows the Routine framework and used it to fine-tune Qwen3-14B. This resulted in an accuracy increase to 88.2% in scenario-specific evaluations, showing that models can better adhere to execution plans when trained with Routine’s structured guidance.

Furthermore, Routine-based data distillation was used to create a specialized dataset for multi-step tool-calling in specific business scenarios. Fine-tuning models on this distilled dataset boosted their accuracy significantly. For example, Qwen3-14B’s accuracy reached 95.5%, almost matching the performance of GPT-4o. This demonstrates Routine’s capability to help AI models learn domain-specific tool-usage patterns and adapt to new situations, making them highly effective for enterprise deployment.

The Routine framework is built around four core modules of an AI agent system: Planning, Execution, Tools, and Memory. The Planning Module, often assisted by domain experts, generates the structured Routine. The Execution Module, typically a smaller, specialized AI model, follows this Routine to make tool calls. The Tool Module, using a system like MCP servers, provides the actual tools and their definitions. Finally, the Memory Module manages both long-term ‘Procedure Memory’ (storing Routines) and short-term ‘Variable Memory’ (handling intermediate results and parameters) to ensure efficient and accurate task completion.
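The interplay of these modules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system described in the paper: plain functions stand in for tools served by MCP servers, a deterministic loop stands in for the model in the Execution Module, and every name here (`TOOLS`, `PROCEDURE_MEMORY`, `run_routine`, the `$name` reference syntax, the leave-balance scenario) is hypothetical.

```python
# Tool Module stand-in: plain functions instead of tools served over MCP.
def get_employee(name):
    return {"alice": 1001}[name.lower()]


def get_leave_balance(employee_id):
    return {1001: 12}[employee_id]


TOOLS = {"get_employee": get_employee, "get_leave_balance": get_leave_balance}

# Procedure Memory: long-term store of Routines, keyed by scenario name.
PROCEDURE_MEMORY = {
    "leave_balance": [
        {"tool": "get_employee", "args": {"name": "$user_name"},
         "save_as": "employee_id"},
        {"tool": "get_leave_balance", "args": {"employee_id": "$employee_id"},
         "save_as": "balance"},
    ]
}


def resolve(value, variables):
    """Replace a "$name" reference with the value held in Variable Memory."""
    if isinstance(value, str) and value.startswith("$"):
        return variables[value[1:]]
    return value


def run_routine(scenario, initial_variables):
    """Execution stand-in: follow the stored Routine one step at a time."""
    variables = dict(initial_variables)  # Variable Memory (short-term)
    for step in PROCEDURE_MEMORY[scenario]:
        tool = TOOLS[step["tool"]]
        args = {key: resolve(value, variables) for key, value in step["args"].items()}
        variables[step["save_as"]] = tool(**args)  # keep the intermediate result
    return variables


result = run_routine("leave_balance", {"user_name": "Alice"})
print(result["balance"])
```

In a real deployment the Execution Module would be a language model that reads the Routine and emits each tool call itself; the loop above only shows how Variable Memory lets later steps refer to earlier results by name instead of repeating them in full.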

This approach significantly reduces the computational load and improves accuracy by providing clear, step-by-step instructions, rather than relying solely on the AI model’s autonomous reasoning for complex tasks. It also allows for the use of smaller, more efficient models for execution, making AI agent systems more practical and cost-effective for real-world enterprise deployment.

In essence, Routine offers a practical and accessible method for building stable AI agent workflows, accelerating the adoption of these systems in businesses, and advancing the vision of AI for process automation. For more in-depth information, you can refer to the original research paper: Routine: A Structural Planning Framework for LLM Agent System in Enterprise.
