Faster AI Agents: A Framework for Parallel Execution

TLDR: AI agents are often slow because each step waits on a sequential API call. The “Speculative Actions” framework addresses this with a fast “Speculator” model that predicts upcoming actions so they can be executed in parallel, while a slower, authoritative “Actor” decides the current step and confirms or rejects each guess. In evaluations covering gaming, e-commerce, web search, and operating system tuning, this reduced latency, with up to 20% speedup reported in lossless settings where the final outcome matches that of a strictly sequential agent. It introduces opportunistic parallelism as a key design principle for efficient agentic systems.

AI agents are becoming increasingly sophisticated, capable of performing complex tasks in diverse environments like web browsers, operating systems, and game engines. However, a significant challenge remains: their execution is often slow. This slowness stems from the inherently sequential nature of agent behavior, where each action typically requires an API call that can be time-consuming. Imagine a game of chess between two advanced AI agents taking hours, or an e-commerce agent pausing for too long between steps – such delays make these systems impractical for real-world interactive use or high-throughput automation.

The Bottleneck of Sequential Actions

The core problem is that agents interact with their environment one step at a time. Each observation leads to a decision, which triggers an API call (to an LLM, an external tool, or even a human), and the agent must wait for that call to complete before proceeding to the next step. This waiting time accumulates, creating a bottleneck that hinders training, evaluation, and deployment of advanced AI agents.
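How the waiting adds up can be made concrete with a toy simulated clock. The step names and latencies below are hypothetical, chosen only for illustration; nothing here is measured.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    latency_s: float  # hypothetical time for the API call to return


def run_sequentially(steps):
    """Return (timeline, total) on a simulated clock.

    Each call must finish before the next one starts, so the total
    is simply the sum of the individual latencies.
    """
    clock = 0.0
    timeline = []
    for step in steps:
        clock += step.latency_s  # the agent blocks until this call returns
        timeline.append((step.name, clock))
    return timeline, clock


steps = [
    Step("llm_decide", 2.0),
    Step("tool_search", 0.5),
    Step("llm_decide", 2.0),
]
timeline, total = run_sequentially(steps)
print(total)  # 4.5
```

Every second spent waiting in one step pushes back all later steps, which is the dependency that speculation tries to break.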

Introducing Speculative Actions

Inspired by techniques like speculative execution in microprocessors (where a CPU guesses future instructions to execute them in advance) and speculative decoding in large language model (LLM) inference, researchers have proposed a novel framework called “Speculative Actions.” This framework aims to break the strict sequential dependency of agent interactions, allowing for faster execution without compromising the final outcome.

How Speculative Actions Work

The Speculative Actions framework introduces two key roles within the agent’s environment loop:

Actor: This is the authoritative but slower executor. It could be a more capable LLM, an external API, or even a human. The Actor’s outputs represent the ground truth for correctness and side effects.
Speculator: This is an inexpensive, low-latency model designed to predict the most likely next environment step. Examples include smaller LLMs, simplified versions of the main LLM, or domain-specific heuristics. The Speculator guesses the next action, its arguments, and the expected observation.

The speedup comes from overlap: the Speculator predicts future actions while the Actor is still deliberating on the current step, and these predicted actions are tentatively executed in parallel. If the Actor later confirms the Speculator’s guess, time is saved because the subsequent steps have already been started. If the guess is incorrect, the speculative work is safely discarded and the system proceeds with the Actor’s validated decision. The outcome is therefore “lossless”: the final result is identical to what a strictly sequential agent would achieve.
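A minimal sketch of one such step, using Python threads, might look like the following. The function names (`speculative_step`, `actor`, `speculator`, `prepare_next`) and the toy integer state are assumptions made for illustration, not the paper’s API. The sketch also assumes `prepare_next` has no side effects, so a wrong guess can simply be dropped.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def speculative_step(state, actor, speculator, prepare_next, pool):
    """Run one Actor step while pre-launching work for the Speculator's guess.

    Returns (action, next_observation, was_hit).
    """
    actor_future = pool.submit(actor, state)               # slow, authoritative
    guess = speculator(state)                              # fast, may be wrong
    spec_future = pool.submit(prepare_next, state, guess)  # tentative work

    action = actor_future.result()
    if action == guess:
        # Hit: the follow-up work is already running or finished.
        return action, spec_future.result(), True
    # Miss: discard the speculative result and redo with the real action.
    spec_future.cancel()  # best effort; an already-running task is not interrupted
    return action, prepare_next(state, action), False


# Toy stand-ins (hypothetical): the Actor is slow, the Speculator is instant.
def actor(state):
    time.sleep(0.01)
    return state + 1


def speculator(state):
    return state + 1 if state % 2 == 0 else state + 2


def prepare_next(state, action):
    return f"observation after {action}"


with ThreadPoolExecutor(max_workers=2) as pool:
    hit = speculative_step(0, actor, speculator, prepare_next, pool)
    miss = speculative_step(1, actor, speculator, prepare_next, pool)

print(hit)   # (1, 'observation after 1', True)
print(miss)  # (2, 'observation after 2', False)
```

In both cases the returned action and observation are exactly what a sequential run would produce. Only the elapsed time differs.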

Ensuring Losslessness and Safety

A crucial design principle of Speculative Actions is that it should not degrade the final outcomes compared to a strictly sequential agent. This is achieved through several mechanisms:

Semantic Guards: Actors confirm that state transitions are equivalent before committing any speculative changes.
Safety Envelopes: Only idempotent (repeatable without changing the result), reversible, or sandboxed speculative side effects are allowed.
Repair Paths: If a guess is rejected, mechanisms like rollback or compensating actions are in place to correct the state.

These safeguards ensure that even if the Speculator makes a wrong prediction, the system can recover without any negative impact on correctness.
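The three mechanisms can be sketched as follows. This is an illustrative simplification, not the paper’s implementation: the `Effect` categories mirror the list above, the semantic guard is reduced to an equality check, and the example tool calls (`hold_item`, `issue_refund`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class Effect(Enum):
    IDEMPOTENT = auto()
    REVERSIBLE = auto()
    SANDBOXED = auto()
    IRREVERSIBLE = auto()


# Safety envelope: only these kinds of side effects may run speculatively.
SAFE_EFFECTS = {Effect.IDEMPOTENT, Effect.REVERSIBLE, Effect.SANDBOXED}


@dataclass
class SpeculativeCall:
    name: str
    effect: Effect
    run: Callable[[dict], None]
    undo: Optional[Callable[[dict], None]] = None


def speculate(call, state):
    """Execute the call only if it lies inside the safety envelope."""
    if call.effect not in SAFE_EFFECTS:
        return False
    call.run(state)
    return True


def guard(guessed_action, actor_action):
    """Semantic guard, simplified here to exact equality of actions."""
    return guessed_action == actor_action


def repair(call, state):
    """Repair path: roll back a reversible effect after a rejected guess."""
    if call.effect is Effect.REVERSIBLE and call.undo is not None:
        call.undo(state)


state = {"cart": [], "refunds": 0}
hold = SpeculativeCall(
    "hold_item",
    Effect.REVERSIBLE,
    run=lambda s: s["cart"].append("item-1"),
    undo=lambda s: s["cart"].remove("item-1"),
)
refund = SpeculativeCall(
    "issue_refund",
    Effect.IRREVERSIBLE,
    run=lambda s: s.update(refunds=s["refunds"] + 1),
)

ran_hold = speculate(hold, state)      # allowed: reversible
ran_refund = speculate(refund, state)  # refused: outside the envelope
if not guard("hold_item", "cancel_order"):  # the Actor chose something else
    repair(hold, state)

print(ran_hold, ran_refund, state)
```

After the rejected guess is repaired, the state is back to its starting value, and the irreversible refund was never attempted.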

Real-World Applications and Results

The framework was evaluated across several diverse environments, each highlighting different latency bottlenecks:

Turn-based Gameplay (e.g., Chess): While the main agent (Actor) is deciding its move, the Speculator can predict the opponent’s likely responses and start analyzing counter-moves in parallel. This led to an average time saving of 19.5% with 54.7% prediction accuracy using three speculative predictions.
E-commerce Dialogue: In customer service scenarios, the Speculator can proactively infer a shopper’s intent (e.g., returning an item) and safely trigger tool calls in advance (e.g., checking return eligibility). This resulted in 22% to 38% of API calls being correctly predicted, allowing the agent to respond much faster, often within the user’s typing time.
Multi-hop Web Search (e.g., HotpotQA): When an agent needs to make multiple sequential API calls for information retrieval (like querying Wikipedia), the Speculator can guess the likely content of a pending result and launch the subsequent search queries in parallel. This achieved up to 46% accuracy in predicting the next API call.
Operating Systems (Lossy Extension): In a more flexible, “lossy” setting, the framework was applied to tune OS hyperparameters. A fast Speculator made immediate, reversible adjustments to improve real-time performance, while a slower Actor deliberated and confirmed/overwrote these changes. This significantly improved reaction time and accelerated convergence to optimal settings, preventing the system from lingering in degraded states.
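The chess result above used three speculative predictions at once. One way to sketch multi-guess speculation is shown below, assuming follow-up work that is free of side effects. The helper name `speculate_top_k`, the move strings, and the `precompute` stand-in are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def speculate_top_k(guesses, precompute, get_actual, pool):
    """Pre-compute follow-up work for several guesses, then keep the one that matches.

    Returns (actual, result, was_hit).
    """
    # Launch one tentative computation per guessed outcome.
    futures = {g: pool.submit(precompute, g) for g in guesses}
    actual = get_actual()  # blocks until the authoritative outcome is known
    if actual in futures:
        return actual, futures[actual].result(), True
    # No guess matched: fall back to computing from the real outcome.
    return actual, precompute(actual), False


def precompute(move):
    # Stand-in for analysing a reply to the given opponent move.
    return f"reply to {move}"


guesses = ["e5", "c5", "e6"]  # three guessed opponent moves
with ThreadPoolExecutor(max_workers=3) as pool:
    hit = speculate_top_k(guesses, precompute, lambda: "c5", pool)
    miss = speculate_top_k(guesses, precompute, lambda: "d5", pool)

print(hit)   # ('c5', 'reply to c5', True)
print(miss)  # ('d5', 'reply to d5', False)
```

More guesses raise the chance of a hit but also the amount of discarded work, which is the trade-off behind choosing how many predictions to run.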

These results show that a Speculator can predict the next action with useful accuracy (up to about 55% in the chess setting) and that this translates into lower end-to-end latency, with up to 20% lossless speedup in some cases. The framework’s performance can be further enhanced through stronger guessing models, multi-step speculation, and uncertainty-aware optimization.
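A back-of-the-envelope model shows how prediction accuracy relates to time saved. This is a toy model of my own, not the paper’s analysis. It assumes a hit fully overlaps the follow-up work with the Actor’s deliberation, while a miss costs the same as sequential execution. All parameter names and numbers are hypothetical.

```python
def expected_saving(p_hit, t_actor, t_next, t_spec=0.0):
    """Expected fraction of time saved per step under a simple overlap model.

    p_hit   : probability that the Speculator's guess is confirmed
    t_actor : latency of the Actor's decision
    t_next  : latency of the follow-up work that can be pre-launched
    t_spec  : latency of the Speculator itself
    """
    sequential = t_actor + t_next
    overlapped = max(t_actor, t_spec + t_next)  # both run side by side on a hit
    expected = p_hit * overlapped + (1 - p_hit) * sequential
    return 1 - expected / sequential


print(expected_saving(0.5, 2.0, 2.0))  # 0.25
print(expected_saving(0.0, 2.0, 2.0))  # 0.0
```

Under these assumptions the saving grows linearly with the hit rate and is largest when the follow-up work is about as long as the Actor’s deliberation. Real savings also depend on factors the model ignores, such as the cost of discarded work.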

A New Path for Agentic Systems

Speculative Actions introduces a powerful systems-level design principle for modern agentic platforms: opportunistic parallelism in environment interactions. By treating every step – whether an LLM call, tool invocation, or human response – as an API call subject to prediction and parallelization, this framework transforms idle waiting time into productive computation. This opens a promising path toward deploying low-latency, real-time agentic systems in the real world. For more details, see the full paper.
