
CodeAgents: Boosting LLM Agent Performance and Efficiency with Codified Reasoning

TLDR: CodeAgents is a new framework that improves Large Language Model (LLM) agents by codifying their interactions and reasoning into structured pseudocode. This approach significantly enhances planning capabilities, reduces token usage by 55–87%, and improves task accuracy by 3–36 percentage points across benchmarks like GAIA, HotpotQA, and VirtualHome, making LLM-driven multi-agent systems more scalable and interpretable.

Large Language Models (LLMs) are becoming increasingly powerful in driving AI agents, helping them plan and execute complex tasks. However, current methods often face challenges like excessive verbosity, high token usage, and limitations in multi-agent scenarios. These issues can make LLM-driven agents less efficient and harder to manage.

To address these limitations, researchers have introduced CodeAgents, a novel framework designed to make multi-agent reasoning more structured and token-efficient. CodeAgents transforms the way LLM agents interact by codifying all aspects of their communication and planning into modular pseudocode. This includes tasks, plans, feedback, system roles, and even external tool invocations. By using pseudocode, which incorporates control structures like loops and conditionals, boolean logic, and typed variables, CodeAgents turns loosely connected agent plans into cohesive, interpretable, and verifiable reasoning programs.
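To make this concrete, here is a minimal sketch of the kind of codified plan the framework describes. The tool names and the task are invented for illustration, and the stub functions simply let the snippet run; they are not the framework's actual API:

```python
def walk_to(obj: str) -> bool:      # hypothetical stub: pretend navigation succeeds
    return True

def grab(obj: str) -> bool:         # hypothetical stub
    return True

def put_in(obj: str, container: str) -> bool:  # hypothetical stub
    return True

# Typed variables make the plan's state explicit.
target: str = "apple"
container: str = "fridge"
max_retries: int = 3
done: bool = False

# Control flow (loops, conditionals) turns the plan into a verifiable program.
for attempt in range(max_retries):
    if walk_to(target) and grab(target):
        done = walk_to(container) and put_in(target, container)
    if done:
        break

assert done, "plan failed after retries"  # the success condition is checkable
print("task completed:", done)
```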

How CodeAgents Works

The core idea behind CodeAgents is to treat a complex reasoning task like a program. Instead of relying on verbose natural language dialogues, the framework provides a pseudocode template that the LLM fills in and follows. This approach explicitly defines interactions between different agents, such as a Planner that outlines high-level plans, a Solver that executes detailed reasoning, and a Reviewer that provides feedback. Agents communicate clearly through well-defined variables, iterating as needed within a coherent prompt.
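A toy sketch of this fill-and-follow loop might look as follows; the role functions are illustrative stand-ins for LLM calls, not the paper's actual prompts:

```python
from typing import List

def planner(task: str) -> List[str]:
    # Stand-in for an LLM call that writes a high-level pseudocode plan.
    return [f"research {task}", f"summarize findings on {task}"]

def solver(step: str) -> str:
    # Stand-in for an LLM call that executes one step of the plan.
    return f"result of: {step}"

def reviewer(result: str) -> bool:
    # Stand-in for an LLM call that critiques a result; here it approves
    # anything non-empty.
    return bool(result)

task: str = "compare CodeAgents benchmarks"
plan: List[str] = planner(task)   # Planner outlines the high-level plan
results: List[str] = []

for step in plan:                 # Solver works through the plan step by step
    result = solver(step)
    if reviewer(result):          # Reviewer gates each intermediate result
        results.append(result)
    else:
        plan.append(step)         # rejected steps are re-queued for another pass

print(results)
```

The point of the structure is that each agent reads from and writes to named, typed variables rather than free-form dialogue, so the whole exchange stays inside one coherent, inspectable program.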

CodeAgents introduces several key innovations to enhance expressivity and efficiency. It uses typed variables for clear data distinctions, control flow structures for dynamic reasoning, and reusable subroutines for modularity. Crucially, the entire prompting approach is optimized for token-cost awareness, ensuring efficient use of LLM resources without sacrificing reasoning quality.
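The token-cost claim is easy to sanity-check informally. The snippet below compares an invented natural-language instruction against an equivalent codified one using the open-source tiktoken tokenizer; both prompts are made up for illustration and are not taken from the paper:

```python
import tiktoken  # pip install tiktoken

natural_language = (
    "First, please walk over to where the apple is located. Then carefully "
    "pick up the apple. After that, walk to the fridge, open the fridge "
    "door, place the apple inside, and finally close the fridge door."
)

codified = (
    "walk_to('apple'); grab('apple'); walk_to('fridge'); "
    "open('fridge'); put_in('apple', 'fridge'); close('fridge')"
)

enc = tiktoken.get_encoding("cl100k_base")
nl_tokens = len(enc.encode(natural_language))
code_tokens = len(enc.encode(codified))
print(f"natural language: {nl_tokens} tokens, codified: {code_tokens} tokens")
print(f"reduction: {1 - code_tokens / nl_tokens:.0%}")
```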

Single-Agent and Multi-Agent Architectures

The framework supports both single-agent and multi-agent configurations. In the single-agent setup, planning, execution, and feedback are integrated into one loop. For example, in a simulated environment like VirtualHome, an agent generates pseudocode plans, executes them step-by-step, and uses runtime feedback for iterative replanning. Assertion checks are embedded to catch and recover from local errors, escalating to global plan revisions when necessary.
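A rough sketch of this execute/assert/replan loop is shown below, with hypothetical helpers standing in for the LLM-driven repair and replanning steps; the real framework's interfaces may differ:

```python
from typing import Callable, List

def make_step(name: str, ok: bool = True) -> Callable[[], bool]:
    # Hypothetical helper: builds a plan step that reports success or failure.
    return lambda: ok

def local_fix(step: Callable[[], bool]) -> Callable[[], bool]:
    # Stand-in for LLM-driven repair of a single failed step.
    return make_step("fixed", ok=True)

def global_replan(task: str) -> List[Callable[[], bool]]:
    # Stand-in for regenerating the entire pseudocode plan from scratch.
    return [make_step("alt-1"), make_step("alt-2")]

plan: List[Callable[[], bool]] = [make_step("walk"), make_step("grab", ok=False)]

for step in plan:
    try:
        assert step(), "postcondition failed"  # embedded assertion check
    except AssertionError:
        repaired = local_fix(step)             # first, attempt local recovery
        if not repaired():
            # Local recovery failed: escalate to a global plan revision.
            plan = global_replan("put apple in fridge")
            break

print("execution finished")
```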

The multi-agent framework, on the other hand, distributes these roles across specialized agents like Planner, ToolCaller, and Replanner. These agents collaborate through structured code-based exchanges. Each agent is initialized with a codified system prompt in YAML format, specifying its role and available tools. The Planner generates high-level plans as Python-style pseudocode, which can then be transformed into executable code for tool invocations or direct execution. If a tool execution fails, the Replanner agent is activated, consuming structured error traces to synthesize a revised sub-plan, enhancing system robustness.
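The paper specifies that role prompts are codified in YAML, though the exact schema below is an assumption made for illustration:

```python
import yaml  # pip install pyyaml

# Hypothetical codified system prompt for the Planner role; the field names
# are invented here, not taken from the paper.
PLANNER_PROMPT = """
role: Planner
goal: produce a Python-style pseudocode plan for the user task
tools:
  - name: web_search
    args: [query]
  - name: calculator
    args: [expression]
output: pseudocode
"""

config = yaml.safe_load(PLANNER_PROMPT)
print(config["role"], "can use:", [t["name"] for t in config["tools"]])
```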

Empirical Performance and Efficiency

CodeAgents was rigorously evaluated across three diverse benchmarks: GAIA, HotpotQA, and VirtualHome. The results consistently showed significant improvements in planning performance compared to natural language prompting baselines. For instance, on VirtualHome, CodeAgents achieved a new state-of-the-art success rate of 56%. In addition to accuracy gains, the approach drastically reduced input and output token usage by 55–87% and 41–70%, respectively, highlighting its superior token efficiency.

On multi-agent benchmarks like GAIA and HotpotQA, CodeAgents consistently matched or outperformed natural language methods in accuracy and F1 scores, while substantially cutting token usage and cost. For example, on GAIA, the codified approach improved accuracy by 10.7% for Gemini-2.5-Flash while reducing input tokens by 67.8% and cost by 67.4%. These improvements are attributed to the high semantic density and reduced ambiguity of the codified format, which requires fewer tokens per reasoning cycle.

The Future of LLM Agents

The research paper concludes that this codified prompting framework significantly enhances LLM reasoning by representing agent interactions as typed pseudocode with modular control flows. This structure not only improves transparency and execution reliability but also boosts token efficiency. The findings suggest a promising path towards more interpretable and verifiable AI systems. For more detailed information, you can refer to the full research paper available at arXiv:2507.03254.

Nikhil Patel
https://blogs.edgentiq.com
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
