TLDR: COCORELI is a novel hybrid AI framework that uses medium-sized LLM agents, abstraction mechanisms, and a discourse module to accurately follow complex language instructions, minimize hallucinations, and perform spatial reasoning. It significantly outperforms larger LLM-based systems in collaborative construction tasks and demonstrates strong generalization abilities in API completion, effectively identifying missing information and learning abstract functions from context.
Large Language Models (LLMs) have made incredible strides, but they often hit roadblocks when faced with real-world tasks that demand precise instruction following, spatial reasoning, and freedom from fabricated information, a failure mode known as hallucination. These challenges are particularly evident in complex scenarios where LLMs need to plan, use multiple tools, or learn from limited examples.
Introducing COCORELI: A Smarter Approach to Language Instructions
A new framework called COCORELI, which stands for Cooperative, Compositional Reconstitution & Execution of Language Instructions, offers a promising solution. Developed by researchers including Swarnadeep Bhar, Omar Naim, Eleni Metheniti, Bastien Navarri, Loïc Cabannes, Morteza Ezzabady, and Nicholas Asher, COCORELI is a hybrid agent system designed to overcome these limitations. What’s particularly impressive is that it achieves this using medium-sized LLMs, outperforming systems that rely on much larger models.
How COCORELI Works
COCORELI’s strength lies in its modular design, integrating several specialized LLM agents with innovative abstraction mechanisms and a ‘discourse module’. This allows it to dynamically learn high-level representations of an environment directly from user instructions. The system’s architecture includes:
- Discourse Module: This is a crucial component that generates clarification questions when an agent needs more information to execute a task. This proactive questioning significantly reduces the chance of the system hallucinating missing details.
- Instruction Parser: It interacts with the user to extract key information about objects and their desired locations from natural language instructions.
- Locator: This agent takes information from the parser to determine precise coordinates in the 3D environment. If details are incomplete, it triggers the discourse module for clarification.
- Builder: Checks an external memory for known structures or uses instructions to construct new ones. It also uses the discourse module if instructions are unclear.
- External Memory: Stores predefined functions and previously created shapes as relational graphs, enabling COCORELI to recall and adapt complex structures.
- Executor: Combines information from the Builder and Locator to produce a JSON object, which can then be run as a deterministic program to build or modify structures.
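The flow through these modules can be pictured with a minimal sketch. Everything below is illustrative: the class and function names, the stubbed parser output, and the placeholder coordinates are my own assumptions, not the authors' actual code, and the real system would call LLM agents where the stubs sit.

```python
import json

class NeedClarification(Exception):
    """Raised so the discourse module can query the user instead of guessing."""

def parse_instruction(instruction: str) -> dict:
    """Instruction Parser (stub): extract the object and its desired location."""
    # A real system would call an LLM agent here.
    return {"object": "red nut", "location": "on top of the washer"}

def locate(parsed: dict) -> dict:
    """Locator (stub): resolve a description to 3D coordinates, or ask for more."""
    if parsed.get("location") is None:
        # Missing detail: trigger the discourse module rather than hallucinate.
        raise NeedClarification("Where should the part go?")
    parsed["coords"] = (3, 0, 2)  # placeholder coordinates
    return parsed

def execute(parsed: dict) -> str:
    """Executor: emit a JSON action that a deterministic program can run."""
    return json.dumps({"action": "place",
                       "part": parsed["object"],
                       "at": parsed["coords"]})

plan = execute(locate(parse_instruction("Put a red nut on the washer")))
print(plan)
```

The key design point the sketch tries to capture is the last step: because the Executor's output is plain JSON run by a deterministic program, the actual building step cannot hallucinate.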
One of COCORELI’s standout features is its ability to learn new complex object functions through abstraction. This means it can take a specific instruction, like building a ‘tower made of three red nuts’, abstract its parameters (color, parts, location), and then recreate a similar structure with different specifications later on. This function-based approach is highly efficient, using one function for a complex structure rather than many individual placement instructions.
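One way to picture this abstraction step is as a parameterized function learned from the concrete instruction. This is a minimal sketch under my own assumptions: the `make_tower` name, its parameters, and the action format are illustrative, not the paper's implementation.

```python
def make_tower(color: str, part: str, count: int, base: tuple) -> list:
    """Abstracted 'tower' function: stack `count` parts of `color` at `base`."""
    x, y, z = base
    # One function call replaces `count` individual placement instructions.
    return [{"action": "place", "part": f"{color} {part}", "at": [x, y + i, z]}
            for i in range(count)]

# Learned once from "a tower made of three red nuts"...
original = make_tower("red", "nut", 3, (0, 0, 0))
# ...then reused later with different specifications.
variant = make_tower("blue", "washer", 5, (4, 0, 4))
```

Storing the structure as one function rather than a list of placements is what makes the external memory compact and the recreation step trivial.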
Testing COCORELI in a Challenging 3D World
To evaluate its capabilities, COCORELI was tested on an ‘ENVIRONMENT’ task, a collaborative construction challenge in a 3D grid. This environment is more complex than typical benchmarks like Minecraft, featuring a larger grid, more diverse object types (like nuts, washers, bridges that occupy multiple spaces), and strict physics rules such as gravity. The tasks ranged from placing single parts and sequences of parts to constructing complex shapes, handling underspecified instructions, and learning abstract functions from context.
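To give a feel for the kind of physics constraint involved, here is a toy gravity check, my own simplification rather than the benchmark's actual rules: a part may only occupy a cell if it rests on the ground or directly on another part.

```python
def is_supported(occupied: set, cell: tuple) -> bool:
    """Toy gravity rule: a cell is placeable if it is on the ground (y == 0)
    or sits directly above an occupied cell."""
    x, y, z = cell
    return y == 0 or (x, y - 1, z) in occupied

occupied = {(0, 0, 0)}
print(is_supported(occupied, (0, 1, 0)))  # True: rests on the part below
print(is_supported(occupied, (2, 1, 2)))  # False: floating, violates gravity
```

A planner that ignores a rule like this emits physically impossible builds, which is one reason a deterministic execution layer matters in this environment.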
COCORELI was compared against two baseline systems: a single LLM using a Chain-of-Thought (CoT) approach and an agentic LLM system. Both baselines used much larger LLMs (Claude 3.5 Sonnet, GPT-4.1, and LLaMA 3-70b), whereas COCORELI ran on LLaMA-3.1 8B.
Impressive Results and Versatility
The results were compelling. COCORELI consistently outperformed the baselines across various tasks:
- It excelled at identifying part types, colors, and coordinates, especially in sequences of instructions where CoT LLMs struggled with the second object.
- For constructing complex shapes, COCORELI demonstrated a higher overall accuracy in following instructions and was the only system capable of partially parsing instructions for a very complex ‘Moroccan bridge’ structure that stumped other models.
- Its clarification loop proved highly effective in handling underspecified instructions, correctly detecting missing information, asking for it, and then accurately parsing the complete instruction without hallucinating. This was a significant weakness for the CoT and even the agentic LLM baselines in more complex underspecified scenarios.
- COCORELI was the only system capable of learning and reproducing all novel shapes from abstract instructions, showcasing its superior abstraction and generalization abilities.
Beyond the ENVIRONMENT tasks, COCORELI also demonstrated its versatility by successfully applying its in-context function learning to the ToolBench API completion task, where it achieved 100% precision and recall in function reuse, unlike the CoT baseline. This highlights its robustness and transferability to different domains.
In conclusion, COCORELI represents a significant step forward in developing more reliable and capable AI agents. By combining medium-sized LLMs with a sophisticated modular architecture, including a discourse module for clarifications and powerful abstraction capabilities, it effectively addresses key limitations of current LLMs in complex, real-world tasks. Full details are available in the research paper.