TLDR: I2I-STRADA is an AI agent architecture for data analysis that formalizes the reasoning process. Unlike general-purpose LLMs, it uses a structured, modular workflow with distinct sub-tasks for analytical thinking: goal construction, contextual grounding, and a two-stage process of workflow scaffolding followed by adaptive planning and execution. It dynamically creates tools and tracks execution state, leading to strong performance on benchmarks such as DABstep and DABench through improved planning coherence and insight alignment in complex, real-world data scenarios.
In today’s fast-paced enterprise environments, dealing with vast amounts of diverse and often messy data for real-time analysis is a significant challenge. Traditional methods struggle with data in multiple formats, missing information, and evolving business needs. While advanced AI models, particularly large language models (LLMs), have shown promise in understanding unstructured data and adapting to changing information, they often fall short in providing a consistent, structured approach to analytical thinking.
This is where a new agentic architecture called I2I-STRADA, which stands for Information-to-Insight via Structured Reasoning Agent for Data Analysis, steps in. Developed by Sai Barath Sundar, Pranav Satheesan, and Udayaadithya Avadhanam from Mphasis Limited, I2I-STRADA aims to formalize the complex reasoning process involved in data analysis. Instead of treating reasoning as a ‘black box,’ it models how analysis unfolds through a series of modular sub-tasks that mirror the cognitive steps of human analytical reasoning.
How I2I-STRADA Works: A Structured Approach
The core of I2I-STRADA lies in its structured and modular design, built on two key principles: progressive abstraction, which means filtering out noise while keeping crucial information at each stage, and multi-step refinement, using a two-stage planning process to continuously improve reasoning quality.
The workflow begins with Goal Construction. Here, the agent interprets the user’s query to understand the main intent, identify key data points, outline a preliminary strategy, and note any specific conditions. This initial understanding is crucial for guiding subsequent steps.
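To make the output of this step concrete, here is a minimal sketch of what a constructed goal might look like as a data structure. The `AnalysisGoal` class, its field names, and the sample query are hypothetical illustrations, not taken from the paper; in the actual system an LLM would produce this interpretation from the user's query.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisGoal:
    """Hypothetical record of the four things goal construction extracts."""
    intent: str
    key_data_points: list[str] = field(default_factory=list)
    preliminary_strategy: list[str] = field(default_factory=list)
    conditions: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the goal as plain text for later stages to consume."""
        return "\n".join([
            f"Intent: {self.intent}",
            "Data points: " + ", ".join(self.key_data_points),
            "Strategy: " + " -> ".join(self.preliminary_strategy),
            "Conditions: " + "; ".join(self.conditions),
        ])


# Hand-written example for an invented query:
# "What was the average transaction fee for card payments in 2023?"
goal = AnalysisGoal(
    intent="average transaction fee",
    key_data_points=["fee", "payment_method", "year"],
    preliminary_strategy=["filter rows", "average the fee column"],
    conditions=["payment_method == 'card'", "year == 2023"],
)
print(goal.summary())
```

Keeping the intent, data points, strategy, and conditions as separate fields means later stages can check each one independently instead of re-reading the raw query.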
Next, the Contextual Reasoner acts as a bridge, refining the initial goal by incorporating contextual information. This includes referencing metadata about data systems and standard operating procedures (SOPs) to ensure the plan aligns with available data structures and specific domain rules.
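One way to picture this grounding step is as a lookup of the requested fields against schema metadata, plus a collection of any SOP rules that mention them. The sketch below is an assumption-laden toy: the `ground_goal` function, the dictionary formats for metadata and SOPs, and the sample tables are all invented for illustration, and the paper's reasoner is LLM-driven rather than a simple lookup.

```python
def ground_goal(key_data_points, schema, sop_rules):
    """Match requested fields against metadata and collect relevant SOP rules.

    schema:    table name -> list of column names (assumed metadata format)
    sop_rules: field name -> rule text (assumed SOP format)
    """
    resolved, unresolved = {}, []
    for name in key_data_points:
        # Find every table whose columns include the requested field.
        tables = sorted(t for t, cols in schema.items() if name in cols)
        if tables:
            resolved[name] = tables
        else:
            unresolved.append(name)
    # Keep only the rules that concern fields the goal actually uses.
    applicable = [sop_rules[name] for name in key_data_points if name in sop_rules]
    return resolved, unresolved, applicable


# Invented metadata and SOP text for the example.
schema = {
    "payments": ["fee", "payment_method", "merchant_id"],
    "merchants": ["merchant_id", "country"],
}
sop_rules = {"fee": "Exclude refunded transactions before averaging fees."}

resolved, unresolved, applicable = ground_goal(
    ["fee", "payment_method", "year"], schema, sop_rules
)
print(resolved)
print(unresolved)
print(applicable)
```

In this example the `year` field matches no column, which is the kind of mismatch a grounding step would surface before any plan is drawn up.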
The system then moves into a two-stage planning process. First, Workflow Scaffolding generates a high-level, global plan before the agent even interacts with the actual data. This foundational ‘scaffold’ guides the entire analysis. Following this, the Adaptive Planning and Executor takes over. This is an iterative module that generates detailed, execution-level plans. Crucially, it dynamically adjusts subsequent steps based on the results of prior actions, including actual data exploration and intermediate outcomes. This adaptability is vital for complex tasks, as real-world data interaction often informs the best path forward. The execution involves writing and running Python code snippets in a secure environment.
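The two stages can be sketched as a fixed high-level plan followed by a loop that writes each detailed step only after seeing the results of the previous ones. Everything below is a simplified stand-in under stated assumptions: `build_scaffold`, `plan_step`, and `run` are hypothetical names, the planner is hard-coded where the real agent would call an LLM, and plain `exec` is used for brevity even though it is not a secure sandbox.

```python
def build_scaffold(goal: str) -> list[str]:
    """Stage 1: a global plan written before any data is touched."""
    return ["load", "clean", "aggregate"]


def plan_step(step: str, state: dict) -> str:
    """Stage 2: turn one scaffold step into code, using results so far."""
    if step == "load":
        return "rows = [{'fee': 2.0}, {'fee': None}, {'fee': 4.0}]"
    if step == "clean":
        return "rows = [r for r in rows if r['fee'] is not None]"
    if step == "aggregate":
        # Adapt to what earlier steps produced: no rows means no average.
        if not state["rows"]:
            return "answer = None"
        return "answer = sum(r['fee'] for r in rows) / len(rows)"
    raise ValueError(f"unknown step: {step}")


def run(goal: str) -> dict:
    """Execute the scaffold step by step, carrying variables forward."""
    state: dict = {}
    for step in build_scaffold(goal):
        snippet = plan_step(step, state)
        namespace = dict(state)
        exec(snippet, namespace)  # illustration only: exec is NOT a sandbox
        namespace.pop("__builtins__", None)
        state = namespace
    return state


final = run("average fee")
print(final["answer"])  # prints 3.0
```

The point of the split is visible in the `aggregate` branch: the scaffold only says that aggregation happens, while the concrete code for it is chosen after the cleaned data exists.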
Supporting these core reasoning steps are other vital components: a Context-Aware Tool Creation module that dynamically builds data processing tools and scripts on the fly, essential for handling diverse data sources; a Dynamic State Handler that acts as the agent’s working memory, maintaining execution context and enabling debugging; and a Communication Handler that ensures the final results are presented clearly, address user goals, and conform to required formats.
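As a rough illustration of the working-memory idea, the sketch below keeps the variables produced by earlier steps alongside a history of what ran and whether it succeeded. The `StateHandler` and `StepRecord` classes are hypothetical and much simpler than whatever the paper's Dynamic State Handler does; the rollback-on-failure behavior is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepRecord:
    """One entry in the execution history, kept for debugging."""
    step: str
    code: str
    ok: bool
    detail: str


@dataclass
class StateHandler:
    """Hypothetical working memory: live variables plus a step history."""
    variables: dict[str, Any] = field(default_factory=dict)
    history: list[StepRecord] = field(default_factory=list)

    def run(self, step: str, code: str) -> bool:
        # Work on a copy so a failing step cannot corrupt saved state.
        namespace = dict(self.variables)
        try:
            exec(code, namespace)  # illustration only: not a secure sandbox
        except Exception as exc:
            self.history.append(StepRecord(step, code, False, repr(exc)))
            return False
        namespace.pop("__builtins__", None)
        self.variables = namespace
        self.history.append(StepRecord(step, code, True, "ok"))
        return True


handler = StateHandler()
handler.run("load", "total = 10\ncount = 0")
handler.run("average", "mean = total / count")  # fails: division by zero
handler.run("average (retry)", "mean = total / count if count else None")
for record in handler.history:
    print(record.step, record.ok, record.detail)
```

Because the failed attempt is recorded with its error, a planner could read the history and issue a corrected step, as the retry here does by hand.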
Performance and Impact
I2I-STRADA was evaluated on two benchmark datasets: DABstep and DABench. On DABstep, which focuses on financial and operational data with procedural constraints, I2I-STRADA outperformed several state-of-the-art data science agents, reaching 80.56% accuracy on easy tasks and 28.04% on hard tasks. These results point to stronger planning and error handling, especially when a task requires adhering to specific rules.
On the DABench benchmark, which covers a wide array of end-to-end data science tasks across various domains like marketing, finance, and energy, I2I-STRADA also showed strong performance with 90.27% accuracy. This highlights its robustness across different types of data analysis tasks, whether domain-specific or purely statistical.
While the system shows remarkable strengths, the authors note areas for improvement, such as inconsistent handling of “Null” values in certain scenarios and the impact of hyperparameter choices in machine learning algorithms. Nevertheless, I2I-STRADA significantly advances the field by addressing the limitations of general LLMs in complex analytical scenarios, offering a more reliable and interpretable approach to data analysis.
This architecture promises to further the development of sophisticated AI agents capable of comprehensive data analysis in real-world settings. For more details, see the full research paper.


