TLDR: MetaboT is a new multi-agent framework that uses Large Language Models (LLMs) to enable interactive analysis of mass spectrometry metabolomics knowledge graphs. It translates natural-language questions into precise SPARQL queries, resolves entities accurately, and significantly reduces LLM hallucinations. The system’s modular design, specialized agents, and iterative refinement process allow researchers to explore complex biological data and discover novel compounds without requiring programming expertise. Evaluation shows a substantial increase in query accuracy when using MetaboT’s multi-agent architecture compared to standalone LLMs.
MetaboT is a newly developed framework that simplifies the analysis of mass spectrometry metabolomics data. The system uses a multi-agent approach powered by Large Language Models (LLMs) to help researchers navigate large volumes of biological information without needing specialized programming skills.
Mass spectrometry metabolomics generates incredibly detailed data, but traditional methods often struggle to process it efficiently. This creates a bottleneck in discovering new molecular interactions, metabolic pathways, and potential drug candidates. While knowledge graphs have emerged as a powerful tool for integrating metabolites and biological entities, their complexity and reliance on specialized query languages like SPARQL have limited their widespread use. Existing LLM-based solutions for SPARQL generation often suffer from issues like ‘hallucinations’ (generating incorrect information) and a heavy dependence on detailed schema information.
MetaboT addresses these challenges by breaking down complex metabolomics knowledge graph queries into smaller, manageable subtasks. Each subtask is then handled by a specialized agent, leading to more precise entity resolution, accurate SPARQL query generation, and significantly fewer hallucinations compared to single-LLM systems. This allows researchers to interact with a metabolomics knowledge graph through an intuitive conversational interface.
The framework’s architecture is designed for flexibility, managing entity resolution, query processing, and iterative refinement. When a user asks a natural-language question, an Entry Agent first determines if it’s a new query or a follow-up. New questions go to a Validator Agent, which checks if the question is relevant to the knowledge graph’s schema. For specific questions, like those about plants, a PlantDatabaseChecker tool confirms the plant’s presence in a curated database.
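The front of this pipeline can be sketched in a few lines. The sketch below is a minimal illustration only: the function names, the keyword heuristics, and the plant list are assumptions made for demonstration, since in MetaboT these decisions are delegated to LLM-driven agents rather than hard-coded rules.

```python
# Illustrative routing sketch: Entry Agent -> Validator Agent -> PlantDatabaseChecker.
# All names and heuristics here are hypothetical stand-ins for LLM-based decisions.

KNOWN_PLANTS = {"arabidopsis thaliana", "coffea arabica"}  # stand-in for the curated database


def entry_agent(question: str, history: list[str]) -> str:
    """Classify a question as a new query or a follow-up to earlier turns."""
    followup_cues = {"it", "that", "those", "previous"}
    words = set(question.lower().split())
    if history and words & followup_cues:
        return "follow-up"
    return "new"


def validator_agent(question: str, schema_terms: set[str]) -> bool:
    """Accept only questions that mention concepts present in the KG schema."""
    return any(term in question.lower() for term in schema_terms)


def plant_database_checker(species: str) -> bool:
    """Confirm that a plant species appears in the curated database."""
    return species.lower() in KNOWN_PLANTS
```

A new question with no conversation history is routed as "new", then checked against schema terms before any query generation is attempted.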
The Supervisor Agent then coordinates tasks, delegating entity resolution to the KG (knowledge graph) Agent. The KG Agent uses specialized tools like ChemicalResolver, SMILESResolver, TargetResolver, and TaxonResolver to convert entities mentioned in the user’s question into standardized identifiers. For example, it can retrieve a Wikidata identifier for a plant species or a ChEMBL identifier for a biological target. These tools leverage external APIs or retrieval-augmented generation (RAG) on local documents, ensuring accuracy and reducing hallucinations.
Once entities are resolved and schema details are gathered, the SPARQL Query Runner Agent constructs a detailed prompt for the GraphSparqlQAChain tool. This tool uses an LLM to generate a SPARQL query, executes it on the knowledge graph, and retrieves the answer. A key feature is MetaboT’s ability to distinguish between query construction errors and genuine data absence, implementing an iterative refinement loop to automatically reformulate queries if initial attempts fail.
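The refinement loop described above can be sketched as a retry with feedback. The function names are assumptions; in MetaboT the reformulation is done by prompting an LLM with the error message, whereas here a caller-supplied `generate_query` function stands in for that step.

```python
# Sketch of the iterative refinement loop: regenerate the SPARQL query on
# construction errors, but treat a valid query with no rows as genuine data
# absence rather than a failure to retry.

def run_with_refinement(question, generate_query, execute, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        query = generate_query(question, feedback)
        try:
            rows = execute(query)
        except SyntaxError as err:   # stand-in for a SPARQL parse/construction error
            feedback = str(err)      # feed the error back so the query is reformulated
            continue
        if rows:
            return rows              # results found
        return []                    # query was valid but the data is absent
    raise RuntimeError("query could not be repaired after retries")
```

The key design point mirrored here is the distinction the article highlights: an exception triggers reformulation, while an empty-but-valid result is returned as-is instead of looping forever.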
If further interpretation of results is needed, the Supervisor Agent calls the Interpreter Agent. This agent can summarize outputs or provide visualizations like bar charts or diagrams, making complex data more understandable. MetaboT has been validated on a large plant dataset using 50 representative queries. Evaluations showed that while GPT-4o alone had an accuracy of 8.16% for complex queries, its integration into the multi-agent framework boosted accuracy to 83.67%. This significant improvement highlights the power of the multi-agent architecture in overcoming the limitations of standalone LLMs for metabolomics knowledge graph exploration.
MetaboT is accessible via a user-friendly web application, designed to prioritize ease of use for researchers. It supports features like file uploads, visualizations, and integration with Metabolomics USIs for spectrum plots. The system’s modular design ensures scalability and adaptability, allowing for straightforward extension to other mass spectrometry-based knowledge graphs and compatibility with different LLM models. While the current demonstration focuses on the Experimental Natural Products Knowledge Graph (ENPKG), its design supports broader application.
Despite its strengths, MetaboT has some limitations, including its reliance on high-performance LLMs for optimal accuracy, occasional LLM hallucinations, and the non-deterministic nature of LLM outputs. It is also currently restricted to single-graph querying, so researchers must manually connect data with external resources. Future developments aim to address these limitations through automated benchmarking, expanded interoperability with additional knowledge graphs, and the integration of emerging AI capabilities for cheminformatics and biomedical hypothesis generation. The goal is to evolve MetaboT into a comprehensive toolbox for mass spectrometry data analysis, bridging the gap between specialized query writing and intuitive conversational interaction, and thereby accelerating metabolomics discoveries. More details are available in the original research paper.


