TLDR: MetaboT is a new multi-agent framework that uses Large Language Models (LLMs) to enable interactive analysis of mass spectrometry metabolomics knowledge graphs. It translates natural-language questions into precise SPARQL queries, resolves entities accurately, and significantly reduces LLM hallucinations. The system’s modular design, specialized agents, and iterative refinement process allow researchers to explore complex biological data and discover novel compounds without requiring programming expertise. Evaluation shows a substantial increase in query accuracy when using MetaboT’s multi-agent architecture compared to standalone LLMs.
MetaboT is a newly developed framework that simplifies the analysis of mass spectrometry metabolomics data. The system uses a multi-agent approach powered by Large Language Models (LLMs) to help researchers navigate large volumes of biological information without needing specialized programming skills.
Mass spectrometry metabolomics generates incredibly detailed data, but traditional methods often struggle to process it efficiently. This creates a bottleneck in discovering new molecular interactions, metabolic pathways, and potential drug candidates. While knowledge graphs have emerged as a powerful tool for integrating metabolites and biological entities, their complexity and reliance on specialized query languages like SPARQL have limited their widespread use. Existing LLM-based solutions for SPARQL generation often suffer from issues like ‘hallucinations’ (generating incorrect information) and a heavy dependence on detailed schema information.
MetaboT addresses these challenges by breaking down complex metabolomics knowledge graph queries into smaller, manageable subtasks. Each subtask is then handled by a specialized agent, leading to more precise entity resolution, accurate SPARQL query generation, and significantly fewer hallucinations compared to single-LLM systems. This allows researchers to interact with a metabolomics knowledge graph through an intuitive conversational interface.
The framework’s architecture is designed for flexibility, managing entity resolution, query processing, and iterative refinement. When a user asks a natural-language question, an Entry Agent first determines if it’s a new query or a follow-up. New questions go to a Validator Agent, which checks if the question is relevant to the knowledge graph’s schema. For specific questions, like those about plants, a PlantDatabaseChecker tool confirms the plant’s presence in a curated database.
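The front of this pipeline can be sketched in a few lines. The sketch below is a minimal illustration only: the function names, the keyword heuristics, and the plant list are assumptions made for demonstration, since in MetaboT these decisions are delegated to LLM-driven agents rather than hard-coded rules.

```python
# Illustrative routing sketch: Entry Agent -> Validator Agent -> PlantDatabaseChecker.
# All names and heuristics here are hypothetical stand-ins for LLM-based decisions.

KNOWN_PLANTS = {"arabidopsis thaliana", "coffea arabica"}  # stand-in for the curated database


def entry_agent(question: str, history: list[str]) -> str:
    """Classify a question as a new query or a follow-up to earlier turns."""
    followup_cues = {"it", "that", "those", "previous"}
    words = set(question.lower().split())
    if history and words & followup_cues:
        return "follow-up"
    return "new"


def validator_agent(question: str, schema_terms: set[str]) -> bool:
    """Accept only questions that mention concepts present in the KG schema."""
    return any(term in question.lower() for term in schema_terms)


def plant_database_checker(species: str) -> bool:
    """Confirm that a plant species appears in the curated database."""
    return species.lower() in KNOWN_PLANTS
```

A new question with no conversation history is routed as "new", then checked against schema terms before any query generation is attempted.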
The Supervisor Agent then coordinates tasks, delegating entity resolution to the KG (knowledge graph) Agent. The KG Agent uses specialized tools like ChemicalResolver, SMILESResolver, TargetResolver, and TaxonResolver to convert entities mentioned in the user’s question into standardized identifiers. For example, it can retrieve a Wikidata identifier for a plant species or a ChEMBL identifier for a biological target. These tools leverage external APIs or retrieval-augmented generation (RAG) on local documents, ensuring accuracy and reducing hallucinations.
Once entities are resolved and schema details are gathered, the SPARQL Query Runner Agent constructs a detailed prompt for the GraphSparqlQAChain tool. This tool uses an LLM to generate a SPARQL query, executes it on the knowledge graph, and retrieves the answer. A key feature is MetaboT’s ability to distinguish between query construction errors and genuine data absence, implementing an iterative refinement loop to automatically reformulate queries if initial attempts fail.
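The refinement loop described above can be sketched as a retry with feedback. The function names are assumptions; in MetaboT the reformulation is done by prompting an LLM with the error message, whereas here a caller-supplied `generate_query` function stands in for that step.

```python
# Sketch of the iterative refinement loop: regenerate the SPARQL query on
# construction errors, but treat a valid query with no rows as genuine data
# absence rather than a failure to retry.

def run_with_refinement(question, generate_query, execute, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        query = generate_query(question, feedback)
        try:
            rows = execute(query)
        except SyntaxError as err:   # stand-in for a SPARQL parse/construction error
            feedback = str(err)      # feed the error back so the query is reformulated
            continue
        if rows:
            return rows              # results found
        return []                    # query was valid but the data is absent
    raise RuntimeError("query could not be repaired after retries")
```

The key design point mirrored here is the distinction the article highlights: an exception triggers reformulation, while an empty-but-valid result is returned as-is instead of looping forever.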
If further interpretation of results is needed, the Supervisor Agent calls the Interpreter Agent. This agent can summarize outputs or provide visualizations like bar charts or diagrams, making complex data more understandable. MetaboT has been validated on a large plant dataset using 50 representative queries. Evaluations showed that while GPT-4o alone had an accuracy of 8.16% for complex queries, its integration into the multi-agent framework boosted accuracy to 83.67%. This significant improvement highlights the power of the multi-agent architecture in overcoming the limitations of standalone LLMs for metabolomics knowledge graph exploration.
MetaboT is accessible via a user-friendly web application, designed to prioritize ease of use for researchers. It supports features like file uploads, visualizations, and integration with Metabolomics USIs for spectrum plots. The system’s modular design ensures scalability and adaptability, allowing for straightforward extension to other mass spectrometry-based knowledge graphs and compatibility with different LLM models. While the current demonstration focuses on the Experimental Natural Products Knowledge Graph (ENPKG), its design supports broader application.
Despite its strengths, MetaboT has some limitations, including its reliance on high-performance LLMs for optimal accuracy, occasional LLM hallucinations, and the non-deterministic nature of LLM outputs. It is also currently restricted to single-graph querying, so researchers must manually connect data with external resources. Future developments aim to address these limitations through automated benchmarking, expanded interoperability with additional knowledge graphs, and the integration of emerging AI capabilities for cheminformatics and biomedical hypothesis generation. The goal is to evolve MetaboT into a comprehensive toolbox for mass spectrometry data analysis, bridging the gap between specialized query writing and intuitive conversational interaction, and thereby accelerating metabolomics discoveries. More details are available in the original research paper.


