ARUQULA: An LLM Agent for Navigating Knowledge Graphs with Natural Language

TLDR: ARUQULA is an LLM-based agent that translates natural language questions into SPARQL queries for knowledge graphs. It uses an iterative “reason and act” (ReAct) approach with specialized exploration tools and a dual strategy for semantic grounding. Developed for the TEXT2SPARQL Challenge, it generated queries well within the challenge's time limit and adapted across datasets and languages, while exposing how hard it is to evaluate complex queries automatically.

Interacting with complex knowledge graphs can be a significant hurdle for many, especially those without a background in computer science. The specialized query language, SPARQL, often presents a high barrier to entry. This is where large language models (LLMs) come into play, offering a promising solution by translating natural language questions into precise SPARQL queries.

A new research paper introduces ARUQULA, an LLM-based Text2SPARQL approach that combines the ReAct framework with specialized knowledge graph exploration utilities. Unlike single-shot translation methods, ARUQULA alternates between exploring the graph and executing candidate queries, aiming for greater accuracy and adaptability. Full details are available in the research paper.

The TEXT2SPARQL Challenge

The development of ARUQULA was motivated by the First International TEXT2SPARQL Challenge, an initiative designed to foster advancements in the Text2SPARQL domain. This challenge required participants to deploy solutions as publicly accessible RESTful web services capable of converting natural language questions into SPARQL queries across various datasets and languages. Systems were evaluated on two primary datasets: DBpedia (DB25), a large-scale, multilingual knowledge graph derived from Wikipedia, and a Corporate Knowledge Graph (CK25), a domain-specific graph simulating realistic enterprise queries. This setup allowed for testing scalability, multilingual robustness, and domain adaptation.

ARUQULA’s Iterative Approach

ARUQULA builds upon the SPINACH agent, generalizing its capabilities to work with RDF graphs beyond Wikidata and adapting it for multilingual and multi-knowledge graph environments. The core of ARUQULA is an LLM-backed agent that uses a “reason and act” (ReAct) approach. This means the LLM doesn’t try to solve the entire task at once but breaks it down into smaller sub-tasks, using a set of tools to interact with the knowledge graph.

The agent employs six key knowledge graph exploration utilities:

search_entity: Finds individual real-world entities (e.g., “Apple”, “Berlin”).
search_property: Locates properties or relationships (e.g., “price”, “hasLocation”).
search_class: Identifies classes or categories (e.g., “Company”, “Book”).
get_knowledgegraph_entry: Retrieves detailed information about a specific entity.
get_property_examples: Provides usage examples for a given property.
execute_sparql: Runs a SPARQL query on the knowledge graph and returns results.

The LLM chooses which of these actions to take at each step, and a single run is capped at 15 iterations. The system uses GPT-4.1 mini as the LLM, chosen for its balance of cost and performance.
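To make the control flow concrete, here is a minimal Python sketch of such a loop. The six tool names and the 15-iteration cap come from the description above; everything else is an assumption made for illustration: the tools are stubs, `run_agent` and `scripted_policy` are hypothetical names, and the scripted policy stands in for the LLM call. It is not ARUQULA's actual implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_ITERATIONS = 15  # iteration cap reported in the article

TOOL_NAMES = (
    "search_entity", "search_property", "search_class",
    "get_knowledgegraph_entry", "get_property_examples", "execute_sparql",
)

def _stub(name: str) -> Callable[[str], str]:
    # Placeholder: a real tool would query an index or a SPARQL endpoint.
    return lambda arg: f"[{name} result for {arg!r}]"

TOOLS: Dict[str, Callable[[str], str]] = {name: _stub(name) for name in TOOL_NAMES}

Action = Tuple[str, str]                    # (tool name or "stop", argument)
Policy = Callable[[str, List[str]], Action]  # stands in for the LLM

def run_agent(question: str, policy: Policy,
              max_iterations: int = MAX_ITERATIONS) -> Tuple[Optional[str], List[str]]:
    """Ask the policy for one action per step until it stops or the cap is hit."""
    history: List[str] = []
    last_query: Optional[str] = None
    for _ in range(max_iterations):
        tool, arg = policy(question, history)
        if tool == "stop":
            break
        if tool not in TOOLS:
            history.append(f"unknown tool: {tool}")
            continue
        if tool == "execute_sparql":
            last_query = arg  # remember the most recently executed query
        history.append(f"{tool}({arg!r}) -> {TOOLS[tool](arg)}")
    return last_query, history

def scripted_policy(question: str, history: List[str]) -> Action:
    # Hypothetical fixed script; a real agent would prompt the LLM here.
    script = [
        ("search_entity", "Berlin"),
        ("search_property", "population"),
        ("execute_sparql", "SELECT ?p WHERE { ?s ?p ?o } LIMIT 1"),
        ("stop", ""),
    ]
    return script[min(len(history), len(script) - 1)]

query, steps = run_agent("What is the population of Berlin?", scripted_policy)
```

In this sketch the agent returns the last query it executed; how ARUQULA selects its final answer is not described here, so treat that choice as a placeholder.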

Semantic Grounding: A Dual Strategy

A crucial aspect of Text2SPARQL systems is semantic grounding: accurately mapping natural language phrases to the precise identifiers (IRIs) of the entities, classes, and properties within the knowledge graph. ARUQULA employs a dual-strategy approach for this:

Hybrid Vector Search for Schema Entities: For conceptual terms (like “population” or “who made this”), where semantic ambiguity is high, ARUQULA uses a hybrid search method with the Qdrant vector store. This combines dense vector embeddings (capturing semantic meaning) and sparse lexical vectors (for keyword matching) to identify the correct schema elements.
Full-Text Search for Named Entity Resolution: For specific proper nouns (like “Berlin” or “Google”), where efficient string matching is key, a highly performant Lucene full-text index is used. This provides fast and lexically precise resolution of named entities to their canonical IRIs.

This dual strategy matches each kind of term to the index best suited to it: hybrid vector search for semantically ambiguous schema terms, and full-text search for named entities.
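The article does not say how the dense and sparse rankings are merged in the hybrid search. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption rather than as ARUQULA's actual method. The function name, the constant `k = 60`, and the two candidate lists are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each item scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, iri in enumerate(ranking, start=1):
            scores[iri] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda iri: (-scores[iri], iri))

# Illustrative candidates for the phrase "population" (not real search output).
dense_hits = ["dbo:populationTotal", "dbo:populationDensity", "dbo:areaTotal"]
sparse_hits = ["dbo:populationTotal", "dbp:population", "dbo:populationDensity"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused[0])  # dbo:populationTotal
```

A candidate ranked highly by both the dense and the sparse retriever rises to the top, while one found by only a single retriever still stays in the list with a lower score.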

Evaluation and Future Directions

The evaluation of ARUQULA in the TEXT2SPARQL challenge provided valuable insights. The agent consistently stayed well within the ten-minute timeout limit, with average query generation times ranging from 51 to 66 seconds across different datasets and languages. The average number of steps taken by the agent varied between 8 and 10. Analysis of the agent’s behavior showed an initial focus on searching entities and classes, followed by exploring properties and entity details, and finally, an increasing rate of SPARQL query execution.

The researchers also identified several challenges in the evaluation process, particularly regarding the ambiguity of natural language questions and the nuances of SPARQL result projections. For instance, questions like “What are the 10 most populated countries?” can have multiple valid SPARQL interpretations, leading to discrepancies with “gold queries” used for scoring. ARUQULA sometimes refined queries to account for data quality issues or schema fuzziness in DBpedia, even if this led to lower scores in the automated evaluation.
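The projection problem can be illustrated with a small Python sketch. The challenge's exact scoring procedure is not described here, so the set-based F1 below is an assumption, and the rows are made-up placeholders rather than real query results. It shows how a predicted query that returns the right countries plus an extra population column scores zero under a strict row comparison.

```python
from typing import Set, Tuple

Row = Tuple[str, ...]

def f1(gold: Set[Row], predicted: Set[Row]) -> float:
    """Naive set-based F1 over result rows (assumed metric, for illustration)."""
    if not gold and not predicted:
        return 1.0
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Gold query projects only ?country; the predicted query also projects ?population.
gold_rows: Set[Row] = {("ex:CountryA",), ("ex:CountryB",)}
predicted_rows: Set[Row] = {("ex:CountryA", "300"), ("ex:CountryB", "200")}

strict_score = f1(gold_rows, predicted_rows)

# Comparing only the first projected column recovers the match.
first_column = {(row[0],) for row in predicted_rows}
lenient_score = f1(gold_rows, first_column)

print(strict_score, lenient_score)  # 0.0 1.0
```

Both queries answer the question, but only the lenient comparison recognizes it, which is the kind of discrepancy with gold queries the researchers describe.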

Future work for ARUQULA includes reducing latency for interactive settings, comparing different LLMs, extending the approach to other knowledge graph domains, and automating more of the setup and deployment. Further evaluation of ontology grounding strategies is also planned to improve performance on large knowledge graphs not present in LLM training data.
