ARUQULA: An LLM Agent for Navigating Knowledge Graphs with Natural Language

TLDR: ARUQULA is an LLM-based agent that translates natural language questions into SPARQL queries for knowledge graphs. It uses an iterative “reason and act” (ReAct) approach with specialized exploration tools and a dual strategy for semantic grounding. Developed for the TEXT2SPARQL Challenge, it generated queries well within the challenge's time limit and adapted to different knowledge graphs and languages, while exposing how hard it is to evaluate complex queries automatically.

Interacting with complex knowledge graphs can be a significant hurdle for many, especially those without a background in computer science. The specialized query language, SPARQL, often presents a high barrier to entry. This is where large language models (LLMs) come into play, offering a promising solution by translating natural language questions into precise SPARQL queries.

A new research paper introduces ARUQULA, an LLM-based Text2SPARQL approach that combines the ReAct framework with specialized knowledge graph exploration utilities. Unlike single-shot translation methods, ARUQULA works iteratively, alternating between exploring the graph and executing candidate queries, with the aim of greater accuracy and adaptability. The full details of this work can be found in the research paper.

The TEXT2SPARQL Challenge

The development of ARUQULA was motivated by the First International TEXT2SPARQL Challenge, an initiative designed to foster advancements in the Text2SPARQL domain. This challenge required participants to deploy solutions as publicly accessible RESTful web services capable of converting natural language questions into SPARQL queries across various datasets and languages. Systems were evaluated on two primary datasets: DBpedia (DB25), a large-scale, multilingual knowledge graph derived from Wikipedia, and a Corporate Knowledge Graph (CK25), a domain-specific graph simulating realistic enterprise queries. This setup allowed for testing scalability, multilingual robustness, and domain adaptation.

ARUQULA’s Iterative Approach

ARUQULA builds upon the SPINACH agent, generalizing its capabilities to work with RDF graphs beyond Wikidata and adapting it for multilingual and multi-knowledge graph environments. The core of ARUQULA is an LLM-backed agent that uses a “reason and act” (ReAct) approach. This means the LLM doesn’t try to solve the entire task at once but breaks it down into smaller sub-tasks, using a set of tools to interact with the knowledge graph.

The agent employs six key knowledge graph exploration utilities:

  • search_entity: Finds individual real-world entities (e.g., “Apple”, “Berlin”).
  • search_property: Locates properties or relationships (e.g., “price”, “hasLocation”).
  • search_class: Identifies classes or categories (e.g., “Company”, “Book”).
  • get_knowledgegraph_entry: Retrieves detailed information about a specific entity.
  • get_property_examples: Provides usage examples for a given property.
  • execute_sparql: Runs a SPARQL query on the knowledge graph and returns results.

The LLM chooses which of these actions to take at each step, and the loop runs for at most 15 iterations. The system uses GPT-4.1 mini as the LLM, chosen for its balance of cost and performance.
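
The control flow of such a loop can be sketched in a few lines of Python. This is a minimal illustration, not ARUQULA's actual code: the `policy` callable stands in for the LLM, the tool functions are stubs, and the `ex:` IRIs, the `Step` record, and the `"stop"` action name are all assumptions made for the example. Only the tool names and the 15-iteration cap come from the article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_ITERATIONS = 15  # iteration cap reported in the article


@dataclass
class Step:
    """One reason-and-act step: what the agent thought, did, and observed."""
    thought: str
    action: str
    argument: str
    observation: str = ""


# A policy maps (question, history) to (thought, tool name, tool argument).
# In a real agent this would be an LLM call; here it is any callable.
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]
Tool = Callable[[str], str]


def run_agent(question: str, policy: Policy, tools: Dict[str, Tool],
              max_iterations: int = MAX_ITERATIONS
              ) -> Tuple[Optional[str], List[Step]]:
    """Run the loop and return the last executed SPARQL query plus the trace."""
    history: List[Step] = []
    last_query: Optional[str] = None
    for _ in range(max_iterations):
        thought, action, argument = policy(question, history)
        if action == "stop":  # hypothetical action name that ends the loop
            break
        if action in tools:
            observation = tools[action](argument)
        else:
            observation = f"unknown tool: {action}"
        if action == "execute_sparql":
            last_query = argument
        history.append(Step(thought, action, argument, observation))
    return last_query, history


# Stub tools and a scripted policy, for demonstration only.
DEMO_QUERY = "SELECT ?p WHERE { <ex:Berlin> <ex:population> ?p }"

stub_tools: Dict[str, Tool] = {
    "search_entity": lambda text: f"<ex:{text}>",
    "search_property": lambda text: f"<ex:{text}>",
    "execute_sparql": lambda query: "1 result row",
}


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    script = [
        ("Find the entity first.", "search_entity", "Berlin"),
        ("Find a matching property.", "search_property", "population"),
        ("Try a query.", "execute_sparql", DEMO_QUERY),
        ("The result answers the question.", "stop", ""),
    ]
    return script[min(len(history), len(script) - 1)]


query, steps = run_agent("What is the population of Berlin?",
                         scripted_policy, stub_tools)
```

Keeping the full trace in `history` is what lets the model condition each decision on earlier observations, and the hard cap guarantees termination even if the policy never decides to stop.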

Semantic Grounding: A Dual Strategy

A crucial aspect of Text2SPARQL systems is semantic grounding—accurately mapping natural language phrases to the precise identifiers (IRIs) of classes and properties within the knowledge graph. ARUQULA employs a dual-strategy approach for this:

  • Hybrid Vector Search for Schema Entities: For conceptual terms (like “population” or “who made this”), where semantic ambiguity is high, ARUQULA uses a hybrid search method with the Qdrant vector store. This combines dense vector embeddings (capturing semantic meaning) and sparse lexical vectors (for keyword matching) to identify the correct schema elements.
  • Full-Text Search for Named Entity Resolution: For specific proper nouns (like “Berlin” or “Google”), where efficient string matching is key, a highly performant Lucene full-text index is used. This provides fast and lexically precise resolution of named entities to their canonical IRIs.

Routing each kind of phrase to the index that suits it lets the agent handle both conceptual grounding and named entity grounding.
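
To make the two paths concrete, here is a small pure-Python sketch. It does not call Qdrant or Lucene. Reciprocal rank fusion is used as one common way to merge a dense ranking with a lexical ranking; the article does not say which fusion method ARUQULA uses, so treat that choice as an assumption. A plain dictionary lookup stands in for the full-text index. All IRIs and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of IRIs into one.

    Each IRI scores 1 / (k + rank) per list it appears in; scores are summed.
    k = 60 is a conventional default, not a value from the article.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, iri in enumerate(ranking, start=1):
            scores[iri] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by IRI so output is deterministic.
    return sorted(scores, key=lambda iri: (-scores[iri], iri))


def ground_schema_term(dense_hits: List[str], lexical_hits: List[str]) -> List[str]:
    """Path 1: conceptual terms, fused from semantic and keyword rankings."""
    return reciprocal_rank_fusion([dense_hits, lexical_hits])


def ground_named_entity(label: str, label_index: Dict[str, str]) -> Optional[str]:
    """Path 2: proper nouns, resolved by a case-insensitive label lookup.

    A real full-text index would also handle tokenization and ranking.
    """
    return label_index.get(label.casefold())


# Hypothetical candidate lists for the phrase "population".
dense_hits = ["ex:populationTotal", "ex:populationDensity", "ex:populationUrban"]
lexical_hits = ["ex:populationTotal", "ex:populationUrban"]
fused = ground_schema_term(dense_hits, lexical_hits)

# Hypothetical label index for named entities.
label_index = {"berlin": "ex:Berlin", "google": "ex:Google"}
berlin_iri = ground_named_entity("Berlin", label_index)
```

In this example `ex:populationUrban` overtakes `ex:populationDensity` after fusion because it appears in both lists, which is the effect a hybrid search is after: candidates supported by both signals rise to the top.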

Evaluation and Future Directions

The evaluation of ARUQULA in the TEXT2SPARQL challenge provided valuable insights. The agent consistently stayed well within the ten-minute timeout limit, with average query generation times ranging from 51 to 66 seconds across different datasets and languages. The average number of steps taken by the agent varied between 8 and 10. Analysis of the agent’s behavior showed an initial focus on searching entities and classes, followed by exploring properties and entity details, and finally, an increasing rate of SPARQL query execution.

The researchers also identified several challenges in the evaluation process, particularly regarding the ambiguity of natural language questions and the nuances of SPARQL result projections. For instance, questions like “What are the 10 most populated countries?” can have multiple valid SPARQL interpretations, leading to discrepancies with “gold queries” used for scoring. ARUQULA sometimes refined queries to account for data quality issues or schema fuzziness in DBpedia, even if this led to lower scores in the automated evaluation.
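
A short calculation shows why projection choices matter for automated scoring. The sketch below assumes a set-based precision, recall, and F1 comparison over returned values; the article does not spell out the challenge's exact metric, so this is an illustrative assumption. The country and population values are placeholders, not real data.

```python
from typing import Set, Tuple


def precision_recall_f1(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Compare two answer sets value by value."""
    if not gold and not predicted:
        return 1.0, 1.0, 1.0  # both empty: treat as a perfect match
    overlap = len(gold & predicted)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Gold query projects only the countries (placeholders stand in for IRIs).
gold_answers = {"ex:CountryA", "ex:CountryB", "ex:CountryC"}

# A system query that also projects each population returns extra values,
# even though it identifies exactly the same countries.
system_answers = gold_answers | {"population_of_A", "population_of_B", "population_of_C"}

precision, recall, f1 = precision_recall_f1(gold_answers, system_answers)
```

Here recall is 1.0 but precision drops to 0.5, so F1 is about 0.67 for a query that a human reader would likely accept as correct.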

Future work for ARUQULA includes reducing latency for interactive settings, comparing different LLMs, extending the approach to other knowledge graph domains, and automating more of the setup and deployment. Further evaluation of ontology grounding strategies is also planned to improve performance on large knowledge graphs not present in LLM training data.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
