
Bridging Knowledge Gaps: How AI Translates Queries Across Different Knowledge Bases

TLDR: A new study explores how Large Language Models (LLMs) can automatically translate SPARQL queries between different Knowledge Graphs like DBpedia and Wikidata. The research found that larger LLMs, combined with structured prompting and explicit mapping of entities and relationships, can achieve high accuracy, significantly reducing the manual effort needed for integrating and querying diverse knowledge sources. The study highlights the potential for LLMs to enhance interoperability between heterogeneous knowledge graphs.

In today’s digital age, information is vast and often stored in highly structured databases known as Knowledge Graphs (KGs). Think of KGs like massive, interconnected encyclopedias where facts are linked together, making them incredibly valuable for artificial intelligence and semantic web technologies. Two prominent examples are DBpedia and Wikidata, which contain immense amounts of interconnected facts about the world.

However, a significant challenge arises when trying to use information across different KGs: they often speak different ‘languages’ in terms of their internal structure and identifiers. SPARQL, the standard query language for KGs, is tightly tied to a specific KG’s schema. This means a query written for DBpedia won’t work on Wikidata without substantial manual changes, creating a bottleneck for seamless knowledge integration.
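To make the schema gap concrete, here is an illustrative pair of queries asking the same question ("What is the capital of Germany?") against each KG. The identifiers (dbr:Germany, dbo:capital, wd:Q183, wdt:P36) are real DBpedia and Wikidata terms, but the queries themselves are our own minimal sketch, not examples from the paper:

```python
# The same question, expressed against two different KG schemas.
# DBpedia uses human-readable identifiers; Wikidata uses opaque
# numeric IDs (Q183 = Germany, P36 = capital). Neither query will
# run against the other endpoint without translation.

dbpedia_query = """
SELECT ?capital WHERE {
  dbr:Germany dbo:capital ?capital .
}
"""

wikidata_query = """
SELECT ?capital WHERE {
  wd:Q183 wdt:P36 ?capital .
}
"""
```

The structure of the two queries is identical; only the vocabulary differs, which is exactly the mapping problem the translation task has to solve.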

A recent research paper, titled “Automating SPARQL Query Translations between DBpedia and Wikidata,” investigates a promising solution: using state-of-the-art Large Language Models (LLMs) to automatically translate SPARQL queries between these diverse KG schemas. The study, conducted by Malte Christian Bartels, Debayan Banerjee, and Ricardo Usbeck from Leuphana University of Lüneburg, aims to bridge this critical interoperability gap.

The core idea is to leverage LLMs’ advanced capabilities in understanding complex patterns and generating structured text, much like they can compose SPARQL queries from natural language. By enabling LLMs to translate queries, users and applications could seamlessly query, integrate, and cross-validate information across multiple KGs, broadening access to verified data and enhancing reliability.
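Cross-validation of this kind boils down to checking whether two result sets denote the same entities once identifiers are mapped between schemas. A minimal sketch (our own helper, not code from the paper; the mapping dictionary is a hypothetical stand-in for a real alignment resource):

```python
def results_agree(dbpedia_results, wikidata_results, mapping):
    """Check whether two SPARQL result sets denote the same entities,
    after rewriting DBpedia IRIs to their Wikidata counterparts."""
    mapped = {mapping.get(iri, iri) for iri in dbpedia_results}
    return mapped == set(wikidata_results)

# Example: dbr:Berlin and wd:Q64 both denote Berlin.
agree = results_agree(
    ["dbr:Berlin"],
    ["wd:Q64"],
    {"dbr:Berlin": "wd:Q64"},
)
```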

To test this, the researchers assembled two benchmarks. The first involved 100 DBpedia-Wikidata queries from the QALD-9-Plus dataset, focusing on general encyclopedic knowledge. The second benchmark contained 100 DBLP queries aligned to OpenAlex, a pair of KGs in the scholarly domain, to test the generalizability of the approach beyond encyclopedic data.

Three open LLMs were selected for evaluation: Llama 3.1-8B Instruct, DeepSeek-R1-Distill-Llama-70B, and Mistral-Large-Instruct-2407. These models were tested using various prompting strategies, including zero-shot (minimal guidance), few-shot (providing examples), and two Chain-of-Thought (CoT) variants (encouraging step-by-step reasoning). A crucial element was providing explicit entity and relationship mapping tables to the LLMs, helping them understand how concepts in one KG relate to another.
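The ingredients above (few-shot examples plus explicit mapping tables) can be combined into a single translation prompt. The sketch below is our own simplified reconstruction of such a setup, not the paper's actual prompt template; the mapping entries use real DBpedia/Wikidata identifiers, but the wording and layout are assumptions:

```python
# Hypothetical few-shot prompt builder for DBpedia -> Wikidata translation.
# The mapping table tells the LLM how entities and properties correspond.

mapping_table = {
    "dbr:Germany": "wd:Q183",   # entity mapping
    "dbr:France": "wd:Q142",    # entity mapping
    "dbo:capital": "wdt:P36",   # property mapping
}

# One worked example pair, as a few-shot demonstration.
example_pair = (
    "SELECT ?c WHERE { dbr:France dbo:capital ?c . }",
    "SELECT ?c WHERE { wd:Q142 wdt:P36 ?c . }",
)

def build_prompt(source_query: str) -> str:
    """Compose a few-shot translation prompt with explicit mapping hints."""
    mappings = "\n".join(f"{k} -> {v}" for k, v in mapping_table.items())
    return (
        "Translate the following SPARQL query from DBpedia to Wikidata.\n"
        f"Known mappings:\n{mappings}\n\n"
        f"Example:\nDBpedia: {example_pair[0]}\nWikidata: {example_pair[1]}\n\n"
        f"DBpedia: {source_query}\nWikidata:"
    )

prompt = build_prompt("SELECT ?c WHERE { dbr:Germany dbo:capital ?c . }")
```

The same scaffold extends naturally to the Chain-of-Thought variants by instructing the model to reason step by step before emitting the final query.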

The findings were insightful. Performance varied significantly across models and prompting strategies. The largest model, Mistral-Large-Instruct-2407, consistently achieved the highest accuracy. For instance, it reached an impressive 86% accuracy when translating from Wikidata to DBpedia using a Chain-of-Thought approach. In the DBLP-to-OpenAlex generalization task, it achieved similar results with a few-shot setup, highlighting the importance of in-context examples.

Interestingly, translations from Wikidata to DBpedia generally worked much better than the reverse. This asymmetry is likely due to DBpedia’s use of human-readable identifiers, which LLMs might find easier to handle compared to Wikidata’s abstract numeric IDs. The study also identified common error types, with ‘Structural Error’ being the most prevalent, often co-occurring with issues like incorrect entity or property mappings.

This research demonstrates a viable and scalable pathway toward KG interoperability. The key ‘recipe’ for success involves using large-capacity LLMs, employing structured prompting techniques (especially few-shot learning with representative examples), and providing accurate entity and relation mapping tables. The authors suggest that prioritizing human-language friendly identifiers when designing KGs could further simplify translation and improve accuracy.


While the study has limitations, such as the moderate size of benchmarks and being exclusively English-based, it offers a significant step forward. Automating SPARQL query translation can substantially reduce the manual effort required for cross-KG data integration and analysis, fostering broader adoption of linked data principles and enabling more extensive knowledge discovery across diverse platforms. For more details, you can read the full research paper here.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
