TLDR: This research explores how Large Language Models (LLMs) can make complex manufacturing Knowledge Graphs (KGs) easier to use by automatically translating natural language questions into technical queries (SPARQL). The study found that providing LLMs with specific, relevant context from the KG significantly improves the accuracy of these translations, reducing errors and making KGs more accessible for non-experts in manufacturing settings.
Knowledge Graphs (KGs) have become incredibly important for managing data in the manufacturing industry. They help integrate different data sources by providing a shared, structured way to organize information. However, using these powerful KGs can be challenging for people who aren’t experts, often requiring them to write complex queries in a language called SPARQL to get specific information.
The rise of Large Language Models (LLMs) offers a promising solution. LLMs have the potential to automatically translate everyday language questions into SPARQL queries, effectively bridging the gap between user-friendly interfaces and the sophisticated structure of KGs. The main hurdle is effectively informing LLMs about the specific context and structure of KGs in specialized fields like manufacturing, which is crucial for generating accurate queries.
This research paper evaluates various strategies for using LLMs as intermediaries to make information retrieval from KGs easier, focusing specifically on the manufacturing domain. The study looked at two key KGs: the Bosch Line Information System KG and the I40 Core Information Model. The researchers compared different methods for providing relevant context from the KG to the LLM and analyzed how well these methods helped transform real-world questions into SPARQL queries.
The main finding is that LLMs generate correct and complete queries far more often when they are given only the relevant part of the KG’s structure rather than all of it. These ‘context-aware prompting’ techniques help LLMs focus on the relevant parts of the data structure and reduce the risk of generating incorrect or made-up information, a phenomenon known as hallucination. The authors believe these techniques will help make complex data repositories more accessible and empower better decision-making in manufacturing environments.
The framework proposed in the paper involves two main steps: ‘Preprocessing and Enrichment’ of the Knowledge Graph, and ‘Prompting’ the Large Language Model. In the preprocessing stage, relevant content from the KG is selected. This can involve using the entire data structure, a simplified version, or a context-based reduction that only includes information relevant to the user’s question. This selected content can then be further enriched with additional details from the ontology, the LLM itself, or external sources. Finally, this structured KG content needs to be converted into a format that LLMs can understand, such as a graph, table, or text structure.
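To make the preprocessing stage more concrete, here is a minimal sketch of a context-based reduction followed by conversion to a table-like text format. It is an illustration under stated assumptions, not the paper's implementation: the toy ontology, the class and property names, and the simple keyword-matching rule are all hypothetical, and the actual selection method used by the authors may differ.

```python
import re

# Hypothetical toy ontology as (domain class, property, range class) triples.
# These names are illustrative and not taken from the Bosch or I40 KGs.
ONTOLOGY = [
    ("Line", "hasStation", "Station"),
    ("Station", "hasMachine", "Machine"),
    ("Machine", "producesPart", "Part"),
    ("Employee", "worksAt", "Plant"),
]


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def reduce_ontology(triples, question):
    """Keep only triples whose domain or range class is named in the question.

    Assumption: naive keyword matching with a crude plural rule
    (strip a trailing "s"). A real system would need something more robust.
    """
    words = tokenize(question)
    words |= {w[:-1] for w in words if w.endswith("s")}
    return [t for t in triples if t[0].lower() in words or t[2].lower() in words]


def to_table(triples):
    """Serialize triples as a pipe-separated table an LLM can read as text."""
    rows = ["domain | property | range"]
    rows += [f"{s} | {p} | {o}" for s, p, o in triples]
    return "\n".join(rows)


question = "Which machines belong to each station?"
reduced = reduce_ontology(ONTOLOGY, question)
print(to_table(reduced))
```

In this toy case the triple about employees and plants is dropped, because the question mentions neither class, so the LLM would only see the part of the structure that concerns stations and machines.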
For the LLM prompting step, the choice of the LLM model (e.g., GPT-3.5 or GPT-4) and how the prompt is engineered are critical. The study explored different prompting techniques: a simple prompt, a prompt with a generic example of a SPARQL query, and a prompt with a domain-specific example. The evaluation used a manufacturing benchmark of 17 business-relevant questions derived from real-world scenarios, assessing both the accuracy of the generated SPARQL terms (to detect hallucinations) and the correctness and completeness of the queries as rated by human experts.
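The three prompting variants described above can be sketched as a single prompt-building function with an optional example. This is a hedged illustration only: the instruction wording, the `ex:` prefix, and both example question/query pairs are assumptions made for this sketch and are not the prompts used in the study. No LLM is called here; the function only assembles the text that would be sent.

```python
# Hypothetical examples; neither is taken from the paper.
GENERIC_EXAMPLE = (
    "Question: Who wrote the book?\n"
    "SPARQL: SELECT ?author WHERE { ?book ex:hasAuthor ?author . }"
)
DOMAIN_EXAMPLE = (
    "Question: Which stations belong to a line?\n"
    "SPARQL: SELECT ?station WHERE { ?line ex:hasStation ?station . }"
)


def build_prompt(question, kg_context, example=None):
    """Assemble a prompt from instructions, KG context, and an optional example.

    example=None             -> simple prompt
    example=GENERIC_EXAMPLE  -> prompt with a generic SPARQL example
    example=DOMAIN_EXAMPLE   -> prompt with a domain-specific example
    """
    parts = [
        "Translate the question into a SPARQL query.",
        "Use only the classes and properties listed below.",
        "Ontology:\n" + kg_context,
    ]
    if example:
        parts.append("Example:\n" + example)
    parts.append("Question: " + question + "\nSPARQL:")
    return "\n\n".join(parts)


kg_context = "domain | property | range\nStation | hasMachine | Machine"
question = "Which machines belong to each station?"

simple_prompt = build_prompt(question, kg_context)
domain_prompt = build_prompt(question, kg_context, DOMAIN_EXAMPLE)
print(domain_prompt)
```

Keeping the example as an optional argument makes it easy to run the same question and the same KG context through all three variants, so that any difference in the output can be attributed to the example alone.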
The quantitative results showed that context-based reduction of the ontology significantly improved the LLM’s accuracy in generating correct SPARQL terms, reducing hallucinations. GPT-4 also generally outperformed GPT-3.5 in this task. The representation format (graph, table, table-sorted) and the simple prompting techniques had less impact on hallucination rates. The qualitative evaluation, in turn, showed that reducing the ontology to only relevant concepts substantially increased the correctness and completeness of the generated queries, and that providing domain-specific examples in the prompt led to noticeable improvements.
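One simple way to check generated SPARQL terms for hallucinations is to compare the terms in a query against the vocabulary of the KG. The sketch below assumes that terms are written as prefixed names (such as `ex:hasMachine`) and extracts them with a regular expression; the vocabulary, the sample query, and the scoring rule are hypothetical, and the paper's exact metric may be defined differently.

```python
import re


def extract_terms(query):
    """Return the prefixed names (e.g. ex:hasMachine) used in a SPARQL query.

    Assumption: terms appear as prefix:localName. Variables (?x), keywords,
    and full IRIs in angle brackets are not matched by this pattern.
    """
    return set(re.findall(r"\b[A-Za-z][\w-]*:[A-Za-z_]\w*", query))


def hallucinated_terms(query, vocabulary):
    """Terms used in the query that do not exist in the KG vocabulary."""
    return extract_terms(query) - set(vocabulary)


def term_accuracy(query, vocabulary):
    """Share of query terms that exist in the vocabulary (1.0 if no terms)."""
    terms = extract_terms(query)
    if not terms:
        return 1.0
    return len(terms & set(vocabulary)) / len(terms)


# Hypothetical vocabulary and LLM output, for illustration only.
VOCABULARY = {"ex:hasMachine", "ex:hasStation"}
generated = "SELECT ?m WHERE { ?s ex:hasMachine ?m . ?m ex:hasSerialNo ?n . }"

print(hallucinated_terms(generated, VOCABULARY))  # {'ex:hasSerialNo'}
print(term_accuracy(generated, VOCABULARY))       # 0.5
```

A check like this only catches terms that do not exist in the KG. A query can use only valid terms and still answer the wrong question, which is why the study also relied on human experts to rate correctness and completeness.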
The research acknowledges that the non-deterministic nature of LLMs presents a challenge for consistent performance. Future work aims to handle even more complex questions, address this unpredictability, and explore the use of open-source language models. This work represents a significant step towards democratizing access to complex manufacturing data, making it easier for non-experts to retrieve valuable insights using natural language. You can read the full paper here: Enhancing Manufacturing Knowledge Access with LLMs and Context-aware Prompting.


