
Structured Answers: Interpretable QA Using Knowledge Graphs

TLDR: This paper introduces a question answering system that uses knowledge graphs exclusively, bypassing traditional retrieval-augmented generation (RAG) with large language models to enhance interpretability and reduce hallucination. The system generates QA pairs from documents, converts them into a knowledge graph, and then performs graph-based retrieval, reranking, and paraphrasing to produce answers. Evaluated on the CRAG benchmark using LLM-as-a-judge, it achieved competitive accuracies, demonstrating a practical alternative for domains requiring factual consistency and transparency.

Question Answering (QA) systems are designed to provide precise and relevant answers to user queries. Traditionally, many modern QA systems rely on Retrieval-Augmented Generation (RAG), where a user’s query retrieves relevant text chunks that are then processed by a large language model (LLM) to generate an answer. However, these RAG systems often depend heavily on unstructured documents and can be prone to issues like hallucination and limited transparency.

A recent research paper, titled “Interpretable Question Answering with Knowledge Graphs,” proposes an alternative approach that operates exclusively on a knowledge graph retrieval system, without using RAG with LLMs. This method aims to offer more interpretable reasoning and contextual associations, making it particularly suitable for tasks where factual consistency and traceability are crucial, such as in legal or technical fields.

The system, developed by Kartikeya Aneja, Manasvi Srivastava, Subhayan Das, and Nagender Aneja, uses a small paraphraser model to rephrase the entity-relationship edges retrieved from the knowledge graph, generating human-readable answers.

How the System Works

The proposed pipeline is divided into two main stages. The first stage involves pre-processing a document to generate sets of question-answer (QA) pairs. For PDF documents, text is divided into semantically meaningful units, and a Hugging Face model (iarfmoose/t5-base-question-generator) is used to automatically generate QA pairs, extracting atomic facts and knowledge.
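The segmentation step can be sketched as follows. The paper does not specify its exact splitting heuristic, so the paragraph-then-sentence strategy, character cap, and function names below are illustrative assumptions; the actual call to the question-generation model is indicated only as a comment.

```python
import re

def split_into_units(text: str, max_chars: int = 500) -> list[str]:
    """Split raw document text into semantically meaningful units.

    Illustrative heuristic: split on blank lines (paragraphs), then group
    sentences so no unit exceeds max_chars. This stands in for whatever
    segmentation the paper's preprocessing actually uses.
    """
    units = []
    for para in re.split(r"\n\s*\n", text.strip()):
        sentences = re.split(r"(?<=[.!?])\s+", para.strip())
        current = ""
        for sent in sentences:
            if current and len(current) + len(sent) + 1 > max_chars:
                units.append(current)
                current = sent
            else:
                current = f"{current} {sent}".strip()
        if current:
            units.append(current)
    return units

# Each unit would then be passed to the QA-generation model, e.g.
# (assumed invocation -- not shown in the article):
#   qa_pairs = question_generator(unit)  # iarfmoose/t5-base-question-generator
```

Paragraph boundaries are kept as hard unit breaks here on the assumption that a QA pair should not mix facts from unrelated passages.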

The second stage converts these QA pairs into a structured knowledge graph. This is achieved using LangChain’s LLMGraphTransformer with GPT-3.5 Turbo, which identifies entities and their semantic relationships. These nodes (entities) and edges (relationships) are then stored in a Neo4j graph database. Embeddings are generated for each node and node type using the text-embedding-3-large model, enhancing the graph’s search capabilities.
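The shape of the resulting store can be illustrated with a minimal in-memory graph. This is a sketch only: the real system persists to Neo4j, and the (head, relation, tail) triples come from LLMGraphTransformer with GPT-3.5 Turbo, neither of which is reproduced here.

```python
from collections import defaultdict

class SimpleKnowledgeGraph:
    """In-memory stand-in for the Neo4j store described in the paper.

    Each triple (head, relation, tail) -- the kind of output
    LLMGraphTransformer would emit from a QA pair -- is indexed by
    entity so that a node's 1-hop context is cheap to retrieve.
    """

    def __init__(self):
        self.triples = []
        self.by_entity = defaultdict(list)

    def add_triple(self, head, relation, tail,
                   head_type="Entity", tail_type="Entity"):
        triple = {"head": head, "head_type": head_type,
                  "relation": relation,
                  "tail": tail, "tail_type": tail_type}
        self.triples.append(triple)
        self.by_entity[head].append(triple)
        self.by_entity[tail].append(triple)

    def neighbors(self, entity):
        """All edges touching an entity (its immediate relationships)."""
        return self.by_entity.get(entity, [])

# Example: a triple that might be extracted from the QA pair
# ("Who developed the system?", "Kartikeya Aneja and colleagues").
kg = SimpleKnowledgeGraph()
kg.add_triple("QA system", "DEVELOPED_BY", "Kartikeya Aneja",
              tail_type="Person")
```

In the paper, each node and node type additionally carries a text-embedding-3-large vector; those embeddings are what the retrieval steps below compare against.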

Retrieval and Answer Generation

When a user query is presented, the system employs a three-step retrieval process:

  • Node-Level Semantic Matching: It computes the cosine similarity between the input question’s embedding and the embeddings of candidate nodes in the knowledge graph, selecting highly similar nodes and their immediate relationships.
  • Type-Level Generalization Retrieval: The system identifies the node type most semantically similar to the question, retrieving all associated nodes and relationships. This is useful for general or class-level questions.
  • Fuzzy Entity Matching: Entities explicitly mentioned in the question are identified using a lightweight Hugging Face model (dslim/bert-base-NER), and fuzzy matching is performed against graph nodes, allowing for minor variations or misspellings.
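The first and third steps can be sketched with standard-library tools. The similarity threshold, the use of `difflib` as the fuzzy matcher, and the plain-list embeddings are all assumptions for illustration; the real system compares text-embedding-3-large vectors and uses dslim/bert-base-NER to extract the mentions.

```python
import math
from difflib import get_close_matches

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def node_level_match(question_emb, node_embs, threshold=0.8):
    """Step 1: keep nodes whose embedding is highly similar to the
    question embedding (threshold is an illustrative choice)."""
    return [name for name, emb in node_embs.items()
            if cosine(question_emb, emb) >= threshold]

def fuzzy_entity_match(mentioned_entities, node_names, cutoff=0.8):
    """Step 3: map NER-extracted mentions onto graph nodes, tolerating
    minor variations or misspellings. difflib stands in for the paper's
    unspecified fuzzy matcher."""
    matches = {}
    for mention in mentioned_entities:
        hits = get_close_matches(mention, node_names, n=1, cutoff=cutoff)
        if hits:
            matches[mention] = hits[0]
    return matches
```

Step 2 (type-level generalization) is the same cosine comparison applied to node-type embeddings instead of node embeddings, followed by fetching every node of the winning type.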

Following retrieval, the selected nodes and relationships are passed to a lightweight paraphrasing model (tuner007/pegasus_paraphrase) to synthesize a fluent natural language response. These paraphrased answers are then sent to a reranking model (BAAI/bge-reranker-large), which ranks them based on semantic relevance to the input question. The top five highest-ranked responses are chosen as the final answers.
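The rerank-and-select stage reduces to scoring each paraphrased candidate against the question and keeping the top five. The sketch below assumes any pluggable scorer; the token-overlap function is a deliberately crude stand-in for the BAAI/bge-reranker-large cross-encoder.

```python
def rerank_top_k(question, candidates, score_fn, k=5):
    """Rank paraphrased answers by relevance to the question and keep
    the top k, mirroring the paper's reranking stage. score_fn is any
    (question, answer) -> float scorer; the real system uses
    BAAI/bge-reranker-large."""
    scored = sorted(candidates, key=lambda c: score_fn(question, c),
                    reverse=True)
    return scored[:k]

def overlap_score(question, answer):
    """Illustrative scorer: fraction of question tokens found in the
    answer. A cross-encoder would score semantic relevance instead."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / max(len(q), 1)
```

Because `sorted` is stable, candidates with equal scores keep their retrieval order, which is a reasonable tie-breaking default.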

Evaluation and Results

The system’s performance was evaluated using an LLM-as-a-judge approach, employing Llama-3.2 and GPT-3.5 Turbo as automatic evaluators. On a custom PDF dataset, the system achieved accuracies of 89.6% with Llama-3.2 and 78.3% with GPT-3.5 Turbo. When tested on the CRAG benchmark, which includes 2398 valid question-answer pairs, the system achieved accuracies of 71.9% with Llama-3.2 and 54.4% with GPT-3.5 Turbo, demonstrating competitive performance compared to existing RAG approaches.
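The reported figures reduce to a simple ratio over binary judge verdicts. A minimal sketch, assuming each evaluator (Llama-3.2 or GPT-3.5 Turbo) returns a correct/incorrect label per question; the judging prompt itself is not reproduced in the article.

```python
def judge_accuracy(verdicts):
    """Aggregate LLM-as-a-judge verdicts (True = judged correct) into
    an accuracy percentage, as in the reported PDF and CRAG numbers."""
    if not verdicts:
        return 0.0
    return 100.0 * sum(verdicts) / len(verdicts)
```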

Robustness was also assessed by perturbing questions while preserving their original intent. For the PDF dataset, the accuracy for perturbed questions was 85.6% with Llama-3.2 and 72.6% with GPT-3.5 Turbo, indicating the system’s ability to handle variations in queries.

This research highlights the practical utility of a knowledge graph-based approach for question answering, providing interpretable, ranked responses that can include partially correct or contextually relevant information. This makes it valuable for exploratory QA, entity discovery, and knowledge validation. The code for the experiment, methods, and dataset is publicly available for further exploration. You can read the full research paper here.

Nikhil Patel
https://blogs.edgentiq.com
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
