TLDR: DEPTH is a novel AI framework designed to improve relation extraction by Large Language Models (LLMs) and significantly reduce ‘hallucinations’ (incorrect relation predictions). It employs a two-tiered approach: a Grounding module that simplifies sentences via dependency parsing and applies causality-driven reinforcement learning for precise local predictions, and a Refinement module that aggregates these predictions and applies global consistency checks for self-correction. Experiments show DEPTH drastically lowers hallucination rates and improves accuracy across various datasets, making LLM-based relation extraction more reliable for real-world applications.
In the rapidly evolving field of artificial intelligence, Large Language Models (LLMs) have shown immense potential, especially in tasks like relation extraction. Relation extraction is crucial for building structured knowledge bases, which are vital for many applications, from social media analysis to question answering. However, a significant challenge with LLMs in this domain is their tendency to ‘hallucinate’ or incorrectly predict relationships between entities, especially in complex sentences. These false predictions can introduce errors into knowledge graphs, compromising their reliability.
A new framework called DEPTH has been introduced to tackle this problem. DEPTH stands for Dependency-aware sEntence simPlification and Two-tiered Hierarchical refinement. It aims to make relation extraction more accurate and significantly reduce these hallucinated relationships. The framework operates in two main stages: the Grounding module and the Refinement module.
The Grounding Module: Focusing on Local Accuracy
The first stage, the Grounding module, extracts relations for individual entity pairs within a sentence. It uses a clever technique called Dependency-aware Sentence Simplification. In a complex sentence, the core relationship between two entities can be buried among many irrelevant words. DEPTH uses ‘dependency parsing’ to find the shortest path between the two entities through the sentence’s dependency tree; this shortest dependency path usually carries the information most crucial to understanding their relationship. By simplifying the sentence down to this essential context, DEPTH helps the LLM focus on what truly matters, reducing syntactic noise and yielding more precise local predictions.
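To make this concrete, here is a minimal sketch of shortest-dependency-path simplification, assuming spaCy for parsing and networkx for the path search; the function name, the naive entity lookup, and the example output are illustrative assumptions, not the paper’s implementation.

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def simplify(sentence: str, head_entity: str, tail_entity: str) -> str:
    """Keep only the tokens on the shortest dependency path between two entities."""
    doc = nlp(sentence)
    # Treat the dependency tree as an undirected graph over token indices.
    graph = nx.Graph()
    for token in doc:
        for child in token.children:
            graph.add_edge(token.i, child.i)
    # Locate the entity tokens (naive exact-match lookup, for illustration only).
    head_idx = next(t.i for t in doc if t.text == head_entity)
    tail_idx = next(t.i for t in doc if t.text == tail_entity)
    path = nx.shortest_path(graph, source=head_idx, target=tail_idx)
    # Reassemble the path tokens in sentence order as the simplified context.
    return " ".join(doc[i].text for i in sorted(path))

print(simplify("Barack Obama, who was born in Hawaii, served as president.",
               "Obama", "Hawaii"))
# Typically yields a stripped-down context such as "Obama born in Hawaii".
```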
Beyond simplification, the Grounding module also incorporates a novel approach to reinforcement learning from human feedback (RLHF), called Causality-driven Reward Modeling. LLMs can learn to associate relationships from superficial patterns, such as two words frequently co-occurring, rather than from genuine semantic understanding, and this can lead to systematic hallucinations. DEPTH addresses the problem by training its ‘reward model’, the component that guides the LLM’s learning, to ignore these spurious correlations: during training, it carefully separates the truly relevant parts of the input and output from the irrelevant ones. As a result, the reward model learns from causal signals, yielding a more robust and reliable LLM that is less prone to hallucinating.
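One way to picture this is as a reward-model training objective that scores three views of the same example: the full input, the input with irrelevant tokens masked, and the input with the causal (dependency-path) tokens masked, then demands invariance to the first masking and sensitivity to the second. The sketch below, including the function name and the exact loss form, is an illustrative assumption rather than the paper’s actual objective.

```python
import torch
import torch.nn.functional as F

def causal_reward_loss(reward_model, full_ids, masked_irrelevant_ids,
                       masked_causal_ids, label):
    """
    full_ids:              tokenized (sentence, predicted relation) pair
    masked_irrelevant_ids: the same input with non-path tokens masked out
    masked_causal_ids:     the same input with dependency-path tokens masked out
    label:                 1.0 if the predicted relation is correct, else 0.0
    """
    r_full = reward_model(full_ids)                 # score on the original input
    r_no_noise = reward_model(masked_irrelevant_ids)
    r_no_cause = reward_model(masked_causal_ids)

    # Fit the reward to the human label on the full input.
    fit = F.binary_cross_entropy_with_logits(r_full, label)
    # Invariance: masking irrelevant tokens should not change the score.
    invariance = (r_full - r_no_noise).pow(2).mean()
    # Sensitivity: confidence that survives without the causal span is
    # spurious, so push the causally-masked score toward zero.
    leakage = F.binary_cross_entropy_with_logits(
        r_no_cause, torch.zeros_like(label))

    return fit + invariance + leakage
```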
The Refinement Module: Ensuring Global Consistency
While the Grounding module excels at local predictions, treating each entity pair in isolation can lead to inconsistencies. This is where the second stage, the Refinement module, comes in. It aggregates all the relations the Grounding module predicted for a given sentence and then prompts the LLM to review them from a holistic, sentence-level perspective (a prompt sketch follows below). This ‘self-correction’ mechanism performs three crucial checks:
- Omission Check: Identifies any relationships that might have been missed in the initial local predictions.
- Contradiction Check: Detects and resolves logically inconsistent relations, ensuring the final set of predictions makes sense together.
- Misclassification Check: Corrects errors that might have arisen from the Grounding module’s localized context bias.
By integrating this global view, the Refinement module significantly enhances the overall accuracy and coherence of the extracted relations, further mitigating hallucinations.
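As an illustration, this review step could be implemented as a single structured prompt over the aggregated local predictions. The template wording and the `call_llm` helper below are hypothetical, not the paper’s actual prompt.

```python
import json

REVIEW_TEMPLATE = """Sentence: {sentence}

Locally predicted relations (one per entity pair):
{predictions}

Review the predictions as a whole and return corrected JSON:
1. Omission check: add any relation the sentence supports but the list misses.
2. Contradiction check: remove or fix relations that are mutually inconsistent.
3. Misclassification check: fix relations mislabeled due to narrow local context.
"""

def refine(sentence: str, local_predictions: list, call_llm) -> list:
    # `call_llm` is a hypothetical function that sends a prompt to an LLM
    # and returns its text completion.
    prompt = REVIEW_TEMPLATE.format(
        sentence=sentence,
        predictions=json.dumps(local_predictions, indent=2),
    )
    return json.loads(call_llm(prompt))  # assumes the model returns valid JSON
```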
Impressive Results and Practical Implications
Experiments conducted on six benchmarks spanning news and scientific domains demonstrate DEPTH’s effectiveness. It reduced the average hallucination rate to a mere 7.0% while achieving a 17.2% improvement in average F1 score over existing state-of-the-art methods. Particularly impressive is that DEPTH delivers this performance with a comparatively small 14-billion-parameter model, outperforming baselines that rely on much larger models, which highlights the efficiency of the framework’s design.
The framework also generalizes well: it supports model sharing across datasets within similar domains, a crucial step towards cost-effective, real-world deployment. Its ability to accurately discern whether a relation exists at all, rather than merely classifying its type, makes DEPTH highly applicable for building high-quality knowledge bases in enterprise-scale document processing, where noisy data can be detrimental.
For more technical details, you can refer to the full research paper: DEPTH: Hallucination-Free Relation Extraction via Dependency-Aware Sentence Simplification and Two-tiered Hierarchical Refinement.
The DEPTH framework represents a significant step forward in making LLM-based relation extraction more reliable and practical, addressing a critical challenge that has hindered the widespread adoption of LLMs in structured knowledge construction.


