TLDR: HARP is a novel framework for detecting hallucinations in Large Language Models (LLMs). It proposes decomposing an LLM’s hidden state space into a semantic subspace (linguistic expression) and a reasoning subspace (internal thought processes). By applying Singular Value Decomposition (SVD) to the Unembedding layer, HARP identifies basis vectors for these subspaces. It then projects hidden states onto the reasoning subspace, using these projections as compact and robust features for hallucination detection. This method significantly reduces feature dimensionality, filters noise, and achieves state-of-the-art detection accuracy and robustness across various datasets and LLM architectures.
Large Language Models (LLMs) have revolutionized many aspects of natural language processing, demonstrating impressive generative capabilities. However, a significant challenge remains: hallucinations. These are instances where LLMs generate information that is inconsistent with objective facts, posing a major barrier to their reliable use in critical applications.
Existing methods for detecting these hallucinations have made progress, but they often struggle to clearly separate the linguistic expression (semantic information) from the internal thought processes (reasoning information) within an LLM. They also face challenges in maintaining robustness across different scenarios.
Introducing HARP: A Novel Approach to Hallucination Detection
To tackle these issues, researchers Junjie Hu, Gang Tu, ShengYu Cheng, Jinxin Li, Jinting Wang, Rui Chen, Zhilong Zhou, and Dongbo Shan have proposed a new framework called HARP, which stands for HAllucination detection via Reasoning subspace Projection. HARP rests on one central idea: the hidden state space of an LLM (the internal representations the model computes) can be decomposed into two distinct parts, a semantic subspace and a reasoning subspace.
The semantic subspace is where the model encodes linguistic expressions, like the words and grammar it uses. The reasoning subspace, on the other hand, captures the LLM’s internal reasoning processes, the ‘thinking’ behind its answers. HARP demonstrates that a specific part of the LLM, known as the Unembedding layer, can effectively separate these two types of information.
How HARP Works
The core of HARP’s methodology involves applying a mathematical technique called Singular Value Decomposition (SVD) to the parameters of the Unembedding layer. This process helps to identify the fundamental ‘basis vectors’ that define both the semantic and reasoning subspaces. Think of these basis vectors as the primary directions or components that make up each subspace.
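The SVD step described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes that the reasoning subspace is spanned by the right singular vectors with the smallest singular values, that is, the directions the Unembedding layer barely maps to token logits. The article does not state which singular vectors belong to which subspace, so treat that split as an assumption. The function name `split_subspaces`, the parameter `reasoning_fraction`, and the random toy matrix are all hypothetical.

```python
import numpy as np

def split_subspaces(unembedding, reasoning_fraction=0.05):
    """Split the hidden space into (semantic_basis, reasoning_basis).

    unembedding: array of shape (vocab_size, hidden_dim), vocab_size >= hidden_dim.
    Each returned basis has orthonormal columns of length hidden_dim.
    """
    vocab_size, hidden_dim = unembedding.shape
    if vocab_size < hidden_dim:
        raise ValueError("expected vocab_size >= hidden_dim")
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(unembedding, full_matrices=False)
    k = max(1, round(reasoning_fraction * hidden_dim))
    # ASSUMPTION: the k weakest directions span the reasoning subspace.
    semantic_basis = vt[:-k].T    # shape (hidden_dim, hidden_dim - k)
    reasoning_basis = vt[-k:].T   # shape (hidden_dim, k)
    return semantic_basis, reasoning_basis, singular_values

rng = np.random.default_rng(0)
toy_unembedding = rng.normal(size=(500, 40))  # toy stand-in for real weights
semantic_basis, reasoning_basis, singular_values = split_subspaces(toy_unembedding)
print(semantic_basis.shape, reasoning_basis.shape)
```

With a real model, the toy matrix would be replaced by the weight matrix of the output (Unembedding) layer, and the fraction kept for the reasoning subspace would be a tunable choice.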
Once these basis vectors are identified, HARP takes the LLM’s hidden states (its internal representations at various stages) and projects them onto the basis vectors of the reasoning subspace. The resulting projections are then used as input features for a dedicated hallucination detector. This projection step is crucial because it reduces the dimensionality of the features to approximately 5% of the original size while filtering out most of the irrelevant noise. According to the authors, the smaller and cleaner feature set makes detection both more robust and more accurate.
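The projection itself amounts to a single matrix product. The sketch below uses made-up dimensions and random data to show how the feature size shrinks to about 5% of the hidden size. The reasoning basis here is a random orthonormal stand-in built with a QR factorization. In HARP it would come from the SVD of the Unembedding weights. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_examples = 400, 8
k = round(0.05 * hidden_dim)  # about 5% of the hidden size

# Stand-in reasoning basis: orthonormal columns from a QR factorization.
# ASSUMPTION: a real basis would be derived from the Unembedding layer.
reasoning_basis, _ = np.linalg.qr(rng.normal(size=(hidden_dim, k)))

# Stand-in hidden states, one row per generated answer.
hidden_states = rng.normal(size=(num_examples, hidden_dim))

# Coordinates of each hidden state along the reasoning basis vectors.
features = hidden_states @ reasoning_basis  # shape (num_examples, k)
print(hidden_states.shape, "->", features.shape)
```

The rows of `features` are what a downstream classifier would consume. The article does not specify the detector's architecture, so any binary classifier could stand in for it in an experiment like this.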
The researchers draw an analogy to human cognition: when humans answer complex questions, they typically reason first and then express their thoughts. Similarly, HARP emphasizes the reasoning information within LLMs’ hidden states, rather than just the semantic content of their outputs, to achieve high-precision hallucination detection.
Impressive Results and Robustness
Experiments conducted across multiple datasets, including NQ Open, TruthfulQA, TriviaQA, and TyDiQA-GP, show that HARP achieves state-of-the-art performance in hallucination detection. For example, on the TriviaQA dataset, HARP achieved an AUROC (Area Under the Receiver Operating Characteristic curve) of 92.8% with the Qwen-2.5-7B-Instruct model, outperforming the previous best method by 7.5%. Similarly strong results were observed with the LLaMA-3.1-8B model.
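For readers unfamiliar with AUROC, it can be read as the probability that a randomly chosen hallucinated answer receives a higher detector score than a randomly chosen correct one. The helper below is a small, generic sketch of that definition. The labels and scores are invented for illustration and are not results from the paper.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

labels = [0, 0, 1, 1]            # 1 = hallucinated, 0 = correct (illustrative)
scores = [0.1, 0.4, 0.35, 0.8]   # detector scores (illustrative)
print(auroc(labels, scores))     # 0.75
```

An AUROC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which puts the reported 92.8% in context.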
HARP consistently outperformed other baseline methods, even on complex datasets with long contexts, where other methods struggled. Furthermore, HARP’s single-pass approach offers superior efficiency compared to sampling-based methods that incur higher computational costs.
The study also supported the validity of decomposing the hidden state space as a direct sum of the two subspaces, as well as the necessity of the projection step. HARP demonstrated strong robustness and cross-distribution adaptability, generalizing well even when trained on one dataset and evaluated on another.
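A direct sum decomposition means every hidden state splits uniquely into a semantic part and a reasoning part that add back up to the original vector. The toy check below illustrates this for the special case of two orthogonal subspaces, which is an assumption made here for simplicity. The bases are random stand-ins, not derived from any model.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, k = 40, 2

# Stand-in orthonormal basis of the whole hidden space, split into two parts.
# ASSUMPTION: the two subspaces are orthogonal complements of each other.
full_basis, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
semantic_basis = full_basis[:, :-k]
reasoning_basis = full_basis[:, -k:]

h = rng.normal(size=hidden_dim)  # one toy hidden state
h_semantic = semantic_basis @ (semantic_basis.T @ h)
h_reasoning = reasoning_basis @ (reasoning_basis.T @ h)

# The two components reconstruct h exactly (up to rounding) ...
print(np.allclose(h, h_semantic + h_reasoning))   # True
# ... and share no common direction.
print(np.isclose(h_semantic @ h_reasoning, 0.0))  # True
```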
Conclusion
HARP represents a significant advancement in making LLMs more reliable by providing an effective and robust method for detecting hallucinations. By focusing on the reasoning processes within LLMs and projecting hidden states onto a compact reasoning subspace, HARP paves the way for more trustworthy AI applications. For more in-depth technical details, see the full research paper.


