
Cleanse: A Clustering-Based Approach to Detect Hallucinations in Large Language Models

TLDR: Cleanse is a new uncertainty estimation method for Large Language Models (LLMs) that helps detect hallucinations (inaccurate responses). It works by generating multiple LLM outputs, converting them into semantic embeddings, and then clustering these embeddings based on their meaning. By analyzing the consistency within and between these semantic clusters, Cleanse calculates a score that indicates the LLM’s confidence. Experiments show Cleanse outperforms existing methods in identifying hallucinations across various LLMs and datasets, emphasizing the importance of semantic consistency for reliable AI.

Large Language Models (LLMs) have revolutionized many aspects of natural language processing, from generating human-like text to answering complex questions. However, a significant challenge persists: the phenomenon of ‘hallucination.’ This is when an LLM produces responses that sound plausible and coherent but are factually incorrect or unsupported by real knowledge. Such inaccuracies can severely undermine trust and lead to serious consequences, especially in critical applications like medical diagnosis or legal advice.

Addressing these hallucinations is crucial for building safe and reliable AI systems. Researchers have explored various solutions, including refining training datasets and using Retrieval-Augmented Generation (RAG) to fetch external knowledge. While effective, these methods can be labor-intensive, computationally demanding, or require complex system architectures.

Introducing Cleanse: A Novel Approach to Uncertainty Estimation

A more lightweight and scalable alternative is ‘uncertainty estimation,’ which involves assessing how confident an LLM is in its own outputs. The idea is that if a model is uncertain, its responses might be unreliable. This is where a new approach called ‘Cleanse’ (Clustering-based Semantic Consistency) comes into play. Cleanse offers a novel way to quantify this uncertainty, helping to distinguish between correct and incorrect answers more clearly.

Cleanse operates on the principle that when an LLM is confident, its multiple generated responses to the same query will be semantically consistent, meaning they convey the same core meaning even if phrased differently. Conversely, a lack of confidence often leads to highly variable and semantically divergent outputs.

How Cleanse Works Under the Hood

The Cleanse pipeline involves a few key steps:

First, for a given query, the LLM generates multiple responses. These responses are then converted into ‘hidden embeddings,’ which are numerical representations that capture the deep semantic information of the text.
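The paper's exact pooling recipe isn't spelled out here, but the idea of collapsing a response's per-token hidden states into a single fixed-size embedding can be sketched as follows. This is a minimal illustration using a random NumPy array as a stand-in for real last-layer hidden states, and mean pooling as one common (assumed, not confirmed) pooling choice:

```python
import numpy as np

def pool_hidden_states(hidden: np.ndarray) -> np.ndarray:
    """Collapse per-token hidden states (seq_len x dim) into one
    fixed-size embedding via mean pooling (one common choice;
    other schemes, e.g. last-token pooling, are equally plausible)."""
    return hidden.mean(axis=0)

# Toy stand-in for the last-layer hidden states of one generated answer:
# 12 tokens, 16-dimensional hidden size.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(12, 16))
embedding = pool_hidden_states(hidden_states)
print(embedding.shape)  # → (16,)
```

In a real pipeline this would run once per sampled response, yielding one embedding per output for the clustering step.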

The innovative part of Cleanse is its clustering technique. It groups these hidden embeddings based on their semantic equivalence. To ensure true semantic consistency, Cleanse uses a fine-tuned Natural Language Inference (NLI) model. This model checks for ‘bi-directional entailment,’ meaning two outputs are considered semantically equivalent only if each one logically implies the other. This rigorous approach helps form precise clusters of truly consistent responses.
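The bi-directional entailment check can be sketched as the following grouping loop. The `entails` function here is a deliberately toy stand-in (it just compares word sets); a real pipeline would call a fine-tuned NLI model such as the one the paper uses, and the function and variable names are illustrative assumptions:

```python
def entails(a: str, b: str) -> bool:
    """Toy stand-in for an NLI model's entailment check (a => b).
    Purely illustrative: treats answers with the same word set as
    mutually entailing. A real pipeline queries an NLI model here."""
    return set(a.lower().split()) == set(b.lower().split())

def cluster_by_bidirectional_entailment(answers):
    """Greedily group answers: an answer joins a cluster only if it
    and the cluster's representative each entail the other."""
    clusters = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

answers = ["Paris is the capital", "the capital is Paris", "It is Lyon"]
clusters = cluster_by_bidirectional_entailment(answers)
print(clusters)
# → [['Paris is the capital', 'the capital is Paris'], ['It is Lyon']]
```

Requiring entailment in both directions is what makes the grouping strict: one answer merely being implied by another is not enough for the two to share a cluster.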

Once clustered, Cleanse calculates a ‘Cleanse Score’ to quantify uncertainty. This score is derived from two types of similarities:

  • Intra-cluster similarity: The sum of similarities between embeddings *within* the same cluster. High intra-cluster similarity indicates strong semantic agreement.
  • Inter-cluster similarity: The sum of similarities between embeddings *across* different clusters. High inter-cluster similarity suggests semantic divergence and inconsistency.

The Cleanse Score essentially measures the proportion of intra-cluster similarity relative to the total similarity of all generated outputs. A high Cleanse Score indicates low uncertainty and high consistency (meaning most outputs fall into a few, tightly-knit semantic clusters). A low score, conversely, signals high uncertainty and potential hallucination (many scattered clusters, indicating diverse and inconsistent meanings).
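The score described above can be sketched in a few lines. This assumes cosine similarity between embeddings (a natural but not confirmed choice) and uses tiny hand-made 2-D vectors in place of real hidden embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cleanse_score(embeddings, labels):
    """Share of total pairwise similarity that falls within clusters.
    High score -> tight clusters, low uncertainty; low score ->
    scattered clusters, likely hallucination."""
    intra = total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(embeddings[i], embeddings[j])
            total += s
            if labels[i] == labels[j]:  # same semantic cluster
                intra += s
    return intra / total if total else 0.0

# Toy embeddings: two near-duplicates (cluster 0) and one outlier (cluster 1)
embs = [np.array([1.0, 0.0]), np.array([1.0, 0.1]), np.array([0.0, 1.0])]
score = cleanse_score(embs, labels=[0, 0, 1])
print(round(score, 3))  # → 0.909
```

With most pairwise similarity concentrated inside cluster 0, the score lands near 1; had the three answers each formed their own cluster, the intra-cluster sum would be zero and the score would collapse accordingly.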

Validation and Performance

The effectiveness of Cleanse was rigorously tested using four popular LLMs (LLaMA-7B, LLaMA-13B, LLaMA2-7B, and Mistral-7B) and two standard question-answering datasets (SQuAD and CoQA). The results demonstrated that Cleanse consistently outperformed existing uncertainty estimation methods, including perplexity, length-normalized entropy, and lexical similarity. Notably, Cleanse showed a significant advantage in detecting hallucinations, especially when correctness was measured under stricter conditions, highlighting its robustness for tasks requiring high precision.

The research also emphasized that focusing on the semantic aspect of language, rather than just token-level or lexical forms, is crucial for accurate uncertainty estimation. Furthermore, the choice of the clustering model (specifically, ‘nli-deberta-v3-base’) was found to be a critical factor, as a well-performing clustering model leads to a clearer distinction between correct and incorrect generations.


Looking Ahead

While Cleanse currently requires access to the LLM’s internal ‘hidden embeddings’ (making it a ‘white-box’ approach), the researchers suggest that future work could explore using other types of output vector embeddings to overcome this limitation. This would broaden its applicability to ‘black-box’ LLMs where internal states are not accessible.

Cleanse represents a significant step forward in making LLMs more reliable by providing an effective and interpretable way to estimate their uncertainty and detect hallucinations. For more technical details, refer to the full research paper.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
