
Cleanse: A Clustering-Based Approach to Detect Hallucinations in Large Language Models

TLDR: Cleanse is a new uncertainty estimation method for Large Language Models (LLMs) that helps detect hallucinations (inaccurate responses). It works by generating multiple LLM outputs, converting them into semantic embeddings, and then clustering these embeddings based on their meaning. By analyzing the consistency within and between these semantic clusters, Cleanse calculates a score that indicates the LLM’s confidence. Experiments show Cleanse outperforms existing methods in identifying hallucinations across various LLMs and datasets, emphasizing the importance of semantic consistency for reliable AI.

Large Language Models (LLMs) have revolutionized many aspects of natural language processing, from generating human-like text to answering complex questions. However, a significant challenge persists: the phenomenon of ‘hallucination.’ This is when an LLM produces responses that sound plausible and coherent but are factually incorrect or unsupported by real knowledge. Such inaccuracies can severely undermine trust and lead to serious consequences, especially in critical applications like medical diagnosis or legal advice.

Addressing these hallucinations is crucial for building safe and reliable AI systems. Researchers have explored various solutions, including refining training datasets and using Retrieval-Augmented Generation (RAG) to fetch external knowledge. While effective, these methods can be labor-intensive, computationally demanding, or require complex system architectures.

Introducing Cleanse: A Novel Approach to Uncertainty Estimation

A more lightweight and scalable alternative is ‘uncertainty estimation,’ which involves assessing how confident an LLM is in its own outputs. The idea is that if a model is uncertain, its responses might be unreliable. This is where a new approach called ‘Cleanse’ (Clustering-based Semantic Consistency) comes into play. Cleanse offers a novel way to quantify this uncertainty, helping to distinguish between correct and incorrect answers more clearly.

Cleanse operates on the principle that when an LLM is confident, its multiple generated responses to the same query will be semantically consistent, meaning they convey the same core meaning even if phrased differently. Conversely, a lack of confidence often leads to highly variable and semantically divergent outputs.

How Cleanse Works Under the Hood

The Cleanse pipeline involves a few key steps:

First, for a given query, the LLM generates multiple responses. These responses are then converted into ‘hidden embeddings,’ which are numerical representations that capture the deep semantic information of the text.
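The paper's exact pooling recipe isn't spelled out here, but the idea of collapsing a response's per-token hidden states into a single fixed-size embedding can be sketched as follows. This is a minimal illustration using a random NumPy array as a stand-in for real last-layer hidden states, and mean pooling as one common (assumed, not confirmed) pooling choice:

```python
import numpy as np

def pool_hidden_states(hidden: np.ndarray) -> np.ndarray:
    """Collapse per-token hidden states (seq_len x dim) into one
    fixed-size embedding via mean pooling (one common choice;
    other schemes, e.g. last-token pooling, are equally plausible)."""
    return hidden.mean(axis=0)

# Toy stand-in for the last-layer hidden states of one generated answer:
# 12 tokens, 16-dimensional hidden size.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(12, 16))
embedding = pool_hidden_states(hidden_states)
print(embedding.shape)  # → (16,)
```

In a real pipeline this would run once per sampled response, yielding one embedding per output for the clustering step.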

The innovative part of Cleanse is its clustering technique. It groups these hidden embeddings based on their semantic equivalence. To ensure true semantic consistency, Cleanse uses a fine-tuned Natural Language Inference (NLI) model. This model checks for ‘bi-directional entailment,’ meaning two outputs are considered semantically equivalent only if each one logically implies the other. This rigorous approach helps form precise clusters of truly consistent responses.
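The bi-directional entailment check can be sketched as the following grouping loop. The `entails` function here is a deliberately toy stand-in (it just compares word sets); a real pipeline would call a fine-tuned NLI model such as the one the paper uses, and the function and variable names are illustrative assumptions:

```python
def entails(a: str, b: str) -> bool:
    """Toy stand-in for an NLI model's entailment check (a => b).
    Purely illustrative: treats answers with the same word set as
    mutually entailing. A real pipeline queries an NLI model here."""
    return set(a.lower().split()) == set(b.lower().split())

def cluster_by_bidirectional_entailment(answers):
    """Greedily group answers: an answer joins a cluster only if it
    and the cluster's representative each entail the other."""
    clusters = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

answers = ["Paris is the capital", "the capital is Paris", "It is Lyon"]
clusters = cluster_by_bidirectional_entailment(answers)
print(clusters)
# → [['Paris is the capital', 'the capital is Paris'], ['It is Lyon']]
```

Requiring entailment in both directions is what makes the grouping strict: one answer merely being implied by another is not enough for the two to share a cluster.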

Once clustered, Cleanse calculates a ‘Cleanse Score’ to quantify uncertainty. This score is derived from two types of similarities:

  • Intra-cluster similarity: The sum of similarities between embeddings *within* the same cluster. High intra-cluster similarity indicates strong semantic agreement.
  • Inter-cluster similarity: The sum of similarities between embeddings *across* different clusters. High inter-cluster similarity suggests semantic divergence and inconsistency.

The Cleanse Score essentially measures the proportion of intra-cluster similarity relative to the total similarity of all generated outputs. A high Cleanse Score indicates low uncertainty and high consistency (meaning most outputs fall into a few, tightly-knit semantic clusters). A low score, conversely, signals high uncertainty and potential hallucination (many scattered clusters, indicating diverse and inconsistent meanings).
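The score described above can be sketched in a few lines. This assumes cosine similarity between embeddings (a natural but not confirmed choice) and uses tiny hand-made 2-D vectors in place of real hidden embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cleanse_score(embeddings, labels):
    """Share of total pairwise similarity that falls within clusters.
    High score -> tight clusters, low uncertainty; low score ->
    scattered clusters, likely hallucination."""
    intra = total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(embeddings[i], embeddings[j])
            total += s
            if labels[i] == labels[j]:  # same semantic cluster
                intra += s
    return intra / total if total else 0.0

# Toy embeddings: two near-duplicates (cluster 0) and one outlier (cluster 1)
embs = [np.array([1.0, 0.0]), np.array([1.0, 0.1]), np.array([0.0, 1.0])]
score = cleanse_score(embs, labels=[0, 0, 1])
print(round(score, 3))  # → 0.909
```

With most pairwise similarity concentrated inside cluster 0, the score lands near 1; had the three answers each formed their own cluster, the intra-cluster sum would be zero and the score would collapse accordingly.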

Validation and Performance

The effectiveness of Cleanse was rigorously tested using four popular LLMs (LLaMA-7B, LLaMA-13B, LLaMA2-7B, and Mistral-7B) and two standard question-answering datasets (SQuAD and CoQA). The results demonstrated that Cleanse consistently outperformed existing uncertainty estimation methods, including perplexity, length-normalized entropy, and lexical similarity. Notably, Cleanse showed a significant advantage in detecting hallucinations, especially when correctness was measured under stricter conditions, highlighting its robustness for tasks requiring high precision.

The research also emphasized that focusing on the semantic aspect of language, rather than just token-level or lexical forms, is crucial for accurate uncertainty estimation. Furthermore, the choice of the clustering model (specifically, ‘nli-deberta-v3-base’) was found to be a critical factor, as a well-performing clustering model leads to a clearer distinction between correct and incorrect generations.


Looking Ahead

While Cleanse currently requires access to the LLM’s internal ‘hidden embeddings’ (making it a ‘white-box’ approach), the researchers suggest that future work could explore using other types of output vector embeddings to overcome this limitation. This would broaden its applicability to ‘black-box’ LLMs where internal states are not accessible.

Cleanse represents a significant step forward in making LLMs more reliable by providing an effective and interpretable way to estimate their uncertainty and detect hallucinations. For more technical details, refer to the full research paper.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
