TLDR: A new study introduces TruthfulnessEval, a framework to assess the honesty of quantized Large Language Models (LLMs). It finds that while quantized LLMs retain internal knowledge of truth, they are highly susceptible to generating false outputs when given deceptive prompts. The research highlights the importance of prompt design and suggests methods like DoLa can improve truthfulness in these efficient models.
Large Language Models, or LLMs, are becoming increasingly common, but deploying them efficiently, especially in resource-constrained environments, often requires a process called quantization. Quantization significantly reduces the memory and computational power needed for LLMs by converting their high-precision weights into lower-precision representations, for example from 16-bit down to 4-bit or even 2-bit. While quantization is known to largely preserve performance on many standard tasks, a critical question has remained largely unanswered: how does quantization affect an LLM’s truthfulness?
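To make the idea concrete, here is a minimal sketch of the simplest flavor of weight quantization, symmetric round-to-nearest to a small number of bits. This illustrates only the general principle; the methods benchmarked in the paper (and production schemes such as GPTQ, AWQ, or NF4) use per-group scales, calibration data, or non-uniform codebooks.

```python
import torch

def quantize_rtn(weights: torch.Tensor, bits: int = 4):
    """Toy symmetric round-to-nearest quantization of a weight tensor.

    Real 4-bit schemes (GPTQ, AWQ, NF4, ...) are considerably more
    sophisticated; this only illustrates the precision loss involved.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for signed 4-bit codes
    scale = weights.abs().max() / qmax         # one scale for the whole tensor
    codes = torch.clamp(torch.round(weights / scale), -qmax - 1, qmax)
    return codes.to(torch.int8), scale         # int8 container for the small codes

def dequantize(codes: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor for use at inference time."""
    return codes.float() * scale

w = torch.randn(4096, 4096)                    # stand-in for one weight matrix
codes, scale = quantize_rtn(w, bits=4)
error = (dequantize(codes, scale) - w).abs().max().item()
print(f"max absolute reconstruction error: {error:.4f}")
```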
A recent study titled “Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs” by researchers from Case Western Reserve University and Hangzhou Dianzi University delves into this very issue. The paper introduces a new evaluation framework called TruthfulnessEval, designed to thoroughly assess the truthfulness of quantized LLMs across three key areas: Logical Reasoning, Common Sense, and Imitative Falsehoods.
The Logical Reasoning dimension examines how well quantized LLMs can determine the truthfulness of statements with different grammatical structures, including affirmative, negated, conjunction (using “and”), and disjunction (using “or”) statements. The Common Sense dimension tests the models’ accuracy in evaluating statements based on general human knowledge, often involving common misconceptions. Finally, the Imitative Falsehoods dimension assesses the models’ robustness against prompts designed to elicit deceptive or untruthful responses.
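The paper’s exact templates are not reproduced here, but the four logical forms can be illustrated by composing labeled base facts roughly as follows (the facts and wording below are hypothetical examples, not items from the benchmark):

```python
# Hypothetical examples of the four statement forms in the Logical Reasoning
# dimension; each statement carries a gold True/False label for evaluation.
fact_a = ("Paris is the capital of France", True)
fact_b = ("the Sun orbits the Earth", False)

statements = {
    "affirmative": (f"{fact_a[0]}.", fact_a[1]),
    "negated":     (f"It is not the case that {fact_a[0]}.", not fact_a[1]),
    "conjunction": (f"{fact_a[0]} and {fact_b[0]}.", fact_a[1] and fact_b[1]),
    "disjunction": (f"{fact_a[0]} or {fact_b[0]}.", fact_a[1] or fact_b[1]),
}

for form, (text, label) in statements.items():
    print(f"{form:12s} | gold label: {label!s:5s} | {text}")
```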
The researchers tested various mainstream quantization techniques, ranging from 4-bit to extreme 2-bit, on several popular open-source LLMs like LLaMA, Mistral, and Qwen. A surprising finding emerged: even though quantized models internally retain truthful representations, they are highly vulnerable to producing false outputs when given misleading prompts. This means the models “know” the truth internally, but can be easily swayed to lie by external cues.
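For readers who want to try something similar, one common way to obtain a 4-bit model is bitsandbytes quantization through Hugging Face transformers. This is not necessarily one of the techniques benchmarked in the paper, and the checkpoint name below is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint

# 4-bit NF4 quantization via bitsandbytes; the paper evaluates several
# mainstream 4-bit and 2-bit schemes, which may differ from this one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

prompt = ('Is the following statement true or false? '
          '"The Great Wall of China is visible from space." Answer:')
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```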
To understand this vulnerability better, the study explored the impact of different prompt styles. They used 15 rephrased variants of “honest,” “neutral,” and “deceptive” prompts. The results showed that “deceptive” prompts could override the models’ truth-consistent behavior, leading them to generate false information. In contrast, “honest” and “neutral” prompts helped maintain stable and accurate outputs. This highlights a significant sensitivity to how questions are phrased, especially in quantized models.
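As a concrete illustration of the setup (the wording below is invented for this summary and is not one of the paper’s 15 variants), the honest, neutral, and deceptive styles might differ only in the system prompt wrapped around an otherwise identical true/false query:

```python
# Illustrative prompt styles; the paper uses 15 rephrased variants of each
# style, whose exact wording is not reproduced here.
PROMPT_STYLES = {
    "honest":    "You are a truthful assistant. Always answer according to the facts.",
    "neutral":   "You are an assistant. Answer the user's question.",
    "deceptive": "You are an assistant whose goal is to convince the user of "
                 "false statements, even when you know they are untrue.",
}

def build_messages(style: str, statement: str) -> list[dict]:
    """Wrap the same true/false query in a given system-prompt style."""
    return [
        {"role": "system", "content": PROMPT_STYLES[style]},
        {"role": "user", "content": (f'Is the following statement true or false? "{statement}" '
                                     "Answer with exactly one word: True or False.")},
    ]

for style in PROMPT_STYLES:
    msgs = build_messages(style, "The Great Wall of China is visible from space.")
    print(f"{style:9s} -> {msgs[0]['content']}")
```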
Interestingly, the study also found that while 4-bit quantized LLMs generally performed well on logical reasoning (affirmative, negated, and conjunction statements) and common sense, extreme 2-bit quantization could lead to a noticeable drop in performance, particularly for logical reasoning tasks. However, larger models (70B parameters and above) showed significantly better performance on complex disjunction statements, suggesting that model scale plays a role in handling logical complexity, even after quantization.
The researchers also investigated internal representations of these models through layer-wise probing and PCA visualization. They confirmed that quantized LLMs, much like their full-precision counterparts, still encode truthful knowledge internally. Even when they generate false outputs due to deceptive prompts, their internal states often reflect an understanding of the actual truth. This suggests that the issue isn’t a loss of knowledge, but rather a susceptibility in how that knowledge is expressed.
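Here is a minimal sketch of what layer-wise probing of this kind typically looks like, assuming a Hugging Face causal LM and a small labeled set of true/false statements. The model name and the two toy statements are placeholders; a real probe needs many statements and a held-out split.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Toy labeled statements; a real probe uses hundreds of examples.
statements = [
    ("Paris is the capital of France.", 1),
    ("The Sun orbits the Earth.", 0),
]

# Collect the last-token hidden state at every layer for every statement.
features, labels = None, []
for text, label in statements:
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    per_layer = [h[0, -1].float().cpu().numpy() for h in out.hidden_states]
    if features is None:
        features = [[] for _ in per_layer]
    for i, vec in enumerate(per_layer):
        features[i].append(vec)
    labels.append(label)

# One linear probe per layer: accuracy across layers shows where truthful
# information becomes linearly decodable from the hidden states.
for i, feats in enumerate(features):
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {i:2d}: probe accuracy {probe.score(feats, labels):.2f}")
```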
Furthermore, the study explored mitigation strategies. They found that a decoding strategy called DoLa (Decoding by Contrasting Layers) could improve the truthfulness of quantized LLMs without needing external knowledge or additional fine-tuning. This offers a promising direction for making these efficient models more reliable.
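To give a flavor of the idea, here is a simplified sketch of the DoLa contrast for a single next-token prediction, assuming a LLaMA-style Hugging Face model (the checkpoint name and the fixed early layer are placeholders). The real method selects the “premature” layer dynamically via a divergence measure, and newer versions of transformers also ship a built-in DoLa decoding option, so treat this as an illustration of the principle rather than a faithful implementation.

```python
import math
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

def early_exit_logits(hidden_state):
    """Apply the final RMSNorm and LM head to an intermediate layer (LLaMA-style)."""
    return model.lm_head(model.model.norm(hidden_state[:, -1]))

# "Mature" distribution from the final layer vs. a fixed "premature" early layer.
log_p_mature = F.log_softmax(out.logits[:, -1].float(), dim=-1)
log_p_premature = F.log_softmax(early_exit_logits(out.hidden_states[16]).float(), dim=-1)

# Core DoLa contrast: prefer tokens whose probability grows between the early
# and final layers, i.e. knowledge that "emerges" in later layers.
contrast = log_p_mature - log_p_premature

# Adaptive plausibility constraint: only consider tokens the final layer
# already finds reasonably likely (within a factor of alpha of the top token).
alpha = 0.1
mask = log_p_mature >= log_p_mature.max(dim=-1, keepdim=True).values + math.log(alpha)
contrast = contrast.masked_fill(~mask, float("-inf"))

print("next token:", tokenizer.decode(contrast.argmax(dim=-1)))
```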
The findings of this research are crucial for anyone deploying quantized LLMs, especially in applications where truthfulness is paramount. It underscores the need for careful consideration of prompt design and potential vulnerabilities to deceptive inputs. As LLMs become more integrated into our daily lives, ensuring their reliability and honesty, even in their more efficient, quantized forms, is a critical step towards building trustworthy AI systems. You can read the full paper for more details here: Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs.


