TLDR: OpenAI has released new research proposing a shift in how AI models are evaluated to reduce “hallucinations” – plausible but false statements. The company suggests incentivizing models to express uncertainty with phrases like “I don’t know” rather than generating incorrect information, arguing that current evaluation methods inadvertently encourage guessing.
OpenAI has unveiled a new research paper that delves into the persistent issue of “hallucinations” in large language models (LLMs), including its advanced GPT-5. These hallucinations are defined as plausible but factually incorrect statements generated by AI, which can mislead users and erode trust. The company asserts that these errors are not mysterious glitches but rather predictable statistical outcomes rooted in current AI training and evaluation methodologies.
The core of OpenAI’s diagnosis is that existing evaluation systems inadvertently create an “epidemic of penalizing uncertainty.” Most benchmarks reward a confident answer, even an incorrect one, over an honest admission of not knowing; as one article explains, “Most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.” The result is that models are rewarded for bluffing, much like a student who guesses on a test to avoid leaving an answer blank, even when the guess is wrong.
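To make the incentive concrete, here is a toy calculation (our own illustration, not code from the paper) of a model’s expected score under standard binary grading, where a correct answer earns one point and both wrong answers and abstentions earn zero. Under that scheme, guessing with any nonzero chance of being right always beats saying “I don’t know.”

```python
# Toy illustration (not OpenAI's code): expected score under binary grading,
# where correct = 1 point, wrong = 0 points, abstaining ("I don't know") = 0 points.
def expected_score_binary(p_correct: float, abstain: bool) -> float:
    if abstain:
        return 0.0          # honesty about uncertainty earns nothing here
    return p_correct * 1.0  # guessing earns p_correct on average, never less than abstaining

# Even a wild guess with a 10% chance of being right out-scores abstaining,
# so a model optimized against this metric learns to bluff.
print(expected_score_binary(0.10, abstain=False))  # 0.1
print(expected_score_binary(0.10, abstain=True))   # 0.0
```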
According to OpenAI, this misaligned incentive system is a primary driver of hallucinations. The research notes that even GPT-5, though less prone to errors than its predecessors, still produces confidently wrong answers. The company also emphasizes that AI models will never reach 100 percent accuracy, since some real-world questions are “inherently unanswerable.”
The proposed solution is a fundamental overhaul of evaluation methods. OpenAI suggests modifying benchmarks to “penalise confident errors more heavily than uncertainty” and to “give credit when a model admits it doesn’t know.” In other words, abstention, where a model declines to answer because it is unsure, would be rewarded over fabrication. The paper illustrates the point with a comparison: a model that abstained on 52% of questions produced far fewer wrong answers than one that abstained on only 1%, even though its overall “accuracy” looked lower by traditional metrics.
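One way to picture the proposed change is a scoring rule that subtracts points for confident wrong answers while treating abstention neutrally or giving it partial credit. The sketch below is our own assumption about what such a rule could look like, not OpenAI’s published grading code, and the correct/wrong splits are illustrative numbers chosen to match the abstention rates mentioned above.

```python
# Hypothetical scoring rule in the spirit of OpenAI's suggestion: penalize
# confident errors more heavily than uncertainty, and stop punishing "I don't know".
def score(correct: int, wrong: int, abstained: int,
          wrong_penalty: float = 1.0, abstain_credit: float = 0.0) -> float:
    return correct * 1.0 - wrong * wrong_penalty + abstained * abstain_credit

# Illustrative comparison over 100 questions (assumed splits, not the paper's exact table):
# Model A abstains on 52 questions and is wrong on 26; Model B abstains on 1 and is wrong on 75.
model_a = score(correct=22, wrong=26, abstained=52)
model_b = score(correct=24, wrong=75, abstained=1)
print(model_a, model_b)  # A scores higher once wrong answers cost points,
                         # even though B's raw "accuracy" (24 vs 22) looks better.
```

The design choice to make wrong answers cost more than abstentions is exactly what removes the incentive to bluff: a model can no longer improve its score by guessing on questions it cannot answer.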
This shift in incentives aims to fine-tune models to acknowledge their limitations, making them more trustworthy and reliable. By changing the “tests” that drive AI development, OpenAI believes that LLMs can become more dependable partners, which could, in turn, accelerate AI adoption in various sectors, particularly among risk-averse businesses.


