TL;DR: This research paper introduces a unified framework (MOWI: Model, Observer, World, Input) to understand hallucinations in large language and vision models. It systematically traces the root causes of these errors across five stages of a model’s lifecycle: training data, architectural design, inference mechanisms, loss optimization, and evaluation practices. The paper reveals that hallucinations are not random but predictable outcomes stemming from issues like data quality, model biases, optimization dynamics, and flawed evaluation metrics. It concludes by proposing future directions for mitigation, emphasizing the need for better data curation, architectural improvements, and more robust evaluation methods to enhance AI system reliability.
Large language models (LLMs) and vision models (LVLMs, TVMs) are becoming integral to many applications, from software development to creative media. However, a significant challenge persists: hallucinations. These are instances where models generate incorrect, inconsistent, or nonsensical outputs, which can spread misinformation and cause real-world harm. A recent research paper delves into this complex issue, aiming to provide a unified understanding of hallucinations across different AI models and modalities.
The paper, titled “Review of Hallucination Understanding in Large Language and Vision Models” by Ho Zheng Yi, Liang Siyuan, and Tao Dacheng, highlights that despite extensive efforts to mitigate hallucinations, our understanding of their underlying causes remains fragmented. Without a coherent framework, solutions risk addressing symptoms rather than the root problems, limiting their effectiveness and generalizability.
A Unified Framework: MOWI
To tackle this fragmentation, the researchers introduce a unified, multi-level framework called MOWI, which stands for Model, Observer, World, and Input. The framework characterizes both image and text hallucinations across diverse applications (a small sketch encoding the taxonomy follows the list):
- Model Level: Hallucinations here stem from the model’s internal errors in understanding the true data distribution. It might generate outputs that are plausible but don’t capture the fine details of real data, or even create content in regions where no real data exists.
- Observer Level: This refers to situations where a model’s output diverges from a human observer’s expectations or beliefs. Even if an output is technically valid, users might perceive it as a hallucination if it lacks nuance, seems idiosyncratic, or doesn’t fulfill the task as expected.
- World Level: These hallucinations arise from the model’s lack of knowledge about the real world, either due to missing data (epistemic uncertainty) or inherent randomness in the world itself (aleatoric uncertainty). This includes inaccessible, classified, or time-sensitive information.
- Input Level: Hallucinations at this level occur when the input provided to the model is sparse, contradictory, or outside the range of data it was trained on. This forces the model to guess, leading to unreliable outputs, especially in interactive conversations.
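The taxonomy is concrete enough to operationalize. As a flavor of how a failure case might be tagged, here is a minimal Python sketch; the class names and the example are hypothetical illustrations, not artifacts from the paper:

```python
from dataclasses import dataclass
from enum import Enum

class MowiLevel(Enum):
    MODEL = "model"        # model misfits the true data distribution
    OBSERVER = "observer"  # output diverges from the user's expectations
    WORLD = "world"        # missing or inherently uncertain world knowledge
    INPUT = "input"        # sparse, contradictory, or out-of-distribution input

@dataclass
class HallucinationReport:
    """Hypothetical record tagging one failure case with its MOWI level."""
    output: str
    level: MowiLevel
    note: str

# A confident answer about an event after the training cutoff is a
# world-level hallucination (epistemic uncertainty).
report = HallucinationReport(
    output="Confident claim about an event the model could not have seen",
    level=MowiLevel.WORLD,
    note="time-sensitive knowledge absent from the training data",
)
print(report.level.value)  # -> "world"
```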
Tracing the Root Causes
The paper meticulously traces the root causes of hallucinations to five distinct stages in a model’s operational lifecycle:
1. Training Data Factors: The quality and distribution of the training data play a crucial role. Low frequency of certain concepts, limited task diversity, and structural misalignment in pretraining data can lead models to misunderstand or misperceive information. Memorization, where models reproduce specific training samples rather than generalizing, can also cause hallucinations, especially with unique or rare data. A growing concern is “self-consumption,” where models are inadvertently trained on AI-generated content, risking a loss of alignment with real-world data and potentially more severe hallucinations (a toy simulation appears after this list). Additionally, models often exhibit “directional asymmetries,” predicting associations more reliably in one direction than the other, a bias inherited from naturally occurring human data.
2. Architectural Limitations: The inherent design of the models can introduce vulnerabilities. “Attention glitches” in transformer models can cause them to misinterpret sequence information, leading to errors. “Autoregressive constraints,” where models predict tokens strictly one at a time, can hinder robust causal reasoning and bidirectional understanding. “Incorrect positional encoding” can undermine a model’s ability to handle long contexts effectively (see the positional-encoding sketch after this list). Finally, “inductive biases” (implicit assumptions hardwired into the architecture) can lead to preferred learning patterns that might not align with the task objective, causing issues like uncanny artifacts in images or struggles with complex sequences.
3. Inference Mechanisms: How models are used during deployment also contributes to hallucinations. “Few-shot quality” matters because in-context learning is highly sensitive to the quality, diversity, and quantity of the examples in the prompt; erroneous or noisy demonstrations can significantly worsen hallucinations. In “multi-agent debates,” where multiple LLMs interact, errors and biases can be amplified, leading to false consensus or uncooperative behaviors. “Exposure bias” arises from the mismatch between training, which conditions on perfect ground-truth tokens, and inference, which conditions on the model’s own imperfect previous outputs, causing errors to accumulate over the sequence (see the teacher-forcing sketch after this list).
4. Loss and Optimization: The training and fine-tuning process itself can introduce vulnerabilities. “Pretraining dynamics” involve complex learning trajectories in which factual knowledge can be acquired and then partially forgotten in cycles; poor initialization or unconstrained optimization can hinder effective knowledge acquisition. “Post-training vulnerabilities” include “catastrophic forgetting” during domain fine-tuning, where models lose general capabilities, and “instruction overfitting” during instruction tuning, where models latch onto superficial cues rather than deep understanding. “Reward hacking” in reinforcement learning from human feedback (RLHF) can lead models to exploit the reward function with idiosyncratic outputs, often because of human biases or data sparsity (a toy example follows this list). “Shortcut learning” occurs when models learn simpler, brittle solutions instead of robust abstractions, which may work on benchmarks but fail to generalize.
5. Misleading Evaluations: The way we evaluate models can mask or even reinforce hallucinatory tendencies. “Metric blind spots” mean that widely adopted metrics, like FID for images or perplexity for language, often fail to capture the complexity of modern AI outputs and can misalign with real task performance. “Biased judges,” whether powerful AI models or human annotators, can introduce systematic biases, favoring verbose responses, self-generated content, or superficial markers of authority; flawed outputs then go undetected or are even reinforced. Lastly, “test contamination,” where test data inadvertently leaks into training sets, undermines benchmark credibility and creates an illusory sense of progress (a minimal contamination check follows this list).
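To make stage 1’s “self-consumption” risk concrete, here is a toy simulation (mine, not the paper’s): each generation fits a simple Gaussian “model” to data sampled from the previous generation’s model. Because every generation inherits the previous generation’s estimation error, the fitted distribution drifts away from the original data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from the true distribution N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=500)

for gen in range(1, 16):
    # Fit a toy "model" (here just a Gaussian) to the current data.
    mu, sigma = data.mean(), data.std()
    # The next generation trains only on samples from this model,
    # mimicking a web corpus increasingly filled with AI output.
    data = rng.normal(loc=mu, scale=sigma, size=500)
    if gen % 5 == 0:
        print(f"gen {gen:2d}: mu={mu:+.3f}, sigma={sigma:.3f}")
# (mu, sigma) drifts away from (0, 1): estimation noise compounds
# because no generation ever sees the original data again.
```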
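Stage 2’s positional-encoding point is easiest to see with the classic sinusoidal scheme from the original transformer paper; the sketch below (with arbitrary dimensions) is one concrete instance, not the specific encoding the survey analyzes:

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]            # (1, d_model // 2)
    angles = pos / (10_000.0 ** (2 * i / d_model))  # frequency falls as i grows
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Positions far beyond the lengths seen in training map to encoding
# patterns the model never learned to exploit -- one way long-context
# handling degrades.
print(sinusoidal_positions(seq_len=128, d_model=64).shape)  # (128, 64)
```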
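Stage 3’s exposure bias can be reproduced with a toy one-step predictor (an illustrative stand-in, not the paper’s experiment). Under teacher forcing every step is conditioned on the true previous token; at inference each step is conditioned on the model’s own noisy output, so error compounds like a random walk:

```python
import numpy as np

rng = np.random.default_rng(1)
STEPS, NOISE = 50, 0.1

def decode_step(prev: float) -> float:
    """Stand-in for one decoding step: echo the input plus model error."""
    return prev + rng.normal(scale=NOISE)

# Teacher forcing: every step sees the true previous token
# (the ground-truth sequence here is all zeros).
tf_err = [abs(decode_step(0.0)) for _ in range(STEPS)]

# Free-running inference: every step sees the model's own imperfect
# previous output, so per-step errors accumulate.
state, fr_err = 0.0, []
for _ in range(STEPS):
    state = decode_step(state)
    fr_err.append(abs(state))

print(f"teacher forcing mean |error|: {np.mean(tf_err):.3f}")  # ~0.08
print(f"free running   mean |error|: {np.mean(fr_err):.3f}")  # several times larger
```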
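Stage 4’s reward hacking shrinks to a one-liner: optimize a proxy that merely correlates with what you want. In this invented example, a length-favoring proxy reward (standing in for the verbosity bias some learned reward models exhibit) picks the verbose wrong answer:

```python
# Candidate answers to "What is the capital of France?" mapped to
# whether they are factually correct.
candidates = {
    "Paris.": True,
    "The capital of France is Paris.": True,
    "Great question! After weighing European geography, history, and "
    "culture at considerable length, one could argue it is Lyon.": False,
}

def proxy_reward(text: str) -> float:
    # Verbosity stands in for spurious cues (length, confident tone,
    # markers of authority) that reward models can latch onto.
    return float(len(text.split()))

best = max(candidates, key=proxy_reward)
print("factually correct:", candidates[best])  # False -- the proxy was hacked
```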
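Stage 5’s test contamination is commonly screened by checking for long n-gram overlap between test items and the training corpus; variants of this heuristic appear in several LLM reports. A minimal sketch, with the n-gram length chosen arbitrarily:

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(test_items: list[str], train_corpus: str, n: int = 8) -> float:
    """Fraction of test items sharing at least one n-gram with the training text."""
    train = ngrams(train_corpus, n)
    hits = sum(bool(ngrams(item, n) & train) for item in test_items)
    return hits / len(test_items)

train = "the quick brown fox jumps over the lazy dog near the river bank"
tests = [
    "quick brown fox jumps over the lazy dog near",   # verbatim overlap
    "an entirely unrelated question about chemistry", # clean
]
print(contamination_rate(tests, train))  # 0.5 -- one of two items flagged
```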
Future Directions for Mitigation
The paper concludes by suggesting several promising avenues for mitigating hallucinations. These include “machine teaching,” which empowers domain experts to curate data for rare tasks, and “test-time adaptation,” which lets models update their internal representations dynamically during inference (a minimal sketch appears below). Deeper mechanistic interpretability and hybrid neuro-symbolic approaches are also proposed to build more stable and interpretable foundations for generalization. Finally, evaluation reform is crucial: the authors advocate “red-teaming” strategies to stress-test models and frameworks such as Social Choice Theory to integrate diverse user preferences into more inclusive definitions of model reliability.
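As a flavor of what test-time adaptation can look like, here is a minimal entropy-minimization sketch in the spirit of methods like TENT; the linear model, learning rate, and update rule are illustrative assumptions, not the paper’s proposal:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p: np.ndarray) -> float:
    return float(-np.sum(p * np.log(p + 1e-12)))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # toy linear "model": logits = W @ x
x = rng.normal(size=4)       # one unlabeled test input

print("entropy before:", round(entropy(softmax(W @ x)), 3))
for _ in range(50):
    p = softmax(W @ x)
    # dH/dlogits for H = -sum(p * log p) under a softmax parameterization.
    grad_logits = -p * (np.log(p + 1e-12) + entropy(p))
    W -= 0.1 * np.outer(grad_logits, x)  # gradient step that lowers H
print("entropy after: ", round(entropy(softmax(W @ x)), 3))
```

The obvious caveat: adaptation of this kind increases confidence, not necessarily correctness, so it complements rather than replaces better evaluation.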
By offering a unified definition and a comprehensive review of root causes, this research provides a critical foundation for developing more robust and effective solutions to hallucinations, ensuring the reliability of AI systems as they continue to scale and integrate into our daily lives.