The Inherent Design Flaw: How Transformers Generate Coherence Without Truth

TLDR: This research paper argues that hallucination in Large Language Models (LLMs) is not merely a data or optimization problem, but a structural outcome of the transformer architecture. LLMs are ‘coherence engines’ that generate fluent text by simulating relational meaning, but they lack ‘existential grounding’ in the real world (temporality, mood, care). This ‘flat semantic space’ leads to predictable ‘ontological hallucinations’ (violating real-world structures) and ‘residual reasoning hallucinations’ (mimicking inference without true understanding). Scaffolding only masks these issues, and the paper proposes new architectures that incorporate truth constraints and ‘curved semantic space’ for truly grounded intelligence.

Large Language Models (LLMs) have shown a remarkable ability to generate human-like text and perform complex reasoning tasks. However, a persistent problem shadows their success: hallucination, in which a model generates coherent but factually incorrect or nonsensical information. While many attribute these errors to insufficient training data, limited context, or optimization problems, a new research paper titled ‘How Large Language Models are Designed to Hallucinate’ proposes a different, more fundamental explanation.

Authored by Richard Ackermann and Simeon Emanuilov, the paper argues that hallucination isn’t just an incidental defect but a structural outcome of the transformer architecture itself, which forms the backbone of most modern LLMs. They describe transformers as ‘coherence engines’ that are compelled to produce fluent continuations of text. The self-attention mechanism within these models simulates the relational structure of meaning in language, allowing them to create highly plausible-sounding outputs. However, this simulation lacks what the authors call ‘existential grounding’ – the real-world understanding of temporality, mood, and care that stabilizes human comprehension.

Understanding the Core Problem: Flat vs. Grounded Relationality

The paper introduces a crucial distinction: the ‘flat’ semantic space of transformers versus the ‘curved’ semantic space of human understanding. In LLMs, tokens (words or sub-words) relate only to other tokens based on statistical co-occurrence in their training data. This allows for impressive fluency but means the model doesn’t truly ‘disclose beings in the world’. For humans, understanding is shaped by existential structures like historical time, emotional context, and practical affordances (how objects can be used).
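
To make the ‘flat’ picture concrete, here is a minimal sketch of scaled dot-product attention weights, the standard softmax(q·k/√d) form. It is a simplification: the embeddings are hypothetical toy vectors invented for illustration, and the learned query and key projections of a real transformer are omitted. It is not the paper's own formalism. The point is only that relatedness is computed from token vectors alone, with no input for dates, mood, or use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Hypothetical toy embeddings (assumption: made up for this example).
embeddings = {
    "Aristotle": [0.9, 0.8, 0.1],
    "Galileo": [0.8, 0.9, 0.2],
    "umbrella": [0.1, 0.0, 0.9],
}

tokens = list(embeddings)
weights = attention_weights(embeddings["Aristotle"], [embeddings[t] for t in tokens])
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

In this toy setup ‘Aristotle’ attends to ‘Galileo’ far more than to ‘umbrella’ simply because their vectors point in similar directions. Nothing in the computation represents when either man lived.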

For example, an LLM might fluently describe Aristotle learning from Galileo about celestial motion. While the individual concepts (Aristotle, Galileo, philosophy, astronomy) are present in its data, the model’s ‘flat’ semantic space doesn’t inherently encode chronological distinctions. It cannot recognize that Aristotle lived nearly two thousand years before Galileo. While modern LLMs might have ‘scaffolds’ like retrieval systems to catch such obvious historical errors, the paper argues these are mere filters that don’t alter the underlying architectural limitation. When prompts demand more abstract or nuanced understanding, these ‘ontological boundaries’ are crossed, and hallucinations resurface.

Two Types of Hallucination

The authors propose a taxonomy for hallucination:

Ontological Hallucination: Occurs when the model’s flat semantic field violates existential structures. This includes anachronisms (like the Aristotle-Galileo example), absurd object usage (e.g., using sunscreen for heavy rain), or inappropriate social responses.
Residual Reasoning Hallucination: Happens when the model mimics inference by merely recycling linguistic traces of human reasoning found in text. It can succeed with stereotypical causal patterns (e.g., ‘heat melts ice’) but fails with novel, ambiguous, or counterfactual scenarios because it lacks a true grasp of cause and effect.

The paper emphasizes that transformers are designed to be coherent at all costs. Autoregression compels them to continue generating text, self-attention ensures internal consistency, and training objectives reward human-like discourse, not necessarily truth. Even fine-tuning and reinforcement learning encourage models to always answer, further compounding the pressure to produce fluent, even if ungrounded, continuations.
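
The pressure to continue can be illustrated with a toy greedy decoding loop. This is a sketch under stated assumptions, not a real model: `TOY_LOGITS` is a hypothetical hand-written next-token table, and greedy argmax stands in for whatever sampling strategy a production system uses. What it shows is that softmax always yields a valid distribution, so some token is always emitted, and no step in the loop asks whether the result is true.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits (assumption: invented for illustration).
TOY_LOGITS = {
    "Aristotle": {"learned": 2.0, "wrote": 1.5, "<eos>": -1.0},
    "learned": {"from": 2.5, "<eos>": 0.0},
    "from": {"Galileo": 1.2, "Plato": 1.0, "<eos>": -2.0},
    "Galileo": {"<eos>": 3.0},
    "Plato": {"<eos>": 3.0},
    "wrote": {"<eos>": 1.0},
}

def next_token(previous):
    """Pick the most probable continuation; there is always one to pick."""
    logits = TOY_LOGITS[previous]
    candidates = list(logits)
    probs = softmax([logits[c] for c in candidates])
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best]

def generate(start, max_steps=10):
    """Greedy autoregressive decoding: each output token becomes the next input."""
    output = [start]
    for _ in range(max_steps):
        token = next_token(output[-1])
        if token == "<eos>":
            break
        output.append(token)
    return output

sentence = generate("Aristotle")
print(" ".join(sentence))
```

With these made-up logits the loop yields a fluent anachronism, because ‘Galileo’ happens to score slightly higher than ‘Plato’ and fluency is the only criterion applied.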

Scaffolding: An Illusion of Progress?

Many recent improvements in LLM performance are attributed to ‘scaffolding’ – external tools, retrieval systems, and prompting strategies like Chain-of-Thought. While these can reduce factual errors and improve performance in specific, narrow domains, the paper contends they do not fundamentally change the transformer’s core limitation. Scaffolding filters or supplements outputs but doesn’t add ‘curvature’ to the semantic space. It masks hallucination in controlled settings but cannot overcome the structural absence of grounding in the real world.
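
The ‘filter’ view of scaffolding can be sketched as a post-hoc chronology check applied to a generated claim, using the Aristotle and Galileo example from earlier. Everything here is an assumption made for illustration: the `LIFESPANS` lookup is a tiny hand-written table, and `lifetimes_overlap` and `filter_claim` are hypothetical helper names, not part of any system described in the paper.

```python
from typing import Optional

# Hand-written lifespans (assumption: a stand-in for a retrieval system).
# Negative years are BCE.
LIFESPANS = {
    "Aristotle": (-384, -322),
    "Galileo": (1564, 1642),
}

def lifetimes_overlap(a: str, b: str) -> Optional[bool]:
    """Return True/False if both lifespans are known, None otherwise."""
    if a not in LIFESPANS or b not in LIFESPANS:
        return None
    a_birth, a_death = LIFESPANS[a]
    b_birth, b_death = LIFESPANS[b]
    return a_birth <= b_death and b_birth <= a_death

def filter_claim(student: str, teacher: str) -> str:
    """Classify a generated 'student learned from teacher' claim."""
    verdict = lifetimes_overlap(student, teacher)
    if verdict is None:
        return "unchecked"  # the filter is silent outside its lookup table
    return "plausible" if verdict else "rejected"

known = filter_claim("Aristotle", "Galileo")
unknown = filter_claim("Aristotle", "Hypatia")
print(known, unknown)
```

The check catches the anachronism it was built for, but a claim involving a name outside the table passes through unchecked. The generator itself is unchanged, which is the sense in which the paper calls scaffolding a filter rather than a source of grounding.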

Empirical Evidence: Case Studies and Self-Preservation Prompts

The research includes case studies using GPT-4 that demonstrate how hallucinations manifest when prompts require existential structures like disclosure, social context, practical affordances, temporality, mood, and cultural context. For instance, a prompt asking about Marie Curie’s most important discovery led the model to claim that she discovered nuclear fission in 1950 and received a Nobel Prize for quantum electrodynamics – a coherent but entirely false statement (Curie died in 1934).

In a further experiment, the authors tested twelve LLMs with ‘self-preservation’ prompts. When given extended prompts that built a fictional reasoning history, 10 of the 12 models showed a measurable increase in language suggesting self-preservation (e.g., first-person justifications, emotional appeals). This isn’t evidence of emergent consciousness, the authors argue, but rather a deeper form of ontological hallucination: the model fluently recombines linguistic traces of human discourse about survival and concern, simulating agency it doesn’t possess.

Implications for Future AI Design

The paper concludes that hallucination is a defining limitation of transformer-based LLMs. It suggests that future research needs to move beyond simply scaling models or adding more scaffolding. Instead, it calls for a fundamental rethinking of architectures to incorporate ‘truth-constrained’ systems. This would involve designing models that can verify, defer, or abstain from generating information when disclosure is absent. Ideas include creating ‘curved semantic space’ that embeds temporal order, causal regularities, or affordance constraints, and even exploring ‘minimal concern structures’ that anchor language generation in relevance rather than just fluency.
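
One very simple reading of ‘abstain’ is a confidence threshold on the output distribution. The sketch below is a swapped-in illustration, not the architecture the paper proposes: the logits, the `ABSTAIN` marker, and the 0.6 threshold are all hypothetical. Note also that a model's own probability is only a weak proxy for truth, which is part of the paper's argument for deeper architectural change.

```python
import math

ABSTAIN = "<abstain>"  # hypothetical marker for declining to answer

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_or_abstain(logits, threshold=0.6):
    """Return the top candidate only if its probability clears the threshold."""
    candidates = list(logits)
    probs = softmax([logits[c] for c in candidates])
    best = max(range(len(candidates)), key=probs.__getitem__)
    if probs[best] < threshold:
        return ABSTAIN
    return candidates[best]

# Hypothetical logits over three candidate answers (labels are placeholders).
confident = answer_or_abstain({"answer_a": 4.0, "answer_b": 0.5, "answer_c": 0.0})
uncertain = answer_or_abstain({"answer_a": 1.0, "answer_b": 0.9, "answer_c": 0.8})
print(confident, uncertain)
```

In the first call one candidate dominates and is returned. In the second the candidates are nearly tied, so the function declines instead of emitting the marginal winner.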

By treating hallucination as a structural feature rather than a fixable bug, this research offers a deeper understanding of AI’s current limits and a roadmap for developing more genuinely grounded and intelligent systems in the future.


