Unpacking Language Models: Do Transformers Model Human Thought or the Fabric of Language Itself?

TLDR: Colin Klein’s research paper argues that Large Language Models (LLMs) model the corpus they are trained on, not human linguistic capacities. He posits that human language relies on ‘supralinear formats’ for computation, while transformers, due to their ‘permutation’ and ‘substring’ invariants, are limited to ‘linear formats’. The paper suggests LLMs achieve their performance by learning ‘shortcut solutions’ to emulate ‘finite state automata’ that could produce the training corpus. This leads to a non-deflationary conclusion: LLMs reveal the power of language as a ‘discourse machine’ that facilitates systematic transformations, which both humans and AI learn to leverage, albeit through different mechanisms.

Large Language Models (LLMs) have captivated the world with their ability to generate human-like text, leading many to wonder: what exactly are these powerful AI systems truly modeling? Do they reflect human cognitive abilities, or are they simply sophisticated mirrors of the vast text data they are trained on? A recent research paper by Colin Klein, titled “What Do Language Models Model? Transformers, Automata, and the Format of Thought”, delves into this fundamental question, offering a compelling argument that LLMs primarily model the corpus, rather than human linguistic capacities, yet arriving at a surprisingly non-deflationary conclusion.

The paper begins by outlining two main perspectives on LLM modeling. One view suggests that LLMs model human linguistic capacity, implying they possess or simulate cognitive abilities like grounded symbols, instrumental knowledge, or even a theory of mind. The other, often more ‘deflationary’ view, posits that LLMs merely model a collection of text, acting as ‘stochastic parrots’ or imperfect compressions of their training data. Klein firmly aligns with the latter, but with a nuanced, positive interpretation.

Central to Klein’s argument is the concept of the ‘format of thought’ – how information is structured and processed. Cognitive science, he notes, suggests that human linguistic capabilities rely on ‘supralinear formats’ for computation. Imagine a complex web or a tree structure where elements can be connected in multiple, non-sequential ways. This allows for the rich, hierarchical processing characteristic of human language, where meaning can depend on distant relationships within a sentence, not just adjacent words.

In contrast, Klein argues that the transformer architecture, the backbone of modern LLMs, supports at best ‘linear formats’ for processing. Think of a simple list or a sequence where elements are ordered one after another. This limitation stems from certain inherent ‘invariants’ of the transformer’s computational architecture, particularly within its ‘residual stream’ – the data pathway where all the computational action happens.

Two key invariants are discussed: ‘permutation invariance’ and ‘substring invariance’. Permutation invariance, found in unmasked transformers, means that the core operations are indifferent to the order of input tokens, which is why positional encodings must be added to supply order information explicitly. Substring invariance, characteristic of masked transformers (like those used for text generation), dictates that operations on an initial part of a sequence are unaffected by later tokens. Later information therefore cannot influence how earlier parts of a sentence are processed. Yet that kind of influence is exactly what is needed to build complex, interdependent structures like syntactic trees, where later words can disambiguate earlier ones.
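
Both invariants can be checked on a toy attention layer. The sketch below is an illustration only, not code from Klein's paper: it assumes a single attention head with identity projections and no positional encoding, and the names `attention` and `close` are hypothetical. Strictly speaking, what the unmasked case shows is that reordering the inputs merely reorders the outputs in the same way.

```python
import math
import random

def attention(x, causal=False):
    """Single-head dot-product self-attention with identity projections.

    x is a list of token vectors (lists of floats). No positional encoding
    is added, so the only source of order information is the causal mask.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # With a causal mask, position i may only attend to positions 0..i.
        visible = range(i + 1) if causal else range(n)
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in visible
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * x[j][k] for w, j in zip(weights, visible)) / total
            for k in range(d)
        ])
    return out

def close(a, b, tol=1e-9):
    """True if two lists of vectors agree element-wise within tol."""
    return len(a) == len(b) and all(
        abs(p - q) <= tol for u, v in zip(a, b) for p, q in zip(u, v)
    )

rng = random.Random(0)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]

# Unmasked: shuffling the inputs only shuffles the outputs the same way.
perm = [3, 0, 4, 1, 2]
out = attention(tokens)
out_shuffled = attention([tokens[p] for p in perm])
permutation_holds = close(out_shuffled, [out[p] for p in perm])

# Masked: the outputs for a prefix do not change when more tokens follow it.
prefix_out = attention(tokens[:3], causal=True)
full_out = attention(tokens, causal=True)
substring_holds = close(prefix_out, full_out[:3])
```

Both `permutation_holds` and `substring_holds` should come out true. Dropping `causal=True` from the second check breaks it, because without the mask the later tokens do change the outputs at earlier positions.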

These invariants, Klein contends, strongly suggest that transformers are limited to processing information in a linear fashion. While they can represent facts about language and its structure, this is seen as a property of the ‘content’ of their representations, not their underlying ‘format’. This implies that LLMs are not processing language in the same way humans do.

So, if LLMs don’t model human linguistic capacity, what positive story can be told about their impressive performance? Klein turns to the idea of ‘shortcut automata’, building on speculations by Liu et al. (2022). He suggests that transformers are not primarily concerned with language itself, but rather with calculating the ‘input-to-state function’ of an automaton that could have produced the training corpus. Essentially, they learn efficient ‘shortcut solutions’ to emulate the step-by-step transitions of a finite state automaton that generates text.

This perspective, rooted in the Krohn-Rhodes theorem from automata theory, explains how a linear, parallel feedforward architecture can handle autoregressive tasks. The transformer learns to predict the next state of this hypothetical ‘corpus-producing automaton’ given the current input, effectively learning the permissible transformations within the corpus. This is why LLMs excel at generating text that sounds like a continuation of existing text – they’ve learned the ‘rules’ of the ‘discourse machine’ inherent in the training data.
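
The general idea of a shortcut can be illustrated with a small example. The sketch below is not the construction from Liu et al. or from Klein's paper, and it does not use a Krohn-Rhodes decomposition; it only relies on the fact that composing transition maps is associative, so the input-to-state function can be computed in roughly log2(n) parallel-style rounds rather than n sequential steps. The automaton `DELTA` and the function names are hypothetical, chosen purely for illustration.

```python
# Hypothetical automaton over states {0, 1, 2}. Each input symbol is a map
# from current state to next state, stored as a tuple indexed by state.
DELTA = {
    "a": (1, 2, 0),  # cycle through the three states
    "b": (0, 0, 2),  # send states 0 and 1 to 0, leave 2 alone
}
START = 0

def run_sequential(word):
    """Input-to-state function, computed one transition at a time."""
    state, states = START, []
    for symbol in word:
        state = DELTA[symbol][state]
        states.append(state)
    return states

def compose(f, g):
    """Return the map 'apply f, then g'. This operation is associative."""
    return tuple(g[s] for s in f)

def run_shortcut(word):
    """Same function, computed by a prefix scan over transition maps.

    In each round every position is combined with the one `step` places to
    its left, so the loop runs about log2(len(word)) times. Within a round
    all positions could be updated in parallel.
    """
    maps = [DELTA[symbol] for symbol in word]
    step, rounds = 1, 0
    while step < len(maps):
        maps = [
            maps[i] if i < step else compose(maps[i - step], maps[i])
            for i in range(len(maps))
        ]
        step *= 2
        rounds += 1
    # maps[i] is now the combined effect of word[0..i]; apply it to START.
    return [m[START] for m in maps], rounds

word = "abbaabab"
sequential_states = run_sequential(word)
shortcut_states, rounds_used = run_shortcut(word)
```

For this eight-symbol word the two state sequences agree, and the scan needs three rounds where the sequential version takes eight steps. That is the sense in which a shallow, parallel computation can stand in for a step-by-step automaton.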

Klein concludes that this is far from a deflationary view. Language, he argues, is not just a means of expressing inner states, but also a powerful ‘discourse machine’ – a technology that allows us to systematically transform bits of language given appropriate context. Humans learn to use this technology in one way, often implicitly, by being exposed to vast amounts of text and learning how to produce more of the same. LLMs, through their unique architecture and training, have also learned to use this discourse machine, but via very different, linear means.

The paper ultimately suggests that LLMs offer profound insights not into human linguistic production, but into the inherent power and structure of the corpus itself. They demonstrate that an ability to learn transformations within a linear framework is sufficient to capture the essence of a language corpus and generate more like it, highlighting the corpus as a ‘discourse machine’ that shapes both human and artificial language generation.
