Unpacking Language Models: Do Transformers Model Human Thought or the Fabric of Language Itself?

TLDR: Colin Klein’s research paper argues that Large Language Models (LLMs) model the corpus they are trained on, not human linguistic capacities. He posits that human language relies on ‘supralinear formats’ for computation, while transformers, due to their ‘permutation’ and ‘substring’ invariants, are limited to ‘linear formats’. The paper suggests LLMs achieve their performance by learning ‘shortcut solutions’ to emulate ‘finite state automata’ that could produce the training corpus. This leads to a non-deflationary conclusion: LLMs reveal the power of language as a ‘discourse machine’ that facilitates systematic transformations, which both humans and AI learn to leverage, albeit through different mechanisms.

Large Language Models (LLMs) have captivated the world with their ability to generate human-like text, leading many to wonder what exactly these systems are modeling. Do they reflect human cognitive abilities, or are they simply sophisticated mirrors of the vast text data they are trained on? A recent research paper by Colin Klein, titled “What Do Language Models Model? Transformers, Automata, and the Format of Thought”, takes up this question. It argues that LLMs primarily model the corpus rather than human linguistic capacities, yet arrives at a surprisingly non-deflationary conclusion.

The paper begins by outlining two main perspectives on LLM modeling. One view suggests that LLMs model human linguistic capacity, implying they possess or simulate cognitive abilities like grounded symbols, instrumental knowledge, or even a theory of mind. The other, often more ‘deflationary’ view, posits that LLMs merely model a collection of text, acting as ‘stochastic parrots’ or imperfect compressions of their training data. Klein firmly aligns with the latter, but with a nuanced, positive interpretation.

Central to Klein’s argument is the concept of the ‘format of thought’ – how information is structured and processed. Cognitive science, he notes, suggests that human linguistic capabilities rely on ‘supralinear formats’ for computation. Imagine a complex web or a tree structure where elements can be connected in multiple, non-sequential ways. This allows for the rich, hierarchical processing characteristic of human language, where meaning can depend on distant relationships within a sentence, not just adjacent words.

In contrast, Klein argues that the transformer architecture, the backbone of modern LLMs, supports at best ‘linear formats’ for processing. Think of a simple list or a sequence where elements are ordered one after another. This limitation stems from certain inherent ‘invariants’ of the transformer’s computational architecture, particularly within its ‘residual stream’ – the data pathway where all the computational action happens.
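The contrast between the two formats can be made concrete with a small Python sketch. It is an illustration only, not something taken from Klein's paper: the example sentence, the `Node` class, and the particular bracketing are all assumptions chosen for readability. The same seven words are stored once as a flat sequence and once as a tree, where the subject and its verb sit under a shared parent even though they are far apart in the word order.

```python
from dataclasses import dataclass, field
from typing import List

# Linear format: an ordered sequence. The only structural relation is position.
linear = ["the", "dog", "that", "chased", "the", "cat", "barked"]


@dataclass
class Node:
    """Hypothetical tree node standing in for a 'supralinear' format."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        # Reading the leaves left to right recovers the linear word order.
        if not self.children:
            return [self.label]
        words: List[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words

    def depth(self) -> int:
        # Number of levels from this node down to its deepest leaf.
        return 1 + max((child.depth() for child in self.children), default=0)


# Supralinear format: the same words with an (assumed) hierarchical bracketing.
relative_clause = Node("RC", [Node("that"), Node("chased"),
                              Node("NP", [Node("the"), Node("cat")])])
subject = Node("NP", [Node("the"), Node("dog"), relative_clause])
predicate = Node("VP", [Node("barked")])
tree = Node("S", [subject, predicate])

# "dog" and "barked" are five positions apart in the list,
# but their phrases are siblings directly under the root of the tree.
print(linear.index("barked") - linear.index("dog"))
print([child.label for child in tree.children])
```

The tree carries strictly more structure than the list: the list can be recovered from the tree by reading off its leaves, but not the other way around.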

Two key invariants are discussed: ‘permutation invariance’ and ‘substring invariance’. Permutation invariance, found in unmasked transformers, means that the core operations are indifferent to the order of input tokens; positional encodings are added to explicitly provide order information. Substring invariance, characteristic of masked transformers (like those used for text generation), dictates that operations on an initial part of a sequence are unaffected by later tokens. This means that later information cannot influence how earlier parts of a sentence are processed. Yet that kind of backward influence is crucial for building complex, interdependent structures like syntactic trees, where later words can disambiguate earlier ones.
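Both invariants can be checked on a toy attention layer. The following is a minimal sketch under loud simplifications, not the paper's formalism: a single head, identity query/key/value projections, no positional encodings, and pure-Python arithmetic. The function names `attention` and `close` and the token vectors are hypothetical. Strictly speaking, the unmasked layer reorders its outputs along with its inputs, which is the sense in which it is indifferent to order.

```python
import math
from typing import List

Vector = List[float]


def attention(xs: List[Vector], causal: bool) -> List[Vector]:
    """Toy single-head self-attention with identity projections (an assumption)."""
    d = len(xs[0])
    outputs = []
    for i, query in enumerate(xs):
        # With a causal mask, position i may attend only to positions 0..i.
        keys = xs[: i + 1] if causal else xs
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * key[j] for w, key in zip(weights, keys)) / total
             for j in range(d)]
        )
    return outputs


def close(a: List[Vector], b: List[Vector], tol: float = 1e-9) -> bool:
    """Compare two lists of vectors up to floating-point rounding."""
    return len(a) == len(b) and all(
        abs(x - y) <= tol for u, v in zip(a, b) for x, y in zip(u, v)
    )


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Permutation check (unmasked): shuffling the inputs only shuffles the outputs.
perm = [2, 0, 3, 1]
shuffled = [tokens[p] for p in perm]
full = attention(tokens, causal=False)
permutation_ok = close(attention(shuffled, causal=False), [full[p] for p in perm])

# Substring check (masked): outputs on a prefix ignore any later tokens.
prefix = tokens[:3]
prefix_ok = close(attention(tokens, causal=True)[:3],
                  attention(prefix, causal=True))

# Without the mask, appending a token does change the earlier outputs.
unmasked_prefix_ok = close(full[:3], attention(prefix, causal=False))

print(permutation_ok, prefix_ok, unmasked_prefix_ok)
```

In this toy setting the first two checks hold and the third does not, which matches the article's description: masking buys independence from later tokens at the cost of never letting them revise what came before.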

These invariants, Klein contends, strongly suggest that transformers are limited to processing information in a linear fashion. While they can represent facts about language and its structure, this is seen as a property of the ‘content’ of their representations, not their underlying ‘format’. This implies that LLMs are not processing language in the same way humans do.

So, if LLMs don’t model human linguistic capacity, what positive story can be told about their impressive performance? Klein turns to the idea of ‘shortcut automata’, building on speculations by Liu et al. (2022). He suggests that transformers are not primarily concerned with language itself, but rather with calculating the ‘input-to-state function’ of an automaton that could have produced the training corpus. Essentially, they learn efficient ‘shortcut solutions’ to emulate the step-by-step transitions of a finite state automaton that generates text.

This perspective, rooted in the Krohn-Rhodes theorem from automata theory, explains how a linear, parallel feedforward architecture can handle autoregressive tasks. The transformer learns to predict the next state of this hypothetical ‘corpus-producing automaton’ given the current input, effectively learning the permissible transformations within the corpus. This is why LLMs excel at generating text that sounds like a continuation of existing text – they’ve learned the ‘rules’ of the ‘discourse machine’ inherent in the training data.
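The idea of a shortcut to an automaton's input-to-state function can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the three-state automaton, its alphabet, and the function names are hypothetical. The shortcut shown is a standard parallel prefix scan over composed transition maps, swapped in as a simple stand-in; it is not the Krohn-Rhodes-based construction, and it says nothing about what a trained transformer actually computes.

```python
from typing import Dict, List, Tuple

# Hypothetical automaton with states 0, 1, 2. Each symbol is a map on states,
# stored as a tuple: TRANSITIONS[symbol][state] gives the next state.
TRANSITIONS: Dict[str, Tuple[int, ...]] = {
    "a": (1, 2, 0),  # advance the counter modulo 3
    "b": (0, 0, 0),  # reset to state 0
}


def sequential_states(word: str, start: int = 0) -> List[int]:
    """Step-by-step simulation: one transition per input symbol."""
    states, state = [], start
    for symbol in word:
        state = TRANSITIONS[symbol][state]
        states.append(state)
    return states


def compose(first: Tuple[int, ...], then: Tuple[int, ...]) -> Tuple[int, ...]:
    """Map obtained by applying `first` and then `then`. Composition is associative."""
    return tuple(then[first[s]] for s in range(len(first)))


def shortcut_states(word: str, start: int = 0) -> Tuple[List[int], int]:
    """Prefix scan over transition maps; every round could run in parallel."""
    maps = [TRANSITIONS[symbol] for symbol in word]
    offset, rounds = 1, 0
    while offset < len(maps):
        # After this round, maps[i] covers a window of up to 2 * offset symbols
        # ending at position i.
        maps = [
            compose(maps[i - offset], maps[i]) if i >= offset else maps[i]
            for i in range(len(maps))
        ]
        offset *= 2
        rounds += 1
    return [m[start] for m in maps], rounds


word = "aababaa"
step_by_step = sequential_states(word)
scanned, rounds = shortcut_states(word)
print(step_by_step)
print(scanned, rounds)
```

The sequential version needs one step per symbol, while the scan reaches the same state sequence in a number of rounds that grows only logarithmically with the input length. That gap is the intuition behind calling such solutions shortcuts: a fixed-depth, parallel computation reproduces what looks like an inherently step-by-step process.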

Klein concludes that this is far from a deflationary view. Language, he argues, is not just a means of expressing inner states, but also a powerful ‘discourse machine’ – a technology that allows us to systematically transform bits of language given appropriate context. Humans learn to use this technology in one way, often implicitly, by being exposed to vast amounts of text and learning how to produce more of the same. LLMs, through their unique architecture and training, have also learned to use this discourse machine, but via very different, linear means.

The paper ultimately suggests that LLMs offer profound insights not into human linguistic production, but into the inherent power and structure of the corpus itself. They demonstrate that an ability to learn transformations within a linear framework is sufficient to capture the essence of a language corpus and generate more like it, highlighting the corpus as a ‘discourse machine’ that shapes both human and artificial language generation.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
