TLDR: A research paper argues that the scaling laws governing large language models (LLMs) inherently limit their ability to improve prediction accuracy, making it intractable for them to meet scientific standards of reliability. The authors suggest that the very mechanism that enables LLM learning, the generation of non-Gaussian outputs, also drives error accumulation and information catastrophes. They warn that current LLM growth yields diminishing returns and unsustainable resource demands, and risks a “degenerative AI” pathway unless insight and understanding are prized over brute-force scaling.
Large language models (LLMs) have transformed science and society, showcasing remarkable abilities in natural language processing. However, a recent research paper titled “The wall confronting large language models” by P.V. Coveney and S. Succi delves into fundamental limitations that could hinder their future progress. The paper argues that the very mechanisms driving LLM learning might also be the source of their inherent inaccuracies and unsustainable growth.
The Challenge of Scaling
LLMs, like those from major tech companies, require immense computational resources and energy. They are built with trillions of parameters, yet the paper highlights that the returns from increasing their size are diminishing. For instance, GPT-4.5, despite its speculated 5-10 trillion parameters and far higher cost than its predecessor, shows only qualitative gains in subjective areas like writing, with little substantive improvement in verifiable domains such as mathematics or science. Meta’s Llama 4 Behemoth also appears to underperform relative to its massive scale. This points to unfavorable “scaling laws” governing LLM performance.
A Different Kind of Error
Unlike traditional computer simulations, where errors shrink predictably as computational resources grow, LLMs operate in ultra-high-dimensional spaces using a mix of deterministic and stochastic techniques. The paper points out that LLMs exhibit very low scaling exponents: a massive increase in resources yields only a marginal improvement in accuracy. With an exponent of around 0.1, reducing an LLM’s error by a factor of 10 requires roughly ten billion (10^10) times more compute, and once power consumption is taken into account, a single additional order of magnitude in accuracy can demand on the order of 10^20 times more resources. This is a stark contrast to Monte Carlo simulations, whose error falls as the square root of the sample size, so the same tenfold error reduction needs only a hundredfold increase in samples.
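The compute arithmetic above follows directly from a power law of the form error ~ compute^(-alpha). A minimal sketch, where the exponents (0.1 for a low-exponent LLM, 0.5 for Monte Carlo) are illustrative assumptions rather than figures measured in the paper:

```python
# Illustrative power-law scaling arithmetic: error ~ compute**(-alpha).
# The exponent values below are assumptions for illustration only.

def compute_factor_for_error_reduction(error_reduction: float, alpha: float) -> float:
    """Factor by which compute must grow to shrink error by `error_reduction`.

    From error ~ compute**(-alpha):
        C2 / C1 = error_reduction ** (1 / alpha)
    """
    return error_reduction ** (1.0 / alpha)

# Monte Carlo: error ~ samples**(-0.5), so 10x less error costs 100x more samples.
print(compute_factor_for_error_reduction(10, 0.5))   # 100x
# A low-exponent model (alpha ~ 0.1): the same 10x costs 10^10 times more compute.
print(compute_factor_for_error_reduction(10, 0.1))   # 1e10
```

The tiny exponent sits in the denominator of the power, which is why a small change in alpha translates into an astronomical change in required compute.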
The Roots of Uncertainty
The authors propose that the ability of LLMs to generate non-Gaussian output distributions from Gaussian inputs, which is crucial for their learning, may also be the cause of “error pileup” and “information catastrophes.” They term this phenomenon “Resilience of Uncertainty” (RoU): uncertainty decays very slowly no matter how much training data is added. The paper further argues that the “loss function” used in LLM training, often treated as a measure of success, is not a true metric of prediction quality. Driving the loss too low can lead to overfitting or “Potemkins”: an illusion of understanding in which a model’s answers are irreconcilable with how a human would interpret the same concepts.
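As a toy illustration of the Gaussian-in, non-Gaussian-out point (a minimal sketch, not the paper’s construction): even a very simple nonlinear map, here the product of two independent Gaussian variables, turns Gaussian inputs into a visibly heavy-tailed output, as the excess kurtosis shows.

```python
# Sketch: a nonlinear map sends Gaussian inputs to a non-Gaussian output.
import random
import statistics

random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # Gaussian input
b = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # independent Gaussian input
output = [x * y for x, y in zip(a, b)]                # nonlinear (product) map

def excess_kurtosis(xs):
    """Zero for a Gaussian; positive for heavy-tailed distributions."""
    mu = statistics.fmean(xs)
    var = statistics.fmean([(x - mu) ** 2 for x in xs])
    m4 = statistics.fmean([(x - mu) ** 4 for x in xs])
    return m4 / var ** 2 - 3.0

print(excess_kurtosis(a))       # near 0: input is Gaussian
print(excess_kurtosis(output))  # clearly positive: heavy tails appeared
```

The product of two standard normals has excess kurtosis 6, so the output’s tails are far fatter than its Gaussian inputs, even though no step in the construction looks dramatic.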
The Impending “Wall”
While LLMs have not yet hit a “wall” where adding resources makes accuracy worse (negative scaling exponents), the current very small positive exponents already put them on a path of sharply diminishing returns. The paper draws an analogy to roundoff errors in digital systems, where pushing precision too far lets accumulated rounding noise dominate the signal. Simply scaling models up indefinitely is therefore not a sustainable route to the accuracy required for scientific and professional applications.
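The roundoff analogy can be made concrete with a small sketch (illustrative only, not the paper’s example): repeatedly adding a value that is not exactly representable in binary lets rounding noise accumulate with every operation, while a compensated summation such as Python’s math.fsum keeps the error in check.

```python
# Sketch: rounding noise accumulates over many finite-precision operations.
import math

n = 1_000_000
naive = 0.0
for _ in range(n):
    naive += 0.1              # each += rounds the running sum to a double
exact = n * 0.1               # a single rounded operation instead of a million
compensated = math.fsum([0.1] * n)  # tracks and reincorporates rounding error

print(abs(naive - exact))        # nonzero: a million roundings piled up
print(abs(compensated - exact))  # essentially zero: error was compensated
```

More arithmetic at fixed precision means more opportunities for noise, which is the spirit of the paper’s warning: past some point, throwing additional operations at a problem degrades rather than improves the answer.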
Beyond Current LLMs
The paper acknowledges that the AI industry is exploring new paradigms like Large Reasoning Models (LRMs) and Agentic AI to enhance credibility and reduce error rates. These approaches aim to mimic human-like reasoning or orchestrate multiple AI systems. However, the authors caution that if these new AIs rely on components with the same fundamental scaling issues as current LLMs, they too are unlikely to offer sustainable, scalable solutions. Instead, they suggest that perhaps generative models should embrace their “hallucination” tendency, channeling it into exploratory value rather than suppressing it.
An Avoidable Degenerative Path
The research concludes by outlining a “Degenerative AI” (DAI) pathway, in which low scaling exponents, non-Gaussian fluctuations, and resilience of uncertainty combine into an untamed accumulation of errors and information catastrophes, especially when models are trained on synthetic data. A critical contributing factor is the “deluge of spurious correlations” identified by Calude and Longo, which increase exponentially with dataset size and overwhelm the true correlations. The paper emphasizes that ignoring the scientific method and relying solely on brute-force scaling is a path doomed to failure. True progress, the authors argue, demands a higher premium on insight and understanding of the problem at hand. You can read the full paper here.
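The spurious-correlation point can be illustrated with a small simulation (a sketch in the spirit of, not taken from, Calude and Longo): among purely independent random series, the number of pairs that merely appear correlated grows with the number of series, even though no true correlation exists anywhere. The threshold and sample sizes below are arbitrary illustrative choices.

```python
# Sketch: chance correlations multiply as a dataset of pure noise grows.
import itertools
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spurious_pairs(n_vars, n_obs=30, threshold=0.4, seed=0):
    """Count variable pairs whose sample correlation exceeds `threshold`,
    even though every series is independent Gaussian noise."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(1 for xs, ys in itertools.combinations(data, 2)
               if abs(pearson(xs, ys)) > threshold)

for n in (20, 40, 80):
    print(n, spurious_pairs(n))  # count of "correlated" pairs rises with n
```

Because the number of candidate pairs grows quadratically with the number of variables, the count of accidental correlations rises even as every true correlation remains exactly zero, which is how spurious signals can come to swamp genuine ones in large datasets.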


