TLDR: A research paper by Professor Jingde Cheng argues that Large Language Models (LLMs) cannot achieve true correct reasoning due to fundamental limitations in their working principles. The paper defines true correct reasoning as a process where premises provide conclusive relevant evidence for new conclusions, and introduces Strong Relevant Logics (SRLs) as the only logical framework capable of supporting such reasoning. It explains that LLMs only simulate reasoning by statistically processing text, lacking the ability to embed a correctness evaluation criterion or guarantee 100% accurate results, leading to an ‘illusion’ of reasoning rather than genuine logical capability.
In the rapidly evolving world of Artificial Intelligence, Large Language Models (LLMs) like ChatGPT have sparked widespread discussion about their capabilities, particularly concerning their ‘understanding’ and ‘reasoning’ abilities. However, a recent research paper by Professor Jingde Cheng challenges these popular notions, arguing that LLMs, due to their fundamental working principles, can never achieve true correct reasoning.
Defining True Correct Reasoning
Professor Cheng begins by clarifying what constitutes ‘true correct reasoning’. Departing from traditional definitions, his framework treats reasoning as an ordered process of drawing new conclusions from given premises, where the premises must provide ‘conclusive relevant evidence’ for the conclusion. This definition highlights two crucial aspects: the relevance between premises and conclusion, and the constructive, functional nature of reasoning; a reasoning process counts as ‘correct’ only if its premises genuinely offer such conclusive relevant evidence. The paper identifies three types of arguments: deductive, inductive, and abductive, all of which rely on the notion of a ‘conditional’ (if…then…). Crucially, the correctness of an argument concerns the strength and relevance of the connection between premises and conclusion, not merely whether the premises or conclusion are individually true.
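To make the role of the conditional concrete, here are schematic forms of the three argument types (the notation is mine, for illustration; the paper's own symbolism may differ):

```latex
% Schematic forms of the three argument types, each organized around a
% conditional A => B. Only deduction guarantees its conclusion.
\begin{align*}
\text{Deduction:}\quad & A \Rightarrow B,\; A \;\therefore\; B
  && \text{(conclusion guaranteed by the premises)}\\
\text{Abduction:}\quad & A \Rightarrow B,\; B \;\therefore\; A\ \text{(tentatively)}
  && \text{(a plausible explanation, not guaranteed)}\\
\text{Induction:}\quad & \text{repeated cases of $A$ with $B$} \;\therefore\; A \Rightarrow B
  && \text{(the conditional itself is the new conclusion)}
\end{align*}
```

In each form, what is evaluated is the link expressed by the conditional; for instance, ‘all fish fly; salmon are fish; therefore salmon fly’ is a deductively correct argument despite its false premise, exactly the distinction the paper draws.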
The Logical Foundation for Correct Reasoning
Logic, as a discipline, aims to distinguish correct reasoning from incorrect reasoning. The paper asserts that the most essential concept in logic is the ‘logical consequence relation’, which determines which conclusions validly follow from premises. To underpin true correct reasoning, a fundamental logic system must satisfy three essential requirements:
- It must underlie correct and truth-preserving reasoning, ensuring relevance between premises and conclusion.
- It must enable ‘ampliative reasoning’, meaning it can genuinely draw new conclusions that extend existing knowledge, rather than being circular or tautological.
- It must support ‘paracomplete’ and ‘paraconsistent’ reasoning, allowing for reasoning with incomplete or inconsistent knowledge without leading to arbitrary conclusions (rejecting the principle that ‘everything follows from a contradiction’).
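To see what the third requirement rules out, consider the classical derivation behind ‘everything follows from a contradiction’ (the well-known Lewis argument, reconstructed here for illustration):

```latex
% In CML, from one inconsistency (A and not-A) any arbitrary B follows:
\begin{align*}
1.\quad & A \wedge \neg A && \text{premise (an inconsistency)}\\
2.\quad & A               && \text{from 1, simplification}\\
3.\quad & \neg A          && \text{from 1, simplification}\\
4.\quad & A \vee B        && \text{from 2, addition ($B$ is arbitrary)}\\
5.\quad & B               && \text{from 3 and 4, disjunctive syllogism}
\end{align*}
```

A paraconsistent logic must block at least one of these steps (relevant logics reject the unrestricted use of disjunctive syllogism), so that a single inconsistency in a knowledge base does not license every conclusion whatsoever.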
The paper critically examines existing logic systems. Classical Mathematical Logic (CML) falls short because it doesn’t account for relevance, uses a problematic ‘material implication’ for conditionals, is not ampliative, and cannot handle inconsistency. Traditional Relevant Logics (RLs) were developed to address the relevance issue and avoid paradoxes, but they still suffer from certain ‘implicational paradoxes’ because their relevance principle isn’t strong enough.
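Concretely, the problem with material implication shows up in CML theorems such as the following standard ‘implicational paradoxes’ (the glosses are mine):

```latex
% Two classical paradoxes of material implication; both are CML theorems,
% yet the antecedent can be wholly irrelevant to the consequent.
\begin{align*}
& B \rightarrow (A \rightarrow B)      && \text{a truth is `implied' by anything at all}\\
& \neg A \rightarrow (A \rightarrow B) && \text{a falsehood `implies' everything}
\end{align*}
```

Read as statements about entailment these are absurd (‘if snow is white, then the moon being square implies snow is white’), which is why CML's $\rightarrow$ cannot serve as the conditional of true correct reasoning.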
Introducing Strong Relevant Logics (SRLs)
To overcome these limitations, Professor Cheng proposes Strong Relevant Logics (SRLs), specifically Rc, Ec, and Tc. These logics enforce a ‘strong relevance principle’ where every propositional variable in a theorem must occur at least once as an antecedent part and once as a consequent part. This ensures that premises contain no unnecessary conjuncts and conclusions no unnecessary disjuncts, eliminating the remaining paradoxes. The paper states that SRLs are currently the *only* family of logics that can satisfy all three essential requirements for underlying true correct reasoning. Therefore, any system providing reasoning facilities, especially in crucial applications, should ideally be based on SRLs to guarantee 100% logically correct results.
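A concrete example of the strong relevance principle at work (my illustration, following the standard definition of antecedent and consequent parts): conjunctive simplification is a theorem of the traditional relevant logic R, yet SRLs reject it because the variable $B$ occurs only as an antecedent part, i.e., it is an unnecessary conjunct.

```latex
% Rejected by SRLs: B occurs only as an antecedent part
% (an unnecessary conjunct in the premise).
(A \wedge B) \rightarrow A
% Accepted (e.g., transitivity): each of A, B, C occurs at least once
% as an antecedent part and at least once as a consequent part.
(A \rightarrow B) \rightarrow ((B \rightarrow C) \rightarrow (A \rightarrow C))
```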
The Illusion of LLM Reasoning
Despite the widespread claims of LLMs possessing reasoning abilities, the paper argues that these are merely illusions. LLMs are fundamentally ‘generative mathematical models of the statistical distribution of tokens’ in vast text corpora. While comprehensive overviews of LLMs may tout their ‘emergent abilities’ like reasoning, they conspicuously omit any discussion of the *correctness* of this ‘reasoning’.
Professor Cheng attributes this illusion to several factors:
- Many people, including AI experts, may not fully grasp what ‘true correct reasoning’ entails, thus overlooking the crucial aspect of correctness.
- LLMs are trained on massive datasets containing examples of human reasoning, leading them to sometimes ‘copy’ good reasoning patterns, which is then mistaken for genuine reasoning ability.
- The ‘ELIZA effect’ plays a role, where users project human traits onto systems with textual interfaces, believing the LLM is truly reasoning during communication.
- LLMs are powerful enough to simulate reasoning effectively, solving problems that might challenge some humans, further reinforcing the belief in their reasoning prowess.
In-Principle Limitations of LLMs
The core argument against LLMs achieving true correct reasoning lies in their inherent working principles. True correct reasoning, as defined, demands 100% logical correctness. This is an ‘insurmountable obstacle’ for LLMs, which operate based on probability theory, statistics, and deep learning. In principle, no LLM can guarantee 100% correct results.
The paper highlights that LLMs cannot embed a ‘correctness evaluation criterion’ or a ‘dynamic evaluation mechanism’. Because they function probabilistically and generate text token by token, a formal logic system cannot be built in as an intrinsic validity criterion. Furthermore, a global dynamic evaluation mechanism is impossible within their incremental generation style. Without a fundamental logic system, any evaluation of correct reasoning is impossible.
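The point about incremental generation can be made concrete with a toy decoding loop (purely illustrative; `model` and its `next_token_distribution` method are hypothetical stand-ins, not any real LLM API). Every step is a draw from a probability distribution, and nothing in the loop consults a logic:

```python
import random

def generate(model, prompt_tokens, max_new_tokens=50):
    """Toy autoregressive decoding loop, as used (schematically) by LLMs.

    `model.next_token_distribution(tokens)` is a hypothetical stand-in
    that returns a mapping {token: probability} over the vocabulary.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        dist = model.next_token_distribution(tokens)  # P(next | all so far)
        # Each token is *sampled*: any token with nonzero probability,
        # including one that breaks a chain of reasoning, can be emitted.
        next_token = random.choices(
            population=list(dist.keys()),
            weights=list(dist.values()),
        )[0]
        tokens.append(next_token)
        # No step here checks the growing text against a logical
        # consequence relation; no global "is this derivation valid?"
        # hook exists in this incremental generation style.
    return tokens
```

Even under greedy decoding (always taking the most probable token), what is optimized is the statistical plausibility of the continuation, never the logical validity of the whole derivation; this is exactly the missing global evaluation mechanism the paper describes.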
Ultimately, the ‘truth’ or ‘correctness’ within LLMs is merely ‘statistical plausibility in text’, lacking correspondence to real-world truth. Therefore, LLMs can only simulate the *form* of reasoning but fundamentally lack the internal mechanisms for evaluating and ensuring its correctness. The paper concludes that seeking true reasoning ability in LLMs without considering a correctness evaluation criterion is a ‘completely wrong and hopeless research direction’. For more details, you can read the full paper here.


