
Unpacking LLM Intelligence: How Knowledge and Reasoning Work Together (and Apart)

TLDR: This research introduces a framework to separate knowledge and reasoning in LLMs, inspired by human dual-system thinking. It finds that reasoning benefits problem-solving in math/science but can hurt knowledge-heavy domains. Larger LLMs show more significant knowledge gains and become more “prudent” in reasoning. Knowledge is primarily in lower network layers, while reasoning is in higher layers.

Large Language Models (LLMs) are incredibly powerful, but understanding how they arrive at their answers can be a bit of a mystery. This new research delves into how LLMs use two distinct mental processes: knowledge and reasoning. Think of it like how humans think – sometimes we react quickly based on what we know (fast thinking), and other times we deliberate and adjust our thoughts (slow thinking).

Inspired by this human “dual-system cognitive theory,” researchers Mutian Yang, Jiandong Gao, and Ji Wu of Tsinghua University have developed a framework to separate these two contributions in LLMs. They propose that LLM cognition can be broken down into two phases: “knowledge retrieval” (Phase 1) and “reasoning adjustment” (Phase 2).

To test this, LLMs were prompted to generate answers under two different “cognitive modes”: fast thinking and slow thinking. In fast thinking, the LLM gives an immediate answer based purely on its stored knowledge. In slow thinking, the LLM first generates an initial answer (like fast thinking) and then refines it through a process similar to Chain-of-Thought (CoT) reasoning. By comparing the performance in these two modes, the researchers could quantify the contribution of knowledge and reasoning.
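To make the attribution concrete, here is a minimal sketch of how one might quantify the two contributions by comparing the two modes. The function and mode names are hypothetical stand-ins, not the paper’s actual code:

```python
# Minimal sketch: attribute performance to knowledge vs. reasoning by
# comparing accuracy under the two cognitive modes. `eval_accuracy` and
# the mode names are hypothetical, not taken from the paper.

def attribute_cognition(model, dataset, eval_accuracy):
    # Fast thinking: answer immediately from stored knowledge (Phase 1).
    acc_fast = eval_accuracy(model, dataset, mode="fast")
    # Slow thinking: initial answer refined via CoT-style reasoning (Phase 2).
    acc_slow = eval_accuracy(model, dataset, mode="slow")
    return {
        "knowledge": acc_fast,                        # knowledge-retrieval contribution
        "reasoning_adjustment": acc_slow - acc_fast,  # gain (or loss) from refinement
    }
```

The key idea is that fast-thinking accuracy isolates what the model already knows, so any difference under slow thinking can be credited to (or blamed on) the reasoning step.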

The study evaluated 15 different LLMs on three datasets: MMLU, MathQA, and MedQA. The findings offer some fascinating insights into how these models work. One key discovery is that reasoning adjustment isn’t equally beneficial across all subjects. It significantly helps in “reasoning-intensive” domains like mathematics, physics, and chemistry, where problems often require step-by-step logical deduction. In “knowledge-intensive” domains such as political science or history, however, reasoning adjustment can sometimes hinder performance. This suggests that if an LLM lacks the fundamental knowledge, extra reasoning may just introduce noise rather than produce a correct answer.

Another important finding relates to how LLMs improve with size. As models get larger (parameter scaling), both their knowledge retrieval and reasoning adjustment capabilities improve, but the boost in knowledge is more significant and sustained. Interestingly, larger models also become “more prudent” in their reasoning: they are less prone to “overthinking,” that is, revising away answers that were initially correct. This prudence is a major factor in the reasoning gains observed in medium-sized models.
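One hedged way to picture this “prudence” is as a flip rate: of the questions the model already answers correctly in fast mode, how many does slow thinking turn wrong? The sketch below is illustrative only; the paper may define the measure differently:

```python
# Illustrative "overthinking" measure (an assumption, not the paper's exact
# metric): the fraction of initially correct answers that slow thinking flips.

def overthinking_rate(fast_correct, slow_correct):
    """fast_correct, slow_correct: parallel lists of booleans, one per question."""
    flipped = sum(f and not s for f, s in zip(fast_correct, slow_correct))
    initially_right = sum(fast_correct)
    return flipped / initially_right if initially_right else 0.0

# A more prudent model keeps this rate low:
print(overthinking_rate([True, True, True, False], [True, False, True, True]))  # ~0.33
```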

The research also sheds light on where these cognitive processes reside within the LLM’s neural network. It was found that knowledge primarily sits in the “lower network layers,” while reasoning operations occur in the “higher layers.” This suggests a functional separation, where the initial layers handle the recall of information, and the later layers are responsible for processing and refining that information through reasoning.
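The article doesn’t spell out how the authors localize these functions, but the general idea can be sketched with a standard layer-wise probe: fit a simple classifier on each layer’s hidden states and see where the signal of interest is most decodable. Everything below is a generic sketch under that assumption, not the paper’s specific method:

```python
# Generic layer-wise probing sketch (not the paper's specific technique):
# train a linear probe per layer and measure where a signal (e.g., whether
# a fact is recalled) is most decodable. On the paper's account, knowledge
# signals would peak in lower layers and reasoning signals in higher ones.

import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_layers(hidden_states, labels):
    """hidden_states: (num_layers, num_examples, hidden_dim) activations.
    labels: per-example binary targets for the signal being probed."""
    scores = []
    for layer_acts in hidden_states:
        probe = LogisticRegression(max_iter=1000).fit(layer_acts, labels)
        scores.append(probe.score(layer_acts, labels))
    return np.array(scores)  # one decodability score per layer
```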

This “cognition attribution framework” not only helps us understand LLMs from a new “decoupling” perspective but also offers valuable insights into existing research areas such as scaling laws and how knowledge is stored and edited within these models. For more technical details, you can refer to the full research paper available on arXiv.

In conclusion, this study offers a clearer picture of the intricate interplay between knowledge and reasoning in LLMs, highlighting their distinct roles and how they evolve with model scale and across different domains. It’s a significant step towards building more interpretable and effective AI systems.

Ananya Rao (https://blogs.edgentiq.com)
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
