TLDR: A study found that large language models (LLMs), especially those with ~70 billion parameters, not only achieve human-level accuracy in abstract pattern completion but also show internal representations and difficulty profiles similar to humans'. Critically, the representations in their “task-optimal” internal layers correlate with human brain activity measured as fixation-related potentials (FRPs), suggesting a shared way of processing abstract patterns in AI and human cognition.
A recent study delves into the fascinating question of whether large language models, or LLMs, process abstract reasoning in a way that mirrors human brain activity. This research, conducted by a team from the University of Amsterdam, compared the performance and internal representations of several open-source LLMs with human participants on a complex abstract-pattern-completion task. The findings offer compelling preliminary evidence that there might be shared principles between biological and artificial intelligence when it comes to abstract thought.
The Study’s Approach
The researchers designed an abstract-pattern-completion task in which human participants completed sequences of icons, while LLMs were given text-based versions of the same patterns. To capture human brain activity, the team used electroencephalography (EEG), focusing on fixation-related potentials (FRPs), which capture brain signals time-locked to where a person is looking. This provides a more natural window into cognitive processes than traditional stimulus-locked or response-locked measurements.
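To make the text-based setup concrete, here is a minimal sketch of how an icon sequence might be rendered as a text prompt for an LLM. The tokens ("circle", "square"), the ABB rule, and the exact wording are illustrative assumptions, not the study's actual materials:

```python
# Illustrative only: an ABB-type abstract pattern rendered as text, the modality
# the LLMs received instead of the icons shown to human participants.
def make_abb_item(tokens=("circle", "square"), repeats=3):
    """Build a text sequence following an ABB rule, withholding the final item."""
    a, b = tokens
    seq = [a, b, b] * repeats
    prompt_seq, answer = seq[:-1], seq[-1]
    prompt = "Complete the pattern: " + " ".join(prompt_seq) + " ?"
    return prompt, answer

prompt, answer = make_abb_item()
print(prompt)   # -> Complete the pattern: circle square square circle square square circle square ?
print(answer)   # -> square
```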
Eight open-source LLMs were tested, ranging in size from 2 billion to 72 billion parameters. The study aimed to see not only if LLMs could achieve human-like accuracy but also if their internal computational processes, specifically the representations formed within their hidden layers, aligned with human brain activity during the task.
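A rough sketch of how per-layer representations can be read out of an open-source LLM with the Hugging Face transformers library is shown below. The model name, prompt, and last-token pooling are assumptions for illustration, not necessarily the study's pipeline:

```python
# Sketch: pull per-layer hidden states from an open-source LLM with transformers.
# The model below is a small stand-in; the study's best-performing models were ~70B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

prompt = "Complete the pattern: circle square square circle square square circle square ?"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding layer plus one tensor per transformer
# layer, each of shape (batch, sequence_length, hidden_size).
layer_reps = [h[0, -1, :] for h in out.hidden_states]  # last-token vector per layer
print(len(layer_reps), layer_reps[0].shape)
```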
Key Findings: Performance and Internal Alignment
The study revealed that only the largest LLMs, those with approximately 70 billion parameters, achieved accuracy comparable to human participants. Qwen-2.5-72B and DeepSeek-R1-Distill-Llama-70B stood out, not just for their high accuracy (around 75-80%) but also for mirroring the human pattern-specific difficulty profile: they struggled with the same types of patterns that humans found challenging.
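Operationally, "mirroring the difficulty profile" amounts to per-category accuracies that rise and fall together across pattern types. The sketch below illustrates the idea with made-up numbers and hypothetical category names; it does not reproduce the study's data:

```python
# Toy illustration of a difficulty-profile comparison: correlate per-category
# accuracies for humans and a model. All values and labels here are invented.
import numpy as np
from scipy.stats import spearmanr

categories = ["repetition", "alternation", "mirror", "increment"]  # hypothetical labels
human_acc = np.array([0.95, 0.88, 0.72, 0.61])
model_acc = np.array([0.92, 0.85, 0.70, 0.66])

rho, _ = spearmanr(human_acc, model_acc)
print(f"Difficulty-profile correlation: rho={rho:.2f}")
```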
A crucial discovery was how LLMs organize abstract pattern categories internally. The researchers found that every LLM tested formed distinct clusters for these categories within their intermediate layers. The stronger this clustering, the better the model performed on the task. This suggests that the way LLMs represent abstract patterns internally is directly linked to their ability to solve these problems.
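One way to quantify such clustering, and to pick out the layer where it is strongest, is a per-layer separability score, for example a silhouette score over item representations labeled by pattern category. The following is a sketch under that assumption, using random stand-in data; the paper may use a different separability measure:

```python
# Sketch: score how cleanly each layer separates pattern categories, then take the
# best-separating layer as the "task-optimal" one. Data here are random placeholders.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 15)  # 60 items from 4 hypothetical pattern categories
reps_by_layer = {layer: rng.normal(size=(60, 512)) for layer in range(24)}

separability = {
    layer: silhouette_score(reps, labels) for layer, reps in reps_by_layer.items()
}
task_optimal_layer = max(separability, key=separability.get)
print(task_optimal_layer, separability[task_optimal_layer])
```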
Perhaps the most intriguing finding was the moderate positive correlations observed between the representational geometries of the LLMs’ “task-optimal” layers (where pattern categories were most clearly separated) and human frontal FRPs. This alignment was specific to FRPs: comparisons with other EEG measures, such as response-locked ERPs or resting-state EEG, did not show the same relationship. This hints at a potential shared representational space for abstract patterns between advanced LLMs and the human brain.
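Comparisons of this kind are typically done with representational similarity analysis (RSA): build a pairwise dissimilarity matrix over items for each system, then correlate the two matrices. The sketch below uses random stand-in arrays, correlation distance, and a Spearman comparison; the study's exact pipeline may differ:

```python
# Sketch of an RSA-style comparison between an LLM layer and frontal FRP responses.
# Shapes, distance metric, and data below are assumptions for illustration only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_items = 40
llm_layer_reps = rng.normal(size=(n_items, 512))   # one vector per item, task-optimal layer
frp_responses = rng.normal(size=(n_items, 64))     # one frontal-FRP feature vector per item

# pdist returns the condensed (upper-triangle) pairwise distance vector directly.
llm_rdm = pdist(llm_layer_reps, metric="correlation")
frp_rdm = pdist(frp_responses, metric="correlation")

rho, p = spearmanr(llm_rdm, frp_rdm)
print(f"LLM-FRP representational alignment: rho={rho:.3f}, p={p:.3f}")
```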
Implications and Future Directions
The study suggests that LLMs might indeed mirror human brain mechanisms in abstract reasoning, offering a glimpse into shared principles between biological and artificial intelligence. The researchers also noted an interesting trade-off: DeepSeek-R1-Distill-Llama-70B, a reasoning-optimized variant of Llama-3.3-70B, showed greater human-likeness in its error patterns despite a slight reduction in overall accuracy. This indicates that encouraging step-by-step reasoning in LLMs could lead to more human-like cognitive processes.
While promising, the study acknowledges several limitations, including the relatively small human participant sample size, the difference in task modality (visual for humans, text for LLMs), and the inherent signal-to-noise challenges of EEG. Future research could explore larger datasets, multimodal tasks, and more advanced signal processing techniques to further solidify these findings. The full research paper can be accessed here: Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning.


