
Unveiling Cognitive Parallels: How Large Language Models Align with Human Brains in Abstract Reasoning

TLDR: A study found that large language models (LLMs), especially those with ~70 billion parameters, not only achieve human-level accuracy in abstract pattern completion but also show internal representations and difficulty profiles similar to humans. Critically, their "task-optimal" internal layers correlate with human brain activity, measured as fixation-related potentials (FRPs), suggesting a shared way of processing abstract patterns between AI and human cognition.

A recent study delves into the fascinating question of whether large language models, or LLMs, process abstract reasoning in a way that mirrors human brain activity. This research, conducted by a team from the University of Amsterdam, compared the performance and internal representations of several open-source LLMs with human participants on a complex abstract-pattern-completion task. The findings offer compelling preliminary evidence that there might be shared principles between biological and artificial intelligence when it comes to abstract thought.

The Study’s Approach

The researchers designed an abstract-pattern-completion task where human participants solved sequences of icons, while LLMs were given text-based versions of the same patterns. To understand human brain activity, electroencephalography (EEG) was used, specifically focusing on fixation-related potentials (FRPs), which capture brain signals linked to where a person is looking. This method provides a more natural window into cognitive processes compared to traditional stimulus-locked or response-locked brain activity measurements.

Eight open-source LLMs were tested, ranging in size from 2 billion to 72 billion parameters. The study aimed to see not only if LLMs could achieve human-like accuracy but also if their internal computational processes, specifically the representations formed within their hidden layers, aligned with human brain activity during the task.

Key Findings: Performance and Internal Alignment

The study revealed that only the largest LLMs, those with approximately 70 billion parameters, achieved accuracy comparable to human participants. Notably, Qwen-2.5-72B and DeepSeek-R1-Distill-Llama-70B stood out, not just for their high accuracy (around 75-80%) but also for mirroring the human pattern-specific difficulty profile. This means they struggled with the same types of patterns that humans found challenging.

A crucial discovery was how LLMs organize abstract pattern categories internally. The researchers found that every LLM tested formed distinct clusters for these categories within their intermediate layers. The stronger this clustering, the better the model performed on the task. This suggests that the way LLMs represent abstract patterns internally is directly linked to their ability to solve these problems.
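The link between category clustering and task performance can be illustrated with a toy sketch. Everything below is synthetic and illustrative — the layer names, the separation metric, and the data are assumptions, not the paper's actual analysis pipeline — but it shows the basic idea: score each layer by how cleanly its hidden states separate pattern categories, and treat the best-scoring layer as "task-optimal."

```python
import numpy as np

rng = np.random.default_rng(0)

def category_separation(acts, labels):
    """Ratio of mean between-category to mean within-category distance.

    Higher values mean the layer's hidden states cluster pattern
    categories more cleanly (a simple stand-in for clustering strength).
    """
    d = np.linalg.norm(acts[:, None, :] - acts[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return d[~same].mean() / d[same & off_diag].mean()

n_items, n_dims, n_cats = 60, 32, 4
labels = np.repeat(np.arange(n_cats), n_items // n_cats)

# Simulate an early layer with little category structure, and an
# intermediate layer where each category is shifted toward its own mean.
early = rng.normal(size=(n_items, n_dims))
category_means = rng.normal(size=(n_cats, n_dims))
intermediate = early + 3.0 * category_means[labels]

scores = {
    "early": category_separation(early, labels),
    "intermediate": category_separation(intermediate, labels),
}
task_optimal = max(scores, key=scores.get)  # -> "intermediate"
```

In this toy setup the intermediate layer wins by construction; in the study, the analogous per-layer score was what correlated with task accuracy across models.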

Perhaps the most intriguing finding was the moderate positive correlations observed between the representational geometries of the LLMs' "task-optimal" layers (where pattern categories were most clearly separated) and human frontal FRPs. This alignment was specific to FRPs and did not appear in comparisons with other EEG measures, such as response-locked ERPs or resting EEG. This hints at a potential shared representational space for abstract patterns between advanced LLMs and the human brain.
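The comparison of representational geometries follows the general logic of representational similarity analysis: build a pairwise-dissimilarity matrix for each system, then rank-correlate their upper triangles. The sketch below uses entirely synthetic data — the simulated "FRP" and "control" signals, dimensions, and noise levels are assumptions for illustration, not the study's recordings or exact method.

```python
import numpy as np

rng = np.random.default_rng(1)

def rdm(X):
    """Representational dissimilarity matrix: pairwise Euclidean distances."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def upper(m):
    """Flatten the upper triangle (excluding the diagonal)."""
    return m[np.triu_indices_from(m, k=1)]

def spearman(a, b):
    """Rank correlation via Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

n_items = 40
latent = rng.normal(size=(n_items, 8))  # structure shared by both systems

# A model layer and a simulated FRP signal, both driven by the shared
# latent structure plus independent noise; plus an unrelated control
# (playing the role of, e.g., resting EEG).
llm_layer = latent @ rng.normal(size=(8, 64)) + 0.5 * rng.normal(size=(n_items, 64))
frp = latent @ rng.normal(size=(8, 16)) + 0.5 * rng.normal(size=(n_items, 16))
control = rng.normal(size=(n_items, 16))

r_frp = spearman(upper(rdm(llm_layer)), upper(rdm(frp)))
r_ctrl = spearman(upper(rdm(llm_layer)), upper(rdm(control)))
```

Because the layer and the simulated FRP share latent structure, `r_frp` comes out clearly positive while `r_ctrl` hovers near zero — mirroring the study's contrast between FRPs and control EEG measures.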


Implications and Future Directions

The study suggests that LLMs might indeed mirror human brain mechanisms in abstract reasoning, offering a glimpse into shared principles between biological and artificial intelligence. The researchers also noted an interesting trade-off: DeepSeek-R1-Distill-Llama-70B, a reasoning-optimized variant of Llama-3.3-70B, showed greater human-likeness in its error patterns despite a slight reduction in overall accuracy. This indicates that encouraging step-by-step reasoning in LLMs could lead to more human-like cognitive processes.

While promising, the study acknowledges several limitations, including the relatively small human participant sample size, the difference in task modality (visual for humans, text for LLMs), and the inherent signal-to-noise challenges of EEG. Future research could explore larger datasets, multimodal tasks, and more advanced signal processing techniques to further solidify these findings. The full research paper can be accessed here: Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
