TLDR: A study found that large language models (LLMs), especially those with ~70 billion parameters, not only achieve human-level accuracy in abstract pattern completion but also show internal representations and difficulty profiles similar to humans'. Critically, the representations in their “task-optimal” internal layers correlate with human brain activity measured as fixation-related potentials (FRPs), suggesting a shared way of processing abstract patterns in AI and human cognition.
A recent study delves into the fascinating question of whether large language models, or LLMs, process abstract reasoning in a way that mirrors human brain activity. This research, conducted by a team from the University of Amsterdam, compared the performance and internal representations of several open-source LLMs with human participants on a complex abstract-pattern-completion task. The findings offer compelling preliminary evidence that there might be shared principles between biological and artificial intelligence when it comes to abstract thought.
The Study’s Approach
The researchers designed an abstract-pattern-completion task in which human participants completed sequences of icons, while LLMs were given text-based versions of the same patterns. To capture human brain activity, the team used electroencephalography (EEG), focusing on fixation-related potentials (FRPs), which capture brain signals time-locked to where a person is looking. This provides a more natural window into cognitive processes than traditional stimulus-locked or response-locked measurements.
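To make the text-based setup concrete, here is a minimal sketch of how an icon sequence might be rendered as a text prompt for an LLM. The tokens ("circle", "square"), the ABB rule, and the exact wording are illustrative assumptions, not the study's actual materials:

```python
# Illustrative only: an ABB-type abstract pattern rendered as text, the modality
# the LLMs received instead of the icons shown to human participants.
def make_abb_item(tokens=("circle", "square"), repeats=3):
    """Build a text sequence following an ABB rule, withholding the final item."""
    a, b = tokens
    seq = [a, b, b] * repeats
    prompt_seq, answer = seq[:-1], seq[-1]
    prompt = "Complete the pattern: " + " ".join(prompt_seq) + " ?"
    return prompt, answer

prompt, answer = make_abb_item()
print(prompt)   # -> Complete the pattern: circle square square circle square square circle square ?
print(answer)   # -> square
```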
Eight open-source LLMs were tested, ranging in size from 2 billion to 72 billion parameters. The study aimed to see not only if LLMs could achieve human-like accuracy but also if their internal computational processes, specifically the representations formed within their hidden layers, aligned with human brain activity during the task.
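A rough sketch of how per-layer representations can be read out of an open-source LLM with the Hugging Face transformers library is shown below. The model name, prompt, and last-token pooling are assumptions for illustration, not necessarily the study's pipeline:

```python
# Sketch: pull per-layer hidden states from an open-source LLM with transformers.
# The model below is a small stand-in; the study's best-performing models were ~70B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

prompt = "Complete the pattern: circle square square circle square square circle square ?"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding layer plus one tensor per transformer
# layer, each of shape (batch, sequence_length, hidden_size).
layer_reps = [h[0, -1, :] for h in out.hidden_states]  # last-token vector per layer
print(len(layer_reps), layer_reps[0].shape)
```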
Key Findings: Performance and Internal Alignment
The study revealed that only the largest LLMs, those with approximately 70 billion parameters, achieved accuracy comparable to human participants. Qwen-2.5-72B and DeepSeek-R1-Distill-Llama-70B stood out, not just for their high accuracy (around 75-80%) but also for mirroring the human pattern-specific difficulty profile: they struggled with the same types of patterns that humans found challenging.
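Operationally, "mirroring the difficulty profile" amounts to per-category accuracies that rise and fall together across pattern types. The sketch below illustrates the idea with made-up numbers and hypothetical category names; it does not reproduce the study's data:

```python
# Toy illustration of a difficulty-profile comparison: correlate per-category
# accuracies for humans and a model. All values and labels here are invented.
import numpy as np
from scipy.stats import spearmanr

categories = ["repetition", "alternation", "mirror", "increment"]  # hypothetical labels
human_acc = np.array([0.95, 0.88, 0.72, 0.61])
model_acc = np.array([0.92, 0.85, 0.70, 0.66])

rho, _ = spearmanr(human_acc, model_acc)
print(f"Difficulty-profile correlation: rho={rho:.2f}")
```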
A crucial discovery was how LLMs organize abstract pattern categories internally. The researchers found that every LLM tested formed distinct clusters for these categories within their intermediate layers. The stronger this clustering, the better the model performed on the task. This suggests that the way LLMs represent abstract patterns internally is directly linked to their ability to solve these problems.
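One way to quantify such clustering, and to pick out the layer where it is strongest, is a per-layer separability score, for example a silhouette score over item representations labeled by pattern category. The following is a sketch under that assumption, using random stand-in data; the paper may use a different separability measure:

```python
# Sketch: score how cleanly each layer separates pattern categories, then take the
# best-separating layer as the "task-optimal" one. Data here are random placeholders.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 15)  # 60 items from 4 hypothetical pattern categories
reps_by_layer = {layer: rng.normal(size=(60, 512)) for layer in range(24)}

separability = {
    layer: silhouette_score(reps, labels) for layer, reps in reps_by_layer.items()
}
task_optimal_layer = max(separability, key=separability.get)
print(task_optimal_layer, separability[task_optimal_layer])
```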
Perhaps the most intriguing finding was the moderate positive correlations observed between the representational geometries of the LLMs’ “task-optimal” layers (where pattern categories were most clearly separated) and human frontal FRPs. This alignment was specific to FRPs: comparisons with other EEG measures, such as response-locked ERPs or resting-state EEG, did not show the same relationship. This hints at a potential shared representational space for abstract patterns between advanced LLMs and the human brain.
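Comparisons of this kind are typically done with representational similarity analysis (RSA): build a pairwise dissimilarity matrix over items for each system, then correlate the two matrices. The sketch below uses random stand-in arrays, correlation distance, and a Spearman comparison; the study's exact pipeline may differ:

```python
# Sketch of an RSA-style comparison between an LLM layer and frontal FRP responses.
# Shapes, distance metric, and data below are assumptions for illustration only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_items = 40
llm_layer_reps = rng.normal(size=(n_items, 512))   # one vector per item, task-optimal layer
frp_responses = rng.normal(size=(n_items, 64))     # one frontal-FRP feature vector per item

# pdist returns the condensed (upper-triangle) pairwise distance vector directly.
llm_rdm = pdist(llm_layer_reps, metric="correlation")
frp_rdm = pdist(frp_responses, metric="correlation")

rho, p = spearmanr(llm_rdm, frp_rdm)
print(f"LLM-FRP representational alignment: rho={rho:.3f}, p={p:.3f}")
```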
Implications and Future Directions
The study suggests that LLMs might indeed mirror human brain mechanisms in abstract reasoning, offering a glimpse into shared principles between biological and artificial intelligence. The researchers also noted an interesting trade-off: DeepSeek-R1-Distill-Llama-70B, a reasoning-optimized variant of Llama-3.3-70B, showed greater human-likeness in its error patterns despite a slight reduction in overall accuracy. This indicates that encouraging step-by-step reasoning in LLMs could lead to more human-like cognitive processes.
While promising, the study acknowledges several limitations, including the relatively small human participant sample size, the difference in task modality (visual for humans, text for LLMs), and the inherent signal-to-noise challenges of EEG. Future research could explore larger datasets, multimodal tasks, and more advanced signal processing techniques to further solidify these findings. The full research paper can be accessed here: Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning.


