TLDR: A study found that advanced Large Language Models (LLMs) like GPT-5 and the o3 family can perform complex mental imagery tasks, traditionally thought to require visual imagination, significantly better than humans. This suggests LLMs use propositional reasoning, challenging long-held beliefs about mental imagery and offering new benchmarks for AI cognitive abilities. Surprisingly, image-aided reasoning did not improve LLM performance.
A groundbreaking study has revealed that advanced Large Language Models (LLMs) are capable of performing complex mental imagery tasks, traditionally believed to require visual imagination, at a level significantly surpassing human performance. This finding challenges long-standing theories in cognitive psychology and opens new avenues for understanding artificial intelligence’s emergent cognitive capacities.
For decades, cognitive psychologists have debated the nature of mental imagery, with the dominant view holding that it is a pictorial process, meaning we ‘see’ images in our minds. A classic task used to support this view involves following a series of instructions to transform imagined letters and shapes into a final object, which subjects then identify. For example, a subject might be asked to imagine a capital letter D, rotate it 90 degrees to the left, and place a capital letter J beneath it, producing a shape that resembles an umbrella. Success in this task was thought to be impossible without visual mental imagery.
However, the new research, titled “Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models,” put this hypothesis to the test using state-of-the-art LLMs. Because LLMs are primarily text-based and lack a ‘visual’ system in the human sense, they are ideal candidates for exploring whether language alone could be sufficient for such tasks.
The researchers, Morgan McCarty and Jorge Morales from Northeastern University, designed 60 novel instruction sets for an object reconstruction task and used them alongside 12 from the original Finke et al. study. They then tested several leading LLMs on these text-only instructions, including models from OpenAI (o3 family, GPT-5), Anthropic (Claude), and Google (Gemini). To establish a baseline, 100 human subjects also completed the same task.
The results were striking: the best LLMs, specifically GPT-5 and the o3 family of models, performed significantly above the average human performance, showing a 9.4% to 12.2% increase over the human average of 54.7%. This suggests that these AI models can effectively ‘imagine’ and manipulate objects based purely on linguistic descriptions.
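The phrase “9.4% to 12.2% increase” can be read in two ways, and the summary above does not say which one the study intends. The short sketch below works out both readings from the figures quoted here. The function names are illustrative and do not come from the paper, and neither result should be taken as the study's reported model accuracy.

```python
# Figures quoted in the article.
HUMAN_AVG = 54.7      # human average accuracy, in percent
GAINS = (9.4, 12.2)   # reported range of improvement over the human average


def as_percentage_points(baseline: float, gain: float) -> float:
    """Reading 1 (assumption): the gain is added to the baseline as percentage points."""
    return baseline + gain


def as_relative_increase(baseline: float, gain: float) -> float:
    """Reading 2 (assumption): the gain is a relative increase over the baseline."""
    return baseline * (1 + gain / 100)


if __name__ == "__main__":
    for gain in GAINS:
        points = as_percentage_points(HUMAN_AVG, gain)
        relative = as_relative_increase(HUMAN_AVG, gain)
        print(f"gain {gain}: {points:.1f}% (percentage points), {relative:.1f}% (relative)")
```

Under the percentage-point reading, the best models would score roughly 64.1% to 66.9%. Under the relative reading, they would score roughly 59.8% to 61.4%.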
Interestingly, the study also explored an ‘image-aided’ approach where LLMs were prompted to generate and modify images at each step of the task. Contrary to expectations, this did not improve performance: it either lowered accuracy or left it unchanged. This suggests that the LLMs’ success was not due to a simulated visual process but rather to an underlying propositional reasoning capability.
The findings also resonate with observations in humans with aphantasia, a condition in which individuals lack conscious visual mental imagery. Despite this, aphantasics can often perform mental imagery tasks surprisingly well, frequently reporting the use of verbal strategies. The LLMs’ success provides further evidence that non-imagistic, propositional reasoning might be sufficient for tasks long thought to be imagery-dependent, reigniting a significant debate in cognitive science about the fundamental nature of mental representations.
This research not only demonstrates an emergent cognitive capacity in LLMs but also provides the field with a new, robust benchmark for evaluating sophisticated cognitive behaviors in artificial systems. It suggests that the most advanced LLMs may be capable of extracting and manipulating spatial relations from textual information, offering a fresh perspective on how both human and artificial minds process complex information.


