TLDR: This research paper explores linguistic errors made by native Spanish speakers through an interdisciplinary lens, combining theoretical linguistics, neurolinguistics, and natural language processing. It analyzes how current large language models (LLMs) interpret, reproduce, or correct these errors. The study found that while LLMs are highly accurate with common grammatical errors, they struggle with more subtle spelling, lexical, syntactic, and semantic mistakes, highlighting the limitations of AI’s lack of embodied cognition compared to human language processing. The paper emphasizes the need for specialized, linguistically informed approaches to improve NLP systems for complex languages like Spanish.
A recent interdisciplinary study delves into the fascinating world of linguistic errors made by native Spanish speakers, exploring what these mistakes reveal about the human mind and the current capabilities and limitations of artificial intelligence. The research, titled “Imperfect Language, Artificial Intelligence, and the Human Mind: An Interdisciplinary Approach to Linguistic Errors in Native Spanish Speakers”, was conducted by Francisco Portillo López from the University of Navarra.
Linguistic errors are more than just grammatical slip-ups; they offer a unique window into how our brains process language and highlight the challenges faced by artificial intelligence systems trying to replicate human communication. This project brought together theoretical linguistics to categorize errors, neurolinguistics to understand their brain processing, and natural language processing (NLP) to evaluate how large language models (LLMs) handle them.
The study built a unique collection of over 500 authentic linguistic errors from native Spanish speakers, gathered from real interactions on social media platforms like Reddit, X, and Telegram. These errors were then tested against leading AI models such as GPT-4, GPT-5, Llama, Grok, Gemini, and DeepSeek to assess their accuracy in interpreting, reproducing, or correcting these human-like mistakes.
Understanding Linguistic Errors
The research classified errors into several types: spelling, lexical (wrong word choice), semantic (incorrect meaning), and syntactic (sentence structure). It also focused on common Spanish-specific errors like the incorrect pluralization of the verb ‘haber’ (to have/there is) and phenomena such as ‘dequeÃsmo’, ‘laÃsmo’, and ‘leÃsmo’ (incorrect use of certain pronouns and prepositions).
From a neurolinguistic perspective, the paper explains that errors are not random. They are systematic and provide insights into the brain’s language processing. For instance, speech monitoring systems allow us to self-correct, and different types of errors emerge at distinct stages of word production. Studies on aphasia, a language disorder caused by brain injury, further demonstrate how specific brain regions are involved in different aspects of language, with damage leading to predictable error patterns.
How AI Models Performed
The LLMs were given a specific prompt: “Please review these sentences and tell me if it contains any errors. Be honest and detailed: identify any spelling, grammar, syntax, or style errors, correct them, and explain why they were incorrect. If everything is correct, indicate that as well.”
The results showed a clear difference in the models’ abilities. All evaluated LLMs achieved 100% accuracy in detecting and correcting well-known grammatical errors like the use of ‘haber’ and ‘dequeÃsmo’. This suggests that these models, trained on vast amounts of text, have effectively internalized common normative patterns.
However, their performance significantly declined when faced with more subtle or context-dependent errors. For spelling errors, GPT-5 was the best, detecting 66.7% of errors, while Gemini was last with 53.5%. In syntactic errors, GPT-5 again led with 92.7% accuracy, followed by DeepSeek at 83.4%. For lexical and semantic errors, which require a deeper understanding of context and meaning, Llama-3 performed best with 86.9% accuracy, followed by DeepSeek (79.7%) and GPT-5 (76.8%).
Overall, GPT-5 and DeepSeek emerged as the most robust models, with average scores of 87.45% and 86.15% respectively. Llama-3 and Grok followed closely, while GPT-4 and Gemini showed areas for improvement, particularly in complex error correction.
Human vs. Artificial Processing
The study highlights both convergences and divergences between human and AI language processing. A theoretical framework called “Stochastic Noetic” suggests that both the human brain and LLMs generate language through sophisticated probabilistic mechanisms. This means that much of human language production, like AI’s, involves an unconscious probabilistic selection of words.
However, a major divergence is AI’s “bodilessness.” Human cognition is deeply connected to lived experience, common sense, and world knowledge, which LLMs lack. As mere data-processing algorithms, AI models struggle with social and pragmatic reasoning, failing to grasp social nuances in dynamic situations as effectively as humans. Their creativity, while impressive, is also a function of their training data and may lack genuine originality.
Also Read:
- Large Language Models: Tools for a More Integrated Cognitive Science
- Unveiling the ‘Automatic Minds’: How Hypnosis Illuminates the Inner Workings of AI
Implications for Spanish NLP
The findings have significant implications for improving NLP systems for Spanish. The study suggests that simply increasing model size or generic data volume won’t be enough. Future advances require specialization and a deep integration of linguistic knowledge. This includes developing high-quality synthetic data for complex morphosyntactic and rare lexical errors, and adopting advanced evaluation frameworks tailored for Spanish.
Ultimately, while LLMs have made incredible strides in language processing, especially with common grammatical errors, they still have much to learn from the imperfect, variable, and deeply contextual nature of real human language, particularly in languages like Spanish with rich morphological complexity.


