
Unpacking Metaphorical Minds: How LLMs Grapple with Figurative Language

TLDR: A study by Fengying Ye et al. investigates Large Language Models’ (LLMs’) understanding of metaphors across three areas: conceptual irrelevance, context leveraging, and syntactic influence. The findings show that LLMs generate 15%-25% conceptually irrelevant interpretations, rely heavily on inherent word associations rather than context, and detect metaphors through sensitivity to syntactic irregularities rather than true structural comprehension. The research highlights the limits of LLMs’ deep metaphorical understanding and calls for improved conceptual alignment and contextual reasoning.

Large Language Models (LLMs) have shown incredible abilities in understanding and generating human language. However, a recent study delves into a particularly complex area of human communication: metaphors. Metaphors, like “fall in love” or “the computer is a turtle,” are deeply embedded in our language, allowing us to express abstract ideas through more tangible concepts. This research, titled “Unveiling LLMs’ Metaphorical Understanding: Exploring Conceptual Irrelevance, Context Leveraging and Syntactic Influence,” investigates how well LLMs truly grasp these nuanced expressions. You can read the full paper here: Unveiling LLMs’ Metaphorical Understanding.

The Challenge of Metaphors for LLMs

Metaphorical understanding is not just about recognizing words; it’s about mapping concepts from one domain to another. For instance, “fall in love” maps the physical act of falling onto the abstract experience of entering an emotional state. Traditional linguistic theories, like Conceptual Metaphor Theory (CMT), explain this as cross-domain mapping. While LLMs excel at many language tasks, their grasp of metaphors has shown limitations. Previous studies have pointed out “trigger word” errors, where LLMs misinterpret metaphors by focusing on individual words rather than the broader context, leading to incorrect conceptual mappings.

Three Key Areas of Investigation

The researchers, Fengying Ye, Shanshan Wang, Lidia S. Chao, and Derek F. Wong from the NLP2CT Lab at the University of Macau, explored LLMs’ metaphor processing from three crucial angles:

1. Conceptual Irrelevance: This examines whether LLMs generate interpretations that are conceptually irrelevant to the actual meaning of the metaphor, such as interpreting “fall in love” as a physical drop rather than an emotional state. The study used a novel “spatial analysis” framework, mapping LLM-generated interpretations into a high-dimensional embedding space to quantify these irrelevant errors (a minimal sketch of this idea appears after this list).

2. Context Leveraging: This investigates if LLMs truly use context to understand metaphors or if they rely on a “metaphor-literal repository” – a kind of inherent association between metaphorical words and their literal counterparts, regardless of the surrounding text. To test this, LLMs were asked to generate literal or metaphorical words both with and without contextual sentences.

3. Syntactic Influence: Metaphors often have specific grammatical structures. This part of the study assessed how disrupting sentence structures (randomly shuffling words, changing parts of speech, or repositioning metaphorical words) affects LLMs’ ability to detect metaphors. This helps determine if LLMs rely on syntactic patterns for metaphor analysis.
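
To make the “spatial analysis” idea from point 1 concrete, here is a minimal sketch of an embedding-space relevance check. This is not the paper’s actual framework: the sentence-transformers model, the example interpretations, and the 0.4 similarity cutoff are all illustrative assumptions.

```python
# Minimal sketch of an embedding-space "conceptual irrelevance" check.
# Assumptions (not from the paper): the all-MiniLM-L6-v2 model, the
# example texts, and the 0.4 cosine-similarity threshold are
# illustrative choices, not the authors' actual setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

reference = "to enter an emotional state of love"  # gold interpretation
candidates = [
    "to begin experiencing romantic affection",    # conceptually relevant
    "to physically drop from a height",            # conceptually irrelevant
]

ref_emb = model.encode(reference, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity between each candidate interpretation and the reference.
sims = util.cos_sim(cand_embs, ref_emb).squeeze(-1)

THRESHOLD = 0.4  # illustrative cutoff for flagging "conceptually irrelevant"
for text, sim in zip(candidates, sims.tolist()):
    label = "relevant" if sim >= THRESHOLD else "irrelevant"
    print(f"{sim:.2f}  {label}  {text}")
```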

What the Study Found

The findings revealed several key insights into LLMs’ limitations:

  • Conceptually Irrelevant Interpretations: LLMs generated interpretations that were 15%-25% conceptually irrelevant. Even the best-performing models struggled to achieve a deep conceptual understanding, often sticking to superficial lexical mappings.
  • Limited Contextual Understanding: The “metaphorical imagination” experiments showed a high overlap (65%-80%) between contextualized and de-contextualized outputs. This suggests that LLMs often rely on inherent word associations (the “metaphor-literal repository”) rather than actively leveraging context for deeper metaphor analysis: they tend to connect metaphorical words with commonly co-occurring literal expressions, even when context might suggest otherwise (a toy version of this overlap measure is sketched after this list).
  • Sensitivity to Syntactic Irregularities: LLMs proved sensitive to syntactic irregularities rather than demonstrating true structural comprehension of metaphors. Interestingly, some models performed better when sentences were syntactically disrupted in specific ways (such as a part-of-speech shuffle) than on the original sentences. This indicates that LLMs may treat irregular language usage as a signal of metaphor, in line with theories such as Selectional Preference Violation (SPV), rather than understanding the underlying grammatical structure (a possible implementation of these perturbations appears in the second sketch after this list).
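
As a toy illustration of the overlap finding above, the sketch below compares the literal words a model might propose for a metaphor with and without its sentence as context. The word lists are invented examples, and plain Jaccard overlap stands in for whatever overlap statistic the authors actually report.

```python
# Toy version of the contextualized vs. de-contextualized overlap measure.
# Assumptions: the word sets below are invented, and Jaccard overlap is a
# stand-in for the paper's actual overlap statistic.
def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Literal counterparts an LLM might propose for the metaphorical "devoured"
# in "she devoured the novel", with and without the sentence as context.
with_context = {"read", "consumed", "finished", "absorbed"}
without_context = {"ate", "consumed", "finished", "swallowed"}

print(f"overlap: {jaccard(with_context, without_context):.0%}")
# A high overlap suggests the model leans on inherent word associations
# (the "metaphor-literal repository") rather than the surrounding context.
```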
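
And to illustrate the kind of syntactic perturbations the study describes, here is one possible implementation of a random word shuffle and a part-of-speech-preserving shuffle, assuming spaCy’s en_core_web_sm model. The paper’s exact perturbation procedure may differ; this is only a plausible reading of “Part-of-Speech shuffle.”

```python
# Possible implementations of the syntactic-perturbation probes.
# Assumptions: spaCy's en_core_web_sm model is installed; "POS shuffle"
# here swaps words only within the same part-of-speech class, which is
# one plausible reading of the paper's setup, not its exact procedure.
import random
import spacy

nlp = spacy.load("en_core_web_sm")

def random_shuffle(sentence: str) -> str:
    """Shuffle all words uniformly at random."""
    words = sentence.split()
    random.shuffle(words)
    return " ".join(words)

def pos_shuffle(sentence: str) -> str:
    """Shuffle words only within the same coarse part-of-speech class."""
    doc = nlp(sentence)
    groups: dict[str, list[int]] = {}
    for i, tok in enumerate(doc):
        groups.setdefault(tok.pos_, []).append(i)
    out = [tok.text for tok in doc]
    for idxs in groups.values():
        perm = idxs[:]
        random.shuffle(perm)
        for src, dst in zip(idxs, perm):
            out[dst] = doc[src].text
    return " ".join(out)

print(random_shuffle("the computer is a turtle"))
print(pos_shuffle("my lawyer is a shark and the computer is a turtle"))
```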


Implications for Future LLM Development

The research highlights that while LLMs possess some surface-level competence in handling metaphors, their understanding remains inconsistent and often lacks true conceptual depth. The authors emphasize the need for more robust computational approaches that can improve conceptual alignment, contextual reasoning, and syntactic integration in LLMs. Future work could explore the impact of fine-tuning and few-shot prompting on these capabilities, and further investigate how LLMs align with human conceptual mapping processes for metaphors.

This study provides valuable insights into the current limitations of LLMs in one of the most intricate aspects of human language, paving the way for more sophisticated and human-like AI language comprehension.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
