TLDR: A new study investigates whether Large Language Models (LLMs) truly incorporate external label definitions or primarily rely on their pre-trained knowledge. Through controlled experiments with various LLMs and definition types across general and domain-specific tasks, the research reveals that while explicit definitions can enhance accuracy and explainability, their integration is not always guaranteed or consistent. Models often default to internal representations, especially in general tasks, but benefit more from explicit definitions in specialized domains. The study also highlights a disconnect between improved explanation quality and classification accuracy, suggesting distinct internal processes.
Large Language Models (LLMs) have become incredibly powerful, but a fundamental question remains: do they truly understand and incorporate external instructions, like label definitions, or do they mostly rely on their vast pre-existing knowledge? A recent research paper, “Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions”, dives deep into this question, revealing fascinating insights into how these AI models process information.
The researchers, Seyedali Mohammadi, Bhaskara Hanuma Vedula, Hemank Lamba, Edward Raff, Ponnurangam Kumaraguru, Francis Ferraro, and Manas Gaur, conducted a series of controlled experiments to understand this interplay. They tested various LLMs, including GPT-4, LLaMA-3, Phi-3, and Mistral, across different types of tasks and definition conditions. These conditions ranged from expert-curated definitions to those generated by LLMs themselves, and even intentionally perturbed or swapped definitions to see how models would react.
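To make that setup concrete, here is a minimal Python sketch (not the authors' code) of how such definition conditions might be constructed for a natural language inference prompt. The label names, definitions, prompt template, and the specific perturbation are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of building prompts under different definition conditions.
# Labels, definitions, and the template below are illustrative assumptions.

LABELS = {
    "entailment": "The hypothesis must be true given the premise.",
    "contradiction": "The hypothesis cannot be true given the premise.",
    "neutral": "The hypothesis may or may not be true given the premise.",
}

def build_definition_block(labels: dict[str, str], condition: str) -> str:
    """Return the label-definition block for one experimental condition."""
    defs = dict(labels)
    if condition == "swapped":
        # Reassign each definition to a different label.
        names = list(defs)
        rotated = names[1:] + names[:1]
        defs = {name: labels[other] for name, other in zip(names, rotated)}
    elif condition == "perturbed":
        # Crude stand-in for a corrupted definition: negate each one.
        defs = {name: "It is NOT the case that " + d[0].lower() + d[1:]
                for name, d in defs.items()}
    # "expert" or "llm_generated" conditions use the definitions as given.
    return "\n".join(f"- {name}: {d}" for name, d in defs.items())

def build_prompt(premise: str, hypothesis: str, condition: str) -> str:
    return (
        "Classify the relationship between the premise and hypothesis.\n"
        f"Label definitions:\n{build_definition_block(LABELS, condition)}\n\n"
        f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer with one label."
    )

print(build_prompt("A dog runs in the park.", "An animal is outside.", "swapped"))
```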
How LLMs Handle Conflicting Definitions
One key area of investigation was how LLMs respond when definitions are intentionally misaligned or incorrect. The study found that models generally perform much better when label definitions are correctly aligned with their intended meaning. When definitions were swapped or corrupted, performance dropped significantly. This suggests that while LLMs have internal knowledge, they are indeed receptive to the explicit instructions provided in the prompt.
Interestingly, the sensitivity to definition quality varied greatly. For instance, LLaMA-3 showed a remarkable increase in performance when moving from incorrect to correct definitions in general language tasks. However, a surprising finding involved GPT-4, which, when faced with highly inconsistent definitions, sometimes chose to abstain from providing a prediction altogether. This “meta-response” suggests a sophisticated ability to detect contradictions, a capability not observed in the other models.
The research also highlighted a difference between general and domain-specific tasks. In general tasks like natural language inference, models sometimes defaulted to their internal representations, whereas domain-specific tasks, such as mental health categorization or hate speech detection, often benefited more significantly from precise, explicit definitions. This implies that for specialized areas where LLMs might have less pre-training exposure, external definitions become even more crucial.
Strategies for Integrating Definitions
Beyond just the quality of definitions, the way they are presented to the model also matters. The study explored four integration strategies: a “vanilla” setting with no explicit definitions, “fixed definitions” (expert-written), “adjusted definitions” (dynamically generated by an LLM for each input), and a combination of “fixed definitions + few-shot examples.”
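As a rough illustration of how these strategies differ, the sketch below shows what each one might add to the prompt for a hate speech classifier. The `call_llm` stub, label names, definitions, and templates are hypothetical placeholders under assumed prompt formats, not the paper's actual setup.

```python
# Sketch of the four integration strategies: vanilla, fixed, adjusted,
# and fixed + few-shot. All strings below are illustrative placeholders.

def call_llm(prompt: str) -> str:
    """Stub for whichever chat-completion API is in use; replace with a real call."""
    raise NotImplementedError

FIXED_DEFINITIONS = (  # illustrative stand-ins for expert-written definitions
    "hateful: the text attacks or demeans a group based on a protected attribute.\n"
    "not_hateful: the text does not attack or demean such a group."
)

FEW_SHOT_EXAMPLES = (  # illustrative few-shot examples
    "Text: <example 1>\nLabel: hateful\n"
    "Text: <example 2>\nLabel: not_hateful"
)

def generate_adjusted_definitions(text: str) -> str:
    """'Adjusted' condition: ask an LLM to tailor the definitions to this input."""
    return call_llm(
        "Write one-sentence definitions of 'hateful' and 'not_hateful' "
        f"suited to classifying this text: {text}"
    )

def assemble_prompt(text: str, strategy: str) -> str:
    parts = ["Classify the following text as hateful or not_hateful."]
    if strategy in ("fixed", "fixed_few_shot"):
        parts.append("Definitions:\n" + FIXED_DEFINITIONS)
    elif strategy == "adjusted":
        parts.append("Definitions:\n" + generate_adjusted_definitions(text))
    if strategy == "fixed_few_shot":
        parts.append("Examples:\n" + FEW_SHOT_EXAMPLES)
    # "vanilla": no definitions or examples are added.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)

print(assemble_prompt("Some input sentence.", "fixed"))
```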
Counterintuitively, for general tasks like e-SNLI, models sometimes performed best in the definition-free “vanilla” setting. This suggests that for tasks where LLMs have very robust internal representations, explicit definitions can sometimes interfere. However, for domain-specific tasks, definitions generally improved performance, with some models showing dramatic gains. Mistral, for example, showed a tenfold increase in performance for hate speech detection when provided with definitions.
A particularly intriguing discovery was the “explanation-classification disconnect.” The researchers found that while definitions often dramatically improved the quality of the explanations generated by LLMs, this improvement didn’t always translate into higher classification accuracy. This suggests that LLMs might have partially distinct systems for reasoning about concepts (which definitions help) and for making categorical predictions.
Practical Takeaways
The findings offer valuable guidance for anyone working with LLMs. For specialized applications, especially with smaller models like Phi-3 or Mistral, carefully crafted, context-specific definitions can significantly boost performance. Larger, cloud-hosted models like GPT-4, while relying more on internal knowledge, can offer a safety net by refusing to answer when definitions conflict, which is critical for high-stakes scenarios. Even when definitions don't directly improve classification accuracy, they reliably enhance explanation quality, fostering user trust and transparency.
This research underscores that LLMs’ receptivity to external knowledge is not uniform; it varies based on the model’s architecture, the task domain, and the quality and integration strategy of the definitions. Understanding these nuances is key to unlocking the full potential of LLMs in diverse real-world applications.


