Unpacking Prompting: How Language Model Instructions Affect Internal Representations

TLDR: An empirical study investigated how prompts influence the internal representations (embeddings) that language models produce for zero-shot classification tasks. Contrary to the initial hypothesis, the research found that while prompting does alter representations, these changes do not consistently correlate with how relevant the prompt is to the target task. Sometimes irrelevant or random prompts improved performance, while relevant ones occasionally degraded it. The study suggests that in-context learning involves more than a simple alignment between prompt relevance and representation quality, and that the expected effect may depend on greater pre-training scale or on additional supervised adaptation.

Large Language Models (LLMs) have become incredibly versatile, capable of performing a wide array of tasks without needing specific training for each one. This adaptability often comes from a technique called ‘prompting,’ where we give the model instructions or examples in plain text to guide its behavior. But how exactly do these prompts work their magic? A recent empirical study, titled *Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings*, delves into this question by examining how prompts influence the internal ‘representations’ or ‘embeddings’ that language models create for text.

The core idea behind this research, conducted by Cesar Gonzalez-Gutierrez and Dirk Hovy, was to investigate whether prompts, especially those relevant to a specific task, lead to better quality internal representations within the language model. Intuitively, one might assume that if you give a model a clear, task-specific instruction, its internal understanding of the input text for that task would improve. This study put that assumption to the test.

Understanding the Experiment

The researchers used a method called ‘probing experiments.’ Imagine you want to know if a model understands a certain concept. You can ‘probe’ its internal representations by training a simple classifier on top of them, while the language model itself stays frozen, and checking whether that classifier can perform the task. If the probe performs well, it suggests that the underlying representation encodes useful information for that task.

In this study, they applied various prompt templates to different datasets across four classification tasks: toxicity detection, sentiment analysis, topic classification, and natural language inference (NLI). The prompts included:

Task-relevant prompts (e.g., “Is this a toxic comment?: {text}”)
Irrelevant prompts (e.g., a sentiment prompt used for a toxicity task)
Random prompts (e.g., “Spiky hospital aspiring tooth scale?: {text}”)
No prompt (the original input text)
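
The four prompt conditions can be sketched as string templates applied to each input before it is passed to the model. This is a minimal illustration, not the study’s code: the relevant and random templates are the examples quoted above, while the sentiment wording used as the “irrelevant” prompt and the names `TOXICITY_TEMPLATES` and `apply_template` are hypothetical.

```python
from typing import Dict, Optional

# Prompt conditions for a toxicity task. The "irrelevant" wording is a
# hypothetical sentiment-style template, not taken from the study.
TOXICITY_TEMPLATES: Dict[str, Optional[str]] = {
    "relevant": "Is this a toxic comment?: {text}",
    "irrelevant": "What is the sentiment of this text?: {text}",
    "random": "Spiky hospital aspiring tooth scale?: {text}",
    "none": None,  # no prompt: the input text is used unchanged
}


def apply_template(template: Optional[str], text: str) -> str:
    """Wrap an input text in a prompt template, or return it unchanged."""
    if template is None:
        return text
    # Braces inside `text` are safe: only the template itself is parsed.
    return template.format(text=text)


if __name__ == "__main__":
    sample = "What a ridiculous take."
    for condition, template in TOXICITY_TEMPLATES.items():
        print(f"{condition:>10}: {apply_template(template, sample)}")
```

Each condition yields a different input string for the same sample, so any difference in the resulting embeddings can be attributed to the prompt alone.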

They generated embeddings using three language models that represent two pre-training approaches: BERT and RoBERTa, which are masked language models, and GPT-2, which is autoregressive. They also explored different ways of turning the model’s per-token outputs into a single sentence embedding, such as averaging the token representations or using a special token like BERT’s `[CLS]`.
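
The two pooling strategies mentioned above can be sketched on toy vectors, assuming the per-token hidden states from one layer are already available as a list of vectors (for example, from a model’s last layer). The function names are hypothetical, and which layer and pooling the study used for each model is not specified here. The same masked average also covers pooling over only the sample tokens, as in the prompt-structure ablation.

```python
from typing import List, Sequence

Vector = List[float]


def cls_pool(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Return the first token's vector (the [CLS] position in BERT)."""
    return list(hidden_states[0])


def masked_mean_pool(
    hidden_states: Sequence[Sequence[float]], mask: Sequence[int]
) -> Vector:
    """Average the token vectors whose mask entry is nonzero.

    With an all-ones (attention) mask this is ordinary mean pooling; with a
    mask that is 1 only on the input-sample positions, it averages the
    sample tokens and ignores the prompt-instruction tokens.
    """
    if len(hidden_states) != len(mask):
        raise ValueError("mask needs exactly one entry per token")
    selected = [h for h, m in zip(hidden_states, mask) if m]
    if not selected:
        raise ValueError("mask selects no tokens")
    dim = len(selected[0])
    return [sum(h[j] for h in selected) / len(selected) for j in range(dim)]


if __name__ == "__main__":
    # Toy hidden states for 4 positions: position 0 plays the role of [CLS],
    # position 1 is a prompt token, positions 2-3 are sample tokens.
    hidden = [[1.0, 0.0], [0.0, 2.0], [2.0, 2.0], [4.0, 0.0]]
    print(cls_pool(hidden))                        # [1.0, 0.0]
    print(masked_mean_pool(hidden, [1, 1, 1, 1]))  # [1.75, 1.0]
    print(masked_mean_pool(hidden, [0, 0, 1, 1]))  # [3.0, 1.0]
```

Because the hidden states are contextualized, even the sample-only average still reflects the prompt: the prompt tokens influence the sample tokens through attention before pooling happens.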

Surprising Findings on Prompting Effects

The study yielded some unexpected results that challenge common assumptions about prompting:

First, prompting undeniably alters the sentence representations: adding instructions or context to an input text changes how the model encodes that text internally. This confirms that prompts have a measurable effect on the embeddings, independent of whether that effect helps the task.

However, the crucial finding was that these changes in representation quality *do not consistently correlate with the relevance of the prompts to the target task*. In simpler terms, a prompt specifically designed for a task didn’t always lead to better representations for that task. Sometimes, prompts for unrelated tasks or even completely random strings of words improved performance, while relevant prompts occasionally degraded it. The effectiveness of a prompt was highly dependent on the specific model and dataset being used.

For instance, BERT often showed statistically significant improvements with *any* prompt, including random ones, for datasets like Wiki Toxic and IMDB. RoBERTa’s behavior varied, and GPT-2 consistently showed degraded performance with prompts. For topic classification, BERT representations didn’t even show significant improvements with relevant prompts.

Deeper Dives: Ablation Studies

The researchers conducted further ‘ablation studies’ to understand these behaviors better. They looked at:

Representation Choice: Different ways of generating embeddings (e.g., using the `[CLS]` token versus averaging all tokens) introduced variability, but didn’t change the main conclusion about inconsistent correlation with prompt relevance.
Task Alignment: They used an alternative metric called ‘task alignment’ which measures how well the representation space aligns with the task space. This metric showed a strong positive correlation with probing performance, suggesting that prompts redistribute class samples in the embedding space. However, it didn’t explain *why* relevant prompts weren’t consistently better.
Prompt Structure: Experiments where only the sample tokens were used for the representation (masking out the instruction tokens), or where BERT’s instruction and sample were separated by a `[SEP]` token, showed only slight changes. This indicated that the prompt’s influence is already carried by the contextualized sample tokens, and that architectural features like `[SEP]` did not make performance track prompt relevance any more closely.
Static Prompts: When prompt instructions were averaged with sample embeddings *without* contextualization by the model, the effect of prompting disappeared. This confirms that prompts must influence token representations *through contextualization within the model* to be effective.
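
The static-prompt ablation can be sketched as follows, under one plausible reading of the description above: a single prompt vector is built from the instruction’s non-contextual token vectors (for example, rows of the input embedding table) and averaged 50/50 with each unprompted sample embedding. The exact formulation in the study may differ, and the function names are hypothetical. Under this reading, every sample is shifted by the same vector and scaled by the same factor, which is one way to see why the effect vanishes: a linear probe can absorb a shared affine transformation into its weights and bias.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def static_prompt_embedding(
    sample_embedding: Sequence[float],
    prompt_token_vectors: Sequence[Sequence[float]],
) -> Vector:
    """Combine a sample embedding with a prompt WITHOUT running the model.

    Assumption: the prompt is reduced to the mean of its static token
    vectors, then averaged 50/50 with the unprompted sample embedding.
    """
    prompt_vec = mean_vector(prompt_token_vectors)
    return mean_vector([sample_embedding, prompt_vec])


if __name__ == "__main__":
    prompt = [[2.0, 0.0], [0.0, 2.0]]   # static prompt vector = [1.0, 1.0]
    e1, e2 = [3.0, 1.0], [1.0, 5.0]     # two unprompted sample embeddings
    s1 = static_prompt_embedding(e1, prompt)
    s2 = static_prompt_embedding(e2, prompt)
    # Differences between samples are only halved, never reshaped:
    print([a - b for a, b in zip(s1, s2)])   # [1.0, -2.0]
    print([(a - b) / 2 for a, b in zip(e1, e2)])  # [1.0, -2.0]
```

By contrast, a contextualized prompt can move each sample’s embedding differently depending on its content, which is what allows prompting to redistribute classes in the embedding space.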


Implications and Future Directions

The study concludes that while prompting modifies sentence-level representations by contextualizing tokens, the mechanisms enabling zero-shot in-context learning (ICL) are still not fully understood. The initial hypothesis that relevant prompts consistently improve representations was not supported by the empirical evidence.

The authors suggest several reasons for this unexpected behavior:

The ‘embedding-level’ perspective might be too limited to capture the full complexity of ICL, which may involve dynamics across the model’s internal layers.
The models used in the study (BERT, RoBERTa, GPT-2) were pre-trained on relatively smaller corpora compared to modern large-scale LLMs. Larger, more extensively pre-trained models might exhibit different behaviors.
Current LLMs often undergo additional supervised adaptation, like instruction fine-tuning or reinforcement learning from human feedback. This extra training might be crucial for achieving robust, prompt-driven performance, and the study’s findings might not generalize to such models.

This research highlights that the relationship between prompts and internal model representations is more intricate than previously assumed. It opens doors for future work to explore these mechanisms in larger, instruction-tuned models and with more dynamic analysis of internal computations.
