TL;DR: The paper “Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity” identifies typicality bias in human preference data as a key cause of mode collapse in aligned Large Language Models (LLMs), where models produce narrow, stereotypical responses. It introduces Verbalized Sampling (VS), a training-free prompting strategy that asks LLMs to verbalize a probability distribution over responses. VS significantly enhances output diversity across creative writing, dialogue simulation, open-ended QA, and synthetic data generation, without compromising factual accuracy or safety, effectively recovering the LLM’s inherent generative diversity.
Large Language Models (LLMs) have become incredibly powerful, but a common challenge they face after post-training alignment is a phenomenon known as ‘mode collapse’. This means that instead of generating a wide variety of plausible and creative responses, the model tends to favor a narrow set of stereotypical outputs, significantly reducing its diversity. This limitation can hinder LLMs in various applications, from creative writing to social simulation and even generating synthetic data.
Unlike previous research, which often attributed mode collapse to algorithmic issues, a recent paper titled “Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity” by Jiayi Zhang, Simon Yu, Derek Chong, Anthony Sicilia, Michael R. Tomz, Christopher D. Manning, and Weiyan Shi identifies a more fundamental and pervasive cause: ‘typicality bias’ in human preference data. This bias reflects well-established findings in cognitive psychology: human annotators systematically favor familiar, fluent, and predictable text. Essentially, when humans provide feedback, they lean towards what they perceive as ‘typical’ or ‘conventional’, and this preference gets amplified during the LLM alignment process, pushing the model towards less diverse outputs.
Introducing Verbalized Sampling (VS)
Motivated by this data-centric analysis, the researchers introduce Verbalized Sampling (VS), a simple yet powerful training-free prompting strategy designed to circumvent mode collapse. Instead of a traditional, direct prompt asking for a single instance (e.g., “Tell me a joke about coffee”), VS reformulates the prompt to explicitly ask the model to verbalize a probability distribution over a set of responses (e.g., “Generate 5 jokes about coffee and their corresponding probabilities”).
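To make the reformulation concrete, here is a minimal sketch of how a VS-style prompt might be constructed and its reply consumed. The prompt wording, the JSON reply format, and the function names are illustrative assumptions, not the paper's verbatim template; the mock reply stands in for an actual model call.

```python
import json
import random

def verbalized_sampling_prompt(task: str, k: int = 5) -> str:
    """Reformulate a direct instruction as a Verbalized Sampling prompt.

    Instead of asking for one answer, ask for k candidate responses
    together with the probability the model assigns to each.
    (Illustrative paraphrase of the idea, not the paper's exact wording.)
    """
    return (
        f"{task}\n"
        f"Generate {k} responses with their corresponding probabilities, "
        f"as a JSON list of objects with 'text' and 'probability' fields."
    )

def sample_from_verbalized(raw_json: str, rng: random.Random) -> str:
    """Parse the model's verbalized distribution and sample one response
    in proportion to its stated probability."""
    items = json.loads(raw_json)
    texts = [item["text"] for item in items]
    weights = [float(item["probability"]) for item in items]
    return rng.choices(texts, weights=weights, k=1)[0]

# Mock model reply, in place of a real API call:
mock_reply = json.dumps([
    {"text": "Joke A", "probability": 0.5},
    {"text": "Joke B", "probability": 0.3},
    {"text": "Joke C", "probability": 0.2},
])
prompt = verbalized_sampling_prompt("Tell me a joke about coffee.")
choice = sample_from_verbalized(mock_reply, random.Random(0))
```

Sampling from the verbalized distribution, rather than taking the single most likely reply, is what recovers diversity: over repeated calls, lower-probability candidates still get selected at their stated rates.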
The core idea behind VS is that different prompts can cause the model to collapse into different ‘modes’. While a direct prompt might lead to a stereotypical response, prompting for a distribution encourages the model to approximate the broader, more diverse distribution it learned during its initial pre-training phase, thereby recovering its inherent generative diversity.
Empirical Gains Across Diverse Applications
The paper presents comprehensive experiments demonstrating that VS significantly improves performance across a wide range of tasks without sacrificing factual accuracy or safety:
- Creative Writing: In tasks like poem continuation, story generation, and joke writing, VS dramatically increases diversity by 1.6 to 2.1 times compared to direct prompting. It also maintains high quality and allows for ‘tunable diversity’, meaning users can adjust the probability threshold in the prompt to control the level of diversity in the output.
- Dialogue Simulation: For simulating multi-turn dialogues, VS induces substantially more human-like behaviors and generates donation amount distributions that are closer to actual human behavior, making LLMs more effective for social simulations.
- Open-Ended QA: In open-ended question-answering tasks with multiple valid answers, VS generates a broader and more realistic response distribution, aligning better with the pre-training distribution and increasing answer coverage, all while maintaining high precision.
- Synthetic Data Generation: VS proves effective in generating more diverse synthetic data, which in turn improves the performance of downstream models, such as those used for math tasks.
An interesting emergent trend is that larger, more capable models benefit even more from Verbalized Sampling, suggesting that the method effectively unlocks the inherent creative potential of advanced LLMs. The research also confirms that VS does not compromise the model’s factual accuracy or safety, with refusal rates for harmful prompts remaining consistently high.
In conclusion, this work offers a fresh, data-driven perspective on why LLMs suffer from mode collapse and provides a practical, inference-time solution. Verbalized Sampling is a lightweight yet principled method that helps LLMs tap into their full generative diversity, paving the way for more creative and versatile AI applications.


