
Rainbow Padding: A Simple Solution to Early Termination in Diffusion LLMs

TLDR: Instruction-tuned diffusion large language models (dLLMs) suffer from “<eos> overflow,” where responses become shorter as the allocated length increases, due to the dual role of the <eos> token as both terminator and padding. Rainbow Padding solves this by using a single <eos> token for termination and a cyclic sequence of distinct padding tokens for unused positions. This decouples termination from padding, distributes probability mass, and prevents premature predictions, significantly improving response length and quality with minimal fine-tuning.

Diffusion large language models (dLLMs) are gaining traction as a powerful alternative to traditional autoregressive models. They offer flexible generation orders and show strong performance on complex reasoning tasks. However, a significant challenge has emerged with instruction-tuned dLLMs: a phenomenon called <eos> overflow. This issue causes responses to become paradoxically shorter as the allocated sequence length increases, often leading to early termination or a flood of <eos> (end-of-sequence) tokens.

The Problem: <eos> Overflow

Imagine giving a language model more space to write, only for it to produce less content. That’s the core of <eos> overflow. When users provide a longer generation budget (maximum length), instruction-tuned dLLMs frequently generate very short responses, terminating prematurely or filling the remaining space with repetitive <eos> tokens. This isn’t just an aesthetic problem; it severely degrades performance on tasks requiring detailed or lengthy outputs, such as mathematical reasoning or code generation.

The root cause lies in how these models are trained. Current instruction-tuning processes use the <eos> token for two purposes: to mark the legitimate end of a sequence and to fill unused positions as padding. This dual role creates a strong bias. During training, the model observes <eos> disproportionately at later positions, leading it to predict <eos> with high confidence even when it shouldn’t. This bias is amplified by adaptive decoding strategies, causing the termination probability to propagate backward and prematurely cut off responses.
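To see why the dual role creates this bias, consider a toy count of how often <eos> occupies each position when it doubles as padding. The sequence lengths below are made up purely for illustration, not taken from the paper:

```python
# Toy illustration of the positional bias: when <eos> is also the padding
# token, every slot from a response's true end to the max length holds
# <eos>, so its observed frequency climbs steeply toward later positions.
from collections import Counter

MAX_LEN = 8
response_lengths = [2, 3, 3, 4, 5, 6]  # hypothetical training examples

eos_counts = Counter()
for n in response_lengths:
    # Under <eos>-as-padding, positions n..MAX_LEN-1 all contain <eos>.
    for pos in range(n, MAX_LEN):
        eos_counts[pos] += 1

for pos in range(MAX_LEN):
    frac = eos_counts[pos] / len(response_lengths)
    print(f"position {pos}: observed P(<eos>) = {frac:.2f}")
```

The final positions see <eos> in every single example, which is exactly the kind of skewed statistic a model will learn to reproduce with high confidence.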

Introducing Rainbow Padding

To tackle this critical vulnerability, researchers Bumjun Kim, Dongjae Jeon, Dueun Kim, Wonje Jeung, and Albert No from Yonsei University have introduced a simple yet highly effective solution called Rainbow Padding. This method fundamentally rethinks how padding is handled in dLLMs.

Instead of using repeated <eos> tokens for padding, Rainbow Padding reserves a single <eos> token exclusively for marking the true end of a sequence. All other padding positions are then filled with a repeating cycle of distinct padding tokens (e.g., <pad0>, <pad1>, <pad2>, etc.).

How Rainbow Padding Works

The intuition behind Rainbow Padding is straightforward. By decoupling the termination signal from padding, the model learns to use <eos> only when a response genuinely concludes. This corrects the biased probability distribution that previously inflated <eos> predictions. Furthermore, distributing the padding across multiple distinct tokens prevents probability mass from concentrating on any single symbol. Each padding token appears regularly but sparsely, teaching the model to treat them as low-probability placeholders rather than high-confidence guesses.

This approach stabilizes the model’s sampling dynamics. Content tokens gain relatively higher probability, encouraging the model to generate meaningful information first. The <eos> token then emerges at a semantically appropriate point, naturally concluding the content, rather than appearing as an early, high-probability prediction. The use of a deterministic cycle for padding tokens is also crucial, as it’s easy for the model to learn without diverting significant capacity from instruction-following tasks.
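At decode time, the padded tail must be stripped to recover the final response. The procedure below is an assumption for illustration rather than the paper's exact pipeline: it truncates at the first <eos> and discards any stray padding tokens the sampler may have emitted earlier:

```python
# Hedged sketch: recovering the response from a rainbow-padded generation.
PAD_TOKENS = {f"<pad{i}>" for i in range(7)}
EOS = "<eos>"

def extract_response(tokens):
    """Cut at the first <eos>; drop any padding tokens before it."""
    out = []
    for t in tokens:
        if t == EOS:
            break
        if t not in PAD_TOKENS:
            out.append(t)
    return out

print(extract_response(["The", "answer", "is", "42", ".", "<eos>",
                        "<pad0>", "<pad1>"]))
```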

Impact and Results

Experiments demonstrate that Rainbow Padding substantially improves length robustness and output quality. Models adapted with Rainbow Padding produce significantly longer and more accurate responses on tasks like mathematical reasoning (MATH, GSM8K) and code generation (HumanEval), even with increased maximum length allocations. For instance, on the MATH benchmark, a baseline dLLM might achieve less than 1% accuracy with a max length of 1024, while the same model with Rainbow Padding can reach over 34% accuracy.

The method is also highly practical. It integrates efficiently into existing instruction-tuned models, requiring only a brief fine-tuning phase (e.g., LoRA fine-tuning for a single epoch on minimal data). It’s robust across various decoding strategies and is architecture-agnostic and dataset-agnostic. The research indicates that as few as seven distinct padding tokens are sufficient to prevent early termination effectively.

Rainbow Padding offers a lightweight and fundamental fix to a critical flaw in instruction-tuned dLLMs, reinforcing their potential as a robust alternative to autoregressive models. For more details, you can read the full research paper here.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
