Bridging the Noise Gap: Optimizing Diffusion Models with Sampler Stochasticity

TLDR: This research paper addresses the ‘reward gap’ in diffusion models fine-tuned with RLHF, which arises from using stochastic samplers during training and deterministic samplers for inference. The authors theoretically characterize this gap, demonstrating it narrows with training, and introduce a gDDIM framework for handling diverse stochasticity levels. Empirically, they show that moderate-to-high stochasticity during training improves the quality and stability of deterministic inference, advocating for a ‘high stochasticity in training, no stochasticity in generation’ approach for better text-to-image models.

Diffusion models have become powerful tools for generating images from text, with Stable Diffusion and FLUX among the best-known examples. These models are often fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to align their outputs with human preferences for aesthetics, safety, and overall quality.

However, a significant challenge arises in this process: a mismatch between how these models are trained and how they are used. During training, especially with RLHF, models often use ‘stochastic’ samplers. Think of these as samplers that introduce a bit of randomness or ‘noise’ to encourage the model to explore a wider range of possibilities and learn more robustly. But when it comes to actually generating an image for a user, ‘deterministic’ samplers are typically preferred. These are faster, more stable, and produce consistent results. This difference in noise levels between training and inference creates what researchers call a ‘reward gap’ – essentially, the quality you expect during training might not perfectly translate to the quality you get during inference.

A recent research paper, titled “Understanding Sampler Stochasticity in Training Diffusion Models for RLHF,” examines this problem. Authored by Jiayuan Sheng, Hanyang Zhao, Haoxian Chen, David D. Yao, and Wenpin Tang, the study explains why this reward gap exists and how to manage it effectively.

Bridging the Gap: Theory and Methodology

The researchers tackled the reward gap from several angles. Theoretically, they characterize the gap mathematically and derive specific bounds for different types of diffusion models, such as Variance Exploding (VE) and Variance Preserving (VP) Gaussian models. Their findings suggest that the gap shrinks as the model undergoes more training, which is reassuring for practitioners.

Methodologically, the paper introduces an adaptation of the generalized denoising diffusion implicit models (gDDIM) framework. This framework is key because it allows for arbitrary levels of ‘stochasticity’ – meaning it can handle different amounts of noise – and, importantly, supports even higher levels of noise (beyond what’s typically used) in a principled way. This expanded capability for noise exploration during training is vital for the model’s learning process.
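To make the idea of a stochasticity level concrete, here is a minimal sketch of a single denoising update in the standard DDIM parameterization, where a scalar eta scales the injected noise. This is not the paper's gDDIM construction, which the article does not spell out; the function name `ddim_step`, the scalar inputs, and the example values of the cumulative noise schedule (`abar_t`, `abar_prev`) are all assumptions chosen for illustration.

```python
import math
import random


def ddim_step(x_t, eps_pred, abar_t, abar_prev, eta, rng):
    """One standard DDIM update from step t to the previous step (scalar toy).

    x_t       : current noisy sample
    eps_pred  : the model's noise prediction at step t (assumed given)
    abar_t    : cumulative signal level at step t, in (0, 1)
    abar_prev : cumulative signal level at the previous step, abar_prev > abar_t
    eta       : stochasticity level (0 = deterministic, 1 = DDPM-like)
    """
    # Estimate of the clean sample implied by the noise prediction.
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps_pred) / math.sqrt(abar_t)

    # Standard deviation of the noise injected at this step.
    sigma = eta * math.sqrt((1.0 - abar_prev) / (1.0 - abar_t)) \
        * math.sqrt(1.0 - abar_t / abar_prev)

    # The standard formula needs sigma^2 <= 1 - abar_prev.
    direction_sq = 1.0 - abar_prev - sigma ** 2
    if direction_sq < 0.0:
        raise ValueError("eta is too large for the standard DDIM formula")

    mean = math.sqrt(abar_prev) * x0_pred + math.sqrt(direction_sq) * eps_pred
    return mean + sigma * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    for eta in (0.0, 1.0):
        outs = [ddim_step(1.0, 0.5, 0.5, 0.8, eta, random.Random(s)) for s in (0, 1)]
        print(f"eta={eta}: {outs}")
```

With eta = 0 the update ignores the random draw, so two different seeds give the same result; with eta = 1 they differ. For the illustrative schedule values used here, eta = 1.2 makes the term under the square root negative and the function raises an error, which suggests why supporting higher noise levels calls for a more general formulation than this standard one.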

Empirical Validation and Key Findings

To put their theories to the test, the team conducted extensive experiments using large-scale text-to-image models like Stable Diffusion v1.5 and FLUX.1. They employed popular RLHF algorithms such as Denoising Diffusion Policy Optimization (DDPO) and Mixed Group Relative Policy Optimization (MixGRPO), evaluating performance across various reward functions (e.g., ImageReward, PickScore, HPS v2, Aesthetic).

The empirical results strongly supported their theoretical predictions:

The reward gaps consistently narrowed as the training quality of the models improved. This indicates that as models get better at their task, the discrepancy between stochastic training and deterministic inference becomes less pronounced.
Training with moderate-to-high levels of stochasticity (specifically, an eta value of 1.2) often led to superior performance, both for familiar and new types of image generation. This suggests that a bit more ‘randomness’ during training can actually make the final, deterministic inference better.
Deterministic ODE samplers remained stable and frequently outperformed their stochastic counterparts when operating with a limited number of denoising steps, which is crucial for efficient image generation.
Interestingly, using richer and more complex text prompts also helped in reducing the SDE-ODE reward gap, leading to higher quality outputs.
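
The SDE-ODE reward gap mentioned above can be illustrated with a toy estimate. The sketch below is not the paper's experiment and does not reproduce its bounds: it assumes one-dimensional Gaussian data, a perfectly trained noise predictor (available in closed form for Gaussian data), a linear schedule for the cumulative signal level, the standard DDIM update with stochasticity eta, and a hypothetical reward that penalizes squared distance from the data mean. It takes the gap to be the mean reward under the stochastic sampler (eta = 1) minus the mean reward under the deterministic one (eta = 0).

```python
import math
import random

M, S = 2.0, 0.5   # toy data distribution N(M, S^2) (assumed)
K = 50            # number of denoising steps (assumed)
# ABAR[0] = 1 (clean data) down to ABAR[K] = 1e-4 (almost pure noise).
ABAR = [1.0 - (k / K) * (1.0 - 1e-4) for k in range(K + 1)]


def exact_eps(x, abar):
    """Noise prediction of a perfectly trained model for Gaussian data."""
    var_t = abar * S * S + (1.0 - abar)  # variance of the noisy marginal
    return math.sqrt(1.0 - abar) * (x - math.sqrt(abar) * M) / var_t


def sample(eta, rng):
    """Run all K standard DDIM steps with stochasticity eta (0 <= eta <= 1)."""
    x = rng.gauss(0.0, 1.0)  # start from standard Gaussian noise
    for k in range(K, 0, -1):
        a, a_prev = ABAR[k], ABAR[k - 1]
        eps = exact_eps(x, a)
        x0 = (x - math.sqrt(1.0 - a) * eps) / math.sqrt(a)
        sigma = eta * math.sqrt((1.0 - a_prev) / (1.0 - a) * (1.0 - a / a_prev))
        # max(...) only guards against tiny negative rounding error.
        direction = math.sqrt(max(1.0 - a_prev - sigma ** 2, 0.0))
        x = math.sqrt(a_prev) * x0 + direction * eps + sigma * rng.gauss(0.0, 1.0)
    return x


def reward(x):
    """Hypothetical reward: higher when the sample is close to the data mean."""
    return -(x - M) ** 2


def mean_reward(eta, n=2000, seed=0):
    """Monte Carlo estimate of the expected reward for a given eta."""
    rng = random.Random(seed)
    return sum(reward(sample(eta, rng)) for _ in range(n)) / n


if __name__ == "__main__":
    r_sde, r_ode = mean_reward(1.0), mean_reward(0.0)
    print("stochastic (eta=1):   ", r_sde)
    print("deterministic (eta=0):", r_ode)
    print("estimated reward gap: ", r_sde - r_ode)
```

Because the noise predictor in this toy is exact, the two samplers should land close to each other and the estimated gap should be small. Replacing `exact_eps` with a deliberately imperfect predictor is one way to explore how model quality affects the gap in this simplified setting.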

Conclusion: A Path to Better Diffusion Models

The paper concludes that the strategy of using ‘high stochasticity in training samples, no stochasticity in generation’ is not only theoretically sound but also practically beneficial. This approach opens up new avenues for fine-tuning diffusion models, allowing for more robust and diverse post-training results by carefully adjusting stochasticity hyperparameters. This work provides valuable guidance for developers and researchers aiming to create even more powerful and reliable generative AI models for various applications, including video and multimodal generation.
