DiffusionNFT: A Faster, More Flexible Way to Train Generative AI Models

TLDR: DiffusionNFT is a novel online reinforcement learning (RL) paradigm for diffusion models that directly optimizes on the forward process using flow matching. It contrasts positive and negative generations to define an implicit policy improvement direction, eliminating the need for likelihood estimation and supporting arbitrary black-box solvers. The method is up to 25 times more efficient than FlowGRPO and significantly boosts performance across various benchmarks without requiring Classifier-Free Guidance (CFG).

Online reinforcement learning (RL) has been a game-changer for improving large language models after their initial training, helping them align better with human preferences and enhancing their reasoning. However, bringing similar success to diffusion models, which are powerful tools for visual generation, has been a significant challenge. The main hurdle is that traditional RL methods depend on exact likelihoods, and these are difficult to compute for diffusion models.

Previous attempts to apply RL to diffusion models often involved discretizing the reverse sampling process, essentially turning diffusion generation into a multi-step decision-making problem. While this allowed for the use of existing RL algorithms like GRPO, it came with several drawbacks. These methods often suffered from a lack of consistency with the forward diffusion process, restrictions on the types of solvers that could be used, and complicated integration with Classifier-Free Guidance (CFG), a technique commonly used to improve image quality.

Introducing DiffusionNFT: A New Approach

A new paradigm called Diffusion Negative-aware FineTuning (DiffusionNFT) has been introduced to overcome these limitations. Instead of relying on the traditional Policy Gradient framework, DiffusionNFT optimizes diffusion models directly on the forward process using a technique called flow matching. This method cleverly contrasts positive and negative generations to define an implicit direction for policy improvement, seamlessly integrating reinforcement signals into the standard supervised learning objective.
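To make the forward-process idea concrete, here is a minimal sketch of how a flow-matching training pair could be built from a clean sample alone. It assumes the rectified-flow convention (linear interpolation between data and Gaussian noise, with the velocity target equal to noise minus data). The function names and the toy three-element "image" are hypothetical and are not taken from the paper.

```python
import random


def forward_noise(x0, t, rng):
    """Noise a clean sample x0 to time t in [0, 1].

    Assumed convention (rectified flow): x_t = (1 - t) * x0 + t * eps,
    with regression target v = eps - x0.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]
    v_target = [e - a for a, e in zip(x0, eps)]
    return x_t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]  # toy stand-in for a clean image
x_t, v_target = forward_noise(x0, t=0.3, rng=rng)
```

Because the noised input and its target are both derived from the clean sample, a loss of this kind needs no record of how the sample was generated.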

The core idea is to split generated samples into positive and negative groups based on a reward function. By learning from both good and bad examples, DiffusionNFT can guide the model towards better generations. This approach offers several practical benefits:

Solver Flexibility: DiffusionNFT allows for the use of any black-box solvers during data collection, unlike previous methods that were restricted to first-order SDE samplers.
Efficiency: It eliminates the need to store entire sampling trajectories, requiring only clean images and their associated rewards for policy optimization.
CFG-Free Operation: The method naturally incorporates reinforcement guidance directly into the optimized policy, making Classifier-Free Guidance (CFG) unnecessary. This simplifies the training process and improves efficiency.
Likelihood-Free: DiffusionNFT bypasses the need for complex and often biased likelihood estimations, which is a fundamental constraint for many other diffusion RL methods.
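The positive/negative contrast described above can be illustrated with a small sketch. The exact objective is defined in the paper; the parameterization here is an assumption for illustration: an implicit positive velocity and an implicit negative velocity are formed by mixing the old (sampling) policy's prediction with the trainable prediction using a strength beta, and a per-sample weight r in [0, 1] decides how much each branch counts. All names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def negative_aware_loss(v_theta, v_old, v_target, r, beta=1.0):
    """Weighted flow-matching loss over implicit positive and negative branches.

    Assumed form (not confirmed by this article):
      v_pos = (1 - beta) * v_old + beta * v_theta
      v_neg = (1 + beta) * v_old - beta * v_theta
      loss  = r * ||v_pos - v_target||^2 + (1 - r) * ||v_neg - v_target||^2
    """
    v_pos = [(1 - beta) * o + beta * n for o, n in zip(v_old, v_theta)]
    v_neg = [(1 + beta) * o - beta * n for o, n in zip(v_old, v_theta)]
    return r * mse(v_pos, v_target) + (1 - r) * mse(v_neg, v_target)


v_old = [0.0, 0.0]
v_target = [1.0, -1.0]

# A fully positive sample (r = 1) is fit best by predicting the target itself.
loss_pos = negative_aware_loss([1.0, -1.0], v_old, v_target, r=1.0)

# A fully negative sample (r = 0) is fit best by moving away from the target,
# mirrored around the old prediction.
loss_neg = negative_aware_loss([-1.0, 1.0], v_old, v_target, r=0.0)
```

Under this assumed form, both branches reduce to ordinary regression losses, which is consistent with the article's point that the reinforcement signal is folded into a supervised objective and only clean samples and rewards are required.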

Performance and Efficiency

The effectiveness of DiffusionNFT has been demonstrated through extensive experiments. In a head-to-head comparison with FlowGRPO, DiffusionNFT proved significantly more efficient, training up to 25 times faster. For instance, it improved the GenEval score from 0.24 to 0.98 within just 1,000 steps, while FlowGRPO took over 5,000 steps and additionally relied on CFG to reach 0.95.

Furthermore, by leveraging multiple reward models, DiffusionNFT substantially boosted the performance of SD3.5-Medium across various benchmarks, including GenEval, OCR, PickScore, ClipScore, HPSv2.1, Aesthetic, ImageReward, and UnifiedReward. Remarkably, it achieved this while being entirely CFG-free, even outperforming larger CFG-based models like SD3.5-L and FLUX.1-Dev in some metrics.

Practical Implementation Details

The practical implementation of DiffusionNFT involves a few key design choices. Rewards, which are often continuous scalars, are transformed into an optimality probability between 0 and 1. The sampling policy is updated using a ‘soft’ Exponential Moving Average (EMA) approach, balancing learning speed and stability. An adaptive weighting scheme is used for the flow-matching loss, further enhancing training stability. The decision to operate in a CFG-free setting, despite lower initial performance, proved beneficial, as the model quickly surpassed CFG baselines through RL post-training.
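Two of these design choices can be sketched in a few lines. The article does not give the formulas, so the ones below are assumptions for illustration: rewards within a group are centered, scaled, clipped to [-1, 1], and mapped to [0, 1]; the sampling policy's parameters are moved toward the trained parameters by a decay factor. The function names, the `scale` parameter, and the `decay` parameter are hypothetical, and the adaptive loss weighting is not covered here.

```python
def optimality_probability(rewards, scale=1.0):
    """Map raw scalar rewards to values in [0, 1].

    Assumed form: r = 0.5 + 0.5 * clip((raw - mean) / scale, -1, 1),
    where the mean is taken over the given group of rewards.
    """
    mean = sum(rewards) / len(rewards)
    probs = []
    for raw in rewards:
        z = max(-1.0, min(1.0, (raw - mean) / scale))
        probs.append(0.5 + 0.5 * z)
    return probs


def soft_ema_update(old_params, new_params, decay):
    """Move the sampling policy toward the trained policy.

    decay = 1.0 keeps the old parameters unchanged (fully off-policy);
    decay = 0.0 copies the trained parameters (fully on-policy).
    """
    return [decay * o + (1.0 - decay) * n for o, n in zip(old_params, new_params)]


probs = optimality_probability([0.0, 1.0, 2.0])  # [0.0, 0.5, 1.0]
sampler = soft_ema_update([0.0, 0.0], [1.0, 2.0], decay=0.9)
```

An intermediate decay value is one way to read the article's "balancing learning speed and stability": the sampler follows the trained model, but with a lag that damps sudden changes.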

This work represents a significant step towards unifying supervised and reinforcement learning in the diffusion domain, highlighting the forward process as a promising foundation for scalable, efficient, and theoretically sound diffusion RL. For more in-depth technical details, you can refer to the full research paper: DiffusionNFT: Online Diffusion Reinforcement with Forward Process.
