Boosting AI Image Quality: Full Trajectory Alignment and Online Preference Optimization

TLDR: Two new methods, Direct-Align and Semantic Relative Preference Optimization (SRPO), significantly improve AI image generation. Direct-Align allows diffusion models to be optimized across their entire image creation process, not just the final steps, which helps prevent "reward hacking." SRPO lets users adjust image preferences online through text prompts, reducing the need for costly offline fine-tuning of reward models. Combined, the two produce more realistic and aesthetically pleasing images and train far more efficiently than existing online reinforcement learning methods.

A new research paper introduces an approach to improving the quality and realism of AI-generated images that addresses key limitations in how current diffusion models are aligned with human preferences. Titled "Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference," the work is by Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, and Yansong Tang, from Hunyuan, Tencent; The Chinese University of Hong Kong, Shenzhen; and Tsinghua University. It presents two methods: Direct-Align and Semantic Relative Preference Optimization (SRPO).

Current methods for aligning diffusion models with human preferences face two major hurdles. First, they are computationally intensive: scoring an image with a reward model requires multi-step denoising with gradient computation. This restricts optimization to only a few diffusion steps and leaves models susceptible to "reward hacking," where they achieve high reward scores for low-quality images. Second, achieving a desired aesthetic quality such as photorealism or a specific lighting effect typically requires repeated, costly offline adjustment of the reward model, because these methods have no online mechanism for adjusting rewards in real time.

Direct-Align: Optimizing the Full Image Creation Process

To tackle the limitation of multi-step denoising, the researchers propose Direct-Align. This method predefines a noise prior, which allows the original image to be recovered from any time step through interpolation. It works because a diffusion state is an interpolation between noise and the target image, so once the noise is known, the clean image can be solved for directly instead of being reached through many denoising steps. As a result, Direct-Align avoids over-optimizing the later stages of image generation and lets the reinforcement learning algorithm be applied across the entire diffusion trajectory, from the early, noisy stages to the final clean image. This full-trajectory optimization is crucial for preventing artifacts and improving overall image quality.
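The interpolation idea can be illustrated with a small sketch. It is not the paper's implementation: it assumes a simple linear schedule, x_t = (1 - sigma) * x0 + sigma * eps, uses a short list of numbers in place of an image, and leaves out the diffusion model and the reward entirely. The function names `add_noise` and `recover_image` are hypothetical.

```python
import random

def add_noise(x0, eps, sigma):
    """Interpolate between a clean image x0 and a known noise sample eps.

    Assumed linear schedule: x_t = (1 - sigma) * x0 + sigma * eps.
    """
    return [(1.0 - sigma) * a + sigma * b for a, b in zip(x0, eps)]

def recover_image(x_t, eps, sigma):
    """Solve the interpolation for x0 in one step, given the same eps."""
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must be in [0, 1)")
    return [(xt - sigma * e) / (1.0 - sigma) for xt, e in zip(x_t, eps)]

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # toy stand-in for an image
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]    # predefined noise prior

# Recover the clean image from a lightly, moderately, and heavily noised state.
max_errors = {}
for sigma in (0.1, 0.5, 0.9):
    x_t = add_noise(x0, eps, sigma)
    x0_hat = recover_image(x_t, eps, sigma)
    max_errors[sigma] = max(abs(a - b) for a, b in zip(x0, x0_hat))

print(max_errors)  # errors are at floating-point rounding level
```

Because the noise is fixed in advance, recovery is a single algebraic step at any noise level, which is what makes it practical to score early, heavily noised states as well as late ones.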

Semantic Relative Preference Optimization (SRPO): Online Control and Bias Mitigation

Complementing Direct-Align, the paper introduces Semantic Relative Preference Optimization (SRPO). In SRPO, rewards are formulated as text-conditioned signals, which allows them to be adjusted online in response to positive and negative prompt augmentations. In practice, users can steer the model's preferences in real time by adding descriptive words to their prompts, reducing the heavy reliance on offline reward fine-tuning. SRPO also helps mitigate reward hacking by regularizing the reward signal: each sample is evaluated under both a positive and a negative prompt-conditioned preference, which filters out information irrelevant to the semantic guidance and neutralizes general biases of the reward model.
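A toy sketch of the relative-reward idea follows. It assumes, purely for illustration, that a text-conditioned reward is a dot product between an image embedding and a text embedding plus a bias that does not depend on the prompt. The embeddings, the example prompt wording in the comments, and the function names are all hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def toy_reward(image_emb, text_emb, image_bias):
    """Hypothetical text-conditioned reward: similarity plus a prompt-independent bias."""
    return dot(image_emb, text_emb) + image_bias

def relative_reward(image_emb, pos_text_emb, neg_text_emb, image_bias):
    """Reward under the positive prompt minus reward under the negative prompt."""
    return (toy_reward(image_emb, pos_text_emb, image_bias)
            - toy_reward(image_emb, neg_text_emb, image_bias))

# Made-up 3-dimensional embeddings; the last component is shared by both prompts.
pos_text = [1.0, 0.0, 0.5]   # e.g. the prompt with a "realistic photo" style word added
neg_text = [0.0, 1.0, 0.5]   # e.g. the same prompt with a "cartoon" style word added
image = [0.75, 0.25, 0.5]

# The same image scored by a reward model with no bias and with a large bias.
low_bias = relative_reward(image, pos_text, neg_text, image_bias=0.0)
high_bias = relative_reward(image, pos_text, neg_text, image_bias=5.0)

print(low_bias, high_bias)  # both 0.5: the prompt-independent bias cancels
```

Anything the reward model favors regardless of the prompt contributes equally to both terms and drops out of the difference, leaving only the part of the score that distinguishes the positive wording from the negative one.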

Breakthrough Results and Efficiency

The researchers fine-tuned the FLUX.1.dev model using their SRPO framework. In human evaluation, the fine-tuned model improved on the baseline by more than 3x: an approximately 3.7-fold increase in perceived realism and a 3.1-fold improvement in aesthetic quality. Efficiency is another highlight. The method converges in just 10 minutes on 32 NVIDIA H20 GPUs, a 75x improvement in training efficiency over state-of-the-art online reinforcement learning methods such as DanceGRPO, while matching or exceeding their image quality.

The evaluations, which include both automatic metrics and human assessments, indicate that Direct-Align and SRPO achieve state-of-the-art performance. The approach is also robust across different CLIP-based reward models, consistently enhancing image realism and detail complexity with no reward hacking observed. The work is a significant step toward aligning text-to-image models with fine-grained human preferences, offering more controllable, realistic, and aesthetically pleasing images at a much lower training cost.
