Boosting AI Image Quality: Full Trajectory Alignment and Online Preference Optimization

TLDR: Two new methods, Direct-Align and Semantic Relative Preference Optimization (SRPO), significantly improve AI image generation. Direct-Align allows diffusion models to be optimized across their entire image creation process, not just the final steps, which helps prevent common issues like “reward hacking.” SRPO lets users adjust image preferences online using text prompts, reducing the need for costly offline fine-tuning of reward models. This combined approach leads to more realistic and aesthetically pleasing AI-generated images with remarkable efficiency, outperforming existing methods.

A new research paper introduces an approach to enhance the quality and realism of AI-generated images, addressing key limitations in current diffusion models. Titled “Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference,” the work is by Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, and Yansong Tang, from Hunyuan, Tencent; The Chinese University of Hong Kong, Shenzhen; and Tsinghua University. It presents two novel methods: Direct-Align and Semantic Relative Preference Optimization (SRPO).

Current methods for aligning diffusion models with human preferences often face two major hurdles. First, they are computationally intensive, because reward scoring relies on multi-step denoising with gradient computation. This restricts optimization to only a few diffusion steps, which leaves models susceptible to ‘reward hacking’, where they achieve high reward scores for low-quality images. Second, achieving a desired aesthetic quality such as photorealism or a specific lighting effect typically requires repeated, costly offline adjustment of the reward model, because these methods lack an online mechanism for real-time adjustment.

Direct-Align: Optimizing the Full Image Creation Process

To tackle the limitation of multi-step denoising, the researchers propose Direct-Align. This method predefines a noise prior, so that the original image can be recovered from any time step through interpolation. It relies on the fact that diffusion states are interpolations between noise and target images: when the injected noise is known in advance, that interpolation can be inverted. As a result, Direct-Align avoids over-optimization in the later stages of image generation and enables the reinforcement learning algorithm to be applied across the entire diffusion trajectory, from early, noisy stages to the final clean image. This full-trajectory optimization is crucial for preventing artifacts and improving overall image quality.
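The interpolation idea can be illustrated with a few lines of NumPy. This is only a minimal sketch of the algebra, not the paper's implementation: it assumes a linear interpolation of the form x_t = (1 - sigma) * x0 + sigma * noise, and the function names add_known_noise and recover_clean are hypothetical. In the actual method a diffusion model is part of the loop; here the noise is simply known, which is enough to show why a clean image can be recovered in one step from any noise level.

```python
import numpy as np


def add_known_noise(x0, noise, sigma):
    """Build a noisy state as a linear interpolation between image and noise.

    Assumed form (not taken from the article): x_t = (1 - sigma) * x0 + sigma * noise
    """
    return (1.0 - sigma) * x0 + sigma * noise


def recover_clean(x_t, noise, sigma):
    """Invert the interpolation when the injected noise is known in advance."""
    if not 0.0 <= sigma < 1.0:
        # At sigma == 1 the state is pure noise and x0 cannot be recovered.
        raise ValueError("sigma must be in [0, 1)")
    return (x_t - sigma * noise) / (1.0 - sigma)


rng = np.random.default_rng(0)
x0 = rng.random((4, 4))              # stand-in for a clean image
noise = rng.standard_normal((4, 4))  # the predefined noise prior

# Recovery works at early (very noisy) and late (almost clean) steps alike.
for sigma in (0.1, 0.5, 0.9):
    x_t = add_known_noise(x0, noise, sigma)
    x0_hat = recover_clean(x_t, noise, sigma)
    assert np.allclose(x0_hat, x0)
```

Because the recovery is a single algebraic step, a reward could in principle be computed on the recovered image at any noise level without running the full denoising chain, which is the property the article attributes to Direct-Align.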

Semantic Relative Preference Optimization (SRPO): Online Control and Bias Mitigation

Complementing Direct-Align, the paper introduces Semantic Relative Preference Optimization (SRPO). In SRPO, rewards are formulated as text-conditioned signals. This innovative approach allows for online adjustment of rewards in response to positive and negative prompt augmentations. Essentially, users can guide the model’s preferences in real-time by adding descriptive words to their prompts, reducing the heavy reliance on offline reward fine-tuning. SRPO also plays a vital role in mitigating reward hacking by regularizing the reward signal. It does this by evaluating each sample with both positive and negative prompt conditional preferences, effectively filtering out information irrelevant to semantic guidance and neutralizing general biases.
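The bias-cancelling effect of scoring each sample under both a positive and a negative prompt can be sketched with a toy reward. Everything here is an illustrative assumption rather than the paper's formulation: toy_reward stands in for a CLIP-style text-conditioned reward model, the embeddings are made-up vectors, and combining the two scores by a plain difference is the simplest possible choice.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def toy_reward(image_emb, text_emb, bias):
    """Hypothetical text-conditioned reward: similarity plus a prompt-independent bias.

    The bias term models a general preference of the reward model that has
    nothing to do with the text prompt.
    """
    return cosine(image_emb, text_emb) + bias


def relative_reward(image_emb, pos_text_emb, neg_text_emb, bias):
    """Score under the positive prompt minus score under the negative prompt."""
    return (toy_reward(image_emb, pos_text_emb, bias)
            - toy_reward(image_emb, neg_text_emb, bias))


# Made-up embeddings: the image is close to the positive-prompt direction.
image = np.array([1.0, 0.2])
positive = np.array([1.0, 0.0])   # e.g. prompt augmented with a desired attribute
negative = np.array([0.0, 1.0])   # e.g. prompt augmented with an undesired attribute

unbiased = relative_reward(image, positive, negative, bias=0.0)
biased = relative_reward(image, positive, negative, bias=5.0)

# The prompt-independent bias cancels in the difference.
assert np.isclose(unbiased, biased)
```

In this toy setting, anything the reward model likes regardless of the prompt drops out of the difference, leaving only the part of the score that depends on the positive and negative words.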

Breakthrough Results and Efficiency

The researchers fine-tuned the FLUX.1.dev model using their SRPO framework, demonstrating remarkable improvements. Their method substantially enhances human-evaluated realism and aesthetic quality by over 3x compared to the baseline. For instance, it achieved an approximate 3.7-fold increase in perceived realism and a 3.1-fold improvement in aesthetic quality. Furthermore, the efficiency of SRPO is a major highlight. The method converges in just 10 minutes using 32 NVIDIA H20 GPUs, showcasing a 75x improvement in training efficiency compared to state-of-the-art online reinforcement learning methods like DanceGRPO, while matching or exceeding their image quality.

The extensive evaluations, including both automatic metrics and comprehensive human assessments, confirm that Direct-Align and SRPO achieve state-of-the-art performance. The approach is also robust across different CLIP-based reward models, consistently enhancing image realism and detail complexity without observing reward hacking. This work represents a significant step forward in aligning text-to-image models with fine-grained human preferences, offering more controllable, realistic, and aesthetically pleasing AI-generated images with unprecedented efficiency.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
