TLDR: A new research paper introduces Align Your Tangent (AYT), a method that significantly improves the training of Consistency Models (CMs), a class of fast generative models. The authors discovered that CM training is hindered by ‘oscillatory tangents’ that move parallel to the data manifold instead of towards it. AYT proposes a Manifold Feature Distance (MFD) loss function that uses self-supervised manifold features to align these tangents, accelerating training by orders of magnitude, outperforming LPIPS, and enabling effective training with very small batch sizes. The result is faster convergence and better sample quality.
Generative AI models, particularly those based on diffusion and flow matching, have made incredible strides in creating realistic images and other data. However, generating these high-quality samples often comes with a significant computational cost, requiring many steps to produce a final output. This has led researchers to explore methods that can speed up this process without sacrificing quality.
One promising approach is the use of Consistency Models (CMs). These models are designed to generate high-quality samples in just one or two steps, drastically reducing inference time. Despite their potential, CMs are notoriously difficult to train, often requiring long training runs and large batch sizes to reach competitive quality, which makes them hard to implement and scale.
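To see why one- or two-step sampling is so attractive, here is a minimal sketch of what inference with a trained consistency model could look like. The function signature `f(x, t)`, the noise levels, and the schedule are illustrative assumptions, not the paper’s exact API:

```python
import torch

@torch.no_grad()
def consistency_sample(f, shape, sigma_max=80.0, sigma_mid=0.8, two_step=False):
    """Generate samples with a trained consistency model in 1 or 2 steps.

    f(x, t) is assumed to map a noisy input at noise level t directly
    to an estimate of the clean data point.
    """
    # One step: denoise pure noise at the maximum noise level directly.
    x = sigma_max * torch.randn(shape)
    x0 = f(x, sigma_max)
    if not two_step:
        return x0
    # Two steps: re-noise the first estimate to an intermediate level
    # and apply the model once more to refine it.
    x = x0 + sigma_mid * torch.randn(shape)
    return f(x, sigma_mid)
```

Compare this with a typical diffusion sampler, which may apply the network tens to hundreds of times per sample.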
Understanding the Training Challenge
A recent research paper, titled “Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents”, delves into the training dynamics of Consistency Models to uncover the root cause of these difficulties. Authored by Beomsu Kim, Byunghee Cha, and Jong Chul Ye from the Graduate School of AI, KAIST, the paper identifies a key issue: the “tangents” of CMs. These tangents represent the directions in which the model’s output updates during training. The researchers discovered that these tangents are often highly oscillatory, meaning they tend to move parallel to the underlying data manifold (the intrinsic structure of the data) rather than directly towards it. This oscillatory behavior hinders the model’s convergence, making training slow and inefficient.
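To make the notion of a tangent concrete: if the consistency loss measures a distance between the student’s output and a stopped-gradient target at an adjacent noise level, the tangent at a training point is (up to sign) the gradient of that distance with respect to the student’s output. A minimal probe of this quantity might look as follows, where the model interfaces and the distance function are stand-ins rather than the paper’s code:

```python
import torch

def probe_tangent(f_student, f_teacher, x_t, t, x_prev, t_prev, dist):
    """Return the direction in which training pushes the student's output.

    The consistency loss compares the student output at (x_t, t) with a
    stopped-gradient target from an adjacent point (x_prev, t_prev). The
    gradient of that distance w.r.t. the student output is the 'tangent';
    gradient descent moves the output along its negative.
    """
    out = f_student(x_t, t).detach().requires_grad_(True)  # treat the output as a leaf
    target = f_teacher(x_prev, t_prev).detach()            # stop-gradient target
    d = dist(out, target).sum()                            # scalar distance
    (grad,) = torch.autograd.grad(d, out)
    return -grad
```

The paper’s observation is that, with pixel-space distances, this direction tends to oscillate along the data manifold rather than point towards it.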
Introducing Manifold-Aligned Tangents with AYT
To address this problem, the researchers propose a novel solution: the Manifold Feature Distance (MFD). This new loss function is designed to provide “manifold-aligned tangents” that consistently point towards the data manifold. Their method, dubbed Align Your Tangent (AYT), aims to make CM training more stable and efficient.
The core idea behind AYT is to learn a special “feature map” (a neural network) that transforms the data into a feature space. In this feature space, the gradients (which determine the tangent directions) are specifically designed to point towards the data manifold. Unlike previous methods that might use fixed feature maps or rely on external, human-supervised metrics like LPIPS (Learned Perceptual Image Patch Similarity), AYT’s manifold features are learned in a self-supervised manner. This means the model learns to identify and emphasize the directions that are most relevant for aligning the tangents, without needing human-labeled data for the feature map itself.
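As a rough sketch of the idea, assuming a learned feature map `phi` that is kept frozen while the CM trains, the MFD can be thought of as an ordinary distance computed in `phi`’s feature space instead of pixel space. The pseudo-Huber baseline shown for contrast is the pixel-space loss commonly used for CM training; the constants and reductions here are illustrative:

```python
import torch

def pseudo_huber(x, y, c=0.03):
    """Pixel-space pseudo-Huber distance, a common CM training baseline."""
    return torch.sqrt(((x - y) ** 2).flatten(1).sum(dim=1) + c ** 2) - c

def manifold_feature_distance(phi, x, y):
    """Distance measured in the space of the learned feature map phi.

    phi is trained separately (self-supervised) and frozen here, so the
    gradients it induces (the CM tangents) point towards the data
    manifold instead of oscillating along it.
    """
    return ((phi(x) - phi(y)) ** 2).flatten(1).sum(dim=1)
```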
The manifold features are trained using various image transformations, including degradations (like Gaussian noise and blur), geometric changes (such as scaling and rotation), and color adjustments (brightness, contrast, etc.). By learning to map these perturbed images back to their original manifold, the feature map ensures that its gradients effectively guide the CM tangents towards the true data structure.
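One plausible way to realize this self-supervised training, sketched below under our own assumptions rather than the paper’s exact recipe, is to apply a randomly chosen transformation to each image and train a lightweight auxiliary classifier (the paper mentions one) to predict which transformation was applied; the classifier’s features then serve as the manifold feature map:

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

# Candidate perturbations mirroring the three families described above:
# degradations, geometric changes, and color adjustments.
TRANSFORMS = [
    lambda x: x + 0.1 * torch.randn_like(x),       # Gaussian noise
    lambda x: TF.gaussian_blur(x, kernel_size=5),  # blur
    lambda x: TF.rotate(x, angle=15.0),            # rotation
    lambda x: TF.adjust_brightness(x, 1.5),        # brightness
    lambda x: TF.adjust_contrast(x, 1.5),          # contrast
]

def feature_map_training_step(encoder, head, x, opt):
    """One self-supervised step: predict which perturbation was applied."""
    labels = torch.randint(len(TRANSFORMS), (x.shape[0],))
    x_aug = torch.stack(
        [TRANSFORMS[l](xi) for xi, l in zip(x, labels.tolist())]
    )
    logits = head(encoder(x_aug))
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

After this pretraining, the encoder would be frozen and plugged in as `phi` in the feature-space distance sketched earlier.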
Impressive Results and Broader Impact
The experimental results of AYT are compelling. The method significantly accelerates CM training, achieving convergence orders of magnitude faster than traditional approaches using pseudo-Huber loss. Furthermore, AYT not only speeds up training but also improves the quality of generated samples, even outperforming LPIPS, a widely used perceptual metric. A particularly noteworthy finding is AYT’s robustness to batch size; it can achieve competitive performance with extremely small batch sizes (e.g., 16), which is a major advantage for resource-constrained environments.
When compared to other state-of-the-art methods, AYT demonstrates strong performance. It improves 1-step FID (Fréchet Inception Distance, a standard measure of image quality) over Easy Consistency Training (ECT) on datasets like CIFAR-10 and ImageNet 64x64, while maintaining comparable 2-step performance. It also holds its own against advanced diffusion distillation models, even though AYT trains models from scratch without relying on pre-trained teacher models.
The implications of this research extend beyond image generation. The self-supervised approach to learning manifold features could be applied to other data modalities like audio, text, or multimodal data, using domain-specific augmentation strategies. While the current study focuses on relatively small-scale settings, the lightweight nature of the auxiliary classifier suggests that AYT could scale effectively to larger datasets and higher resolutions, potentially integrating with latent diffusion models.
In conclusion, Align Your Tangent offers a practical and powerful way to train Consistency Models more efficiently and reliably. By understanding and correcting the oscillatory behavior of CM tangents, this method paves the way for faster, higher-quality generative AI with reduced computational demands.