TLDR: LSSGen is a framework that accelerates text-to-image generation by performing resolution scaling directly in the latent space. It uses a lightweight latent upsampler, noise compensation, and timestep rescheduling to achieve speedups of up to 1.5x while maintaining or improving image quality, avoiding the artifacts common in pixel-space scaling. The method is compatible with various flow and diffusion models, making high-resolution image synthesis more efficient.
Text-to-image generation has seen incredible advancements, allowing us to create photorealistic images from simple text prompts. However, a significant challenge remains: generating high-resolution images quickly and efficiently. Traditional methods often struggle with this, as the computational cost increases dramatically with image size, and attempts to speed things up by scaling images in pixel space can introduce unwanted blurriness and distortions.
A new framework called LSSGen, which stands for Latent Space Scaling Generation, aims to tackle these issues head-on. Developed by researchers from Inventec Corporation and the University at Albany, LSSGen proposes a novel way to perform resolution scaling directly within the ‘latent space’ – a compressed representation of the image data – rather than in the pixel space.
How LSSGen Works
The core idea behind LSSGen is to start the image generation process at a lower resolution in the latent space and then progressively upscale it. This is achieved through three key components:
- Latent Space Upsampler: Instead of resizing images in pixel space, LSSGen uses a lightweight, specialized upsampler that works directly on the latent features. This upsampler is designed to be compatible with various generative models, meaning it can be reused across different systems without needing to be retrained.
- Noise Compensation: As the resolution is scaled up, the system needs to account for changes in noise characteristics. LSSGen introduces a noise compensation and rescheduling strategy to ensure consistency between the noise and the image data at each stage, leading to more stable and higher-quality results.
- Timestep Schedule Shifting: To further boost efficiency, LSSGen allocates more denoising steps to the early, lower-resolution stages where computation is less expensive. This strategy significantly reduces the overall computational cost without compromising image fidelity.
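To make the three components more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation, and the article does not give the actual formulas. Several things are assumptions made purely for illustration: nearest-neighbour upsampling stands in for the learned latent upsampler, the re-noising step uses the standard rectified-flow interpolation between a clean latent and Gaussian noise, the schedule uses the time-shift warp commonly applied in rectified-flow samplers, and the cost model treats per-step cost as proportional to latent area (ignoring upsampler overhead). All function names are hypothetical.

```python
import random


def upsample_latent(latent, factor=2):
    """Nearest-neighbour upsampling of a 2D latent (a list of rows).

    Stand-in for the learned latent upsampler (assumption, not the paper's model).
    """
    out = []
    for row in latent:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def renoise(clean_latent, t, rng):
    """Mix a clean latent estimate with fresh Gaussian noise at noise level t,
    using the rectified-flow interpolation x_t = (1 - t) * x_0 + t * noise."""
    return [[(1.0 - t) * v + t * rng.gauss(0.0, 1.0) for v in row]
            for row in clean_latent]


def shifted_timesteps(num_steps, shift):
    """Noise levels from 1.0 (pure noise) down to 0.0, warped by `shift`.

    With shift > 1 the sampler spends more of its steps at high noise levels.
    """
    ts = []
    for i in range(num_steps + 1):
        t = 1.0 - i / num_steps  # uniform grid from 1.0 to 0.0
        ts.append(shift * t / (1.0 + (shift - 1.0) * t))
    return ts


def split_stages(timesteps, switch_t):
    """Count steps run at low resolution (start level above switch_t)
    and steps run at full resolution."""
    starts = timesteps[:-1]  # step i starts at timesteps[i]
    low = sum(1 for t in starts if t > switch_t)
    return low, len(starts) - low


def relative_cost(low_steps, high_steps, scale=0.5):
    """Cost relative to running every step at full resolution, assuming
    per-step cost is proportional to latent area (an assumed cost model)."""
    total = low_steps + high_steps
    return (low_steps * scale ** 2 + high_steps) / total


rng = random.Random(0)
low_res = [[0.1, -0.2], [0.3, 0.4]]  # toy 2x2 clean latent estimate
high_res = renoise(upsample_latent(low_res), t=0.5, rng=rng)  # 4x4 latent

timesteps = shifted_timesteps(20, shift=3.0)
low_steps, high_steps = split_stages(timesteps, switch_t=0.5)
print(low_steps, high_steps, relative_cost(low_steps, high_steps))
# prints: 15 5 0.4375
```

In this toy setup, shifting the schedule moves 15 of the 20 steps into the half-resolution stage, compared with 10 of 20 for an unshifted schedule (`shift=1.0`). The resulting cost figure only illustrates the direction of the effect under the assumed cost model; it is not a prediction of the speedups reported for LSSGen.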
Impressive Results
Extensive experiments have shown that LSSGen delivers a strong balance between computational efficiency and image quality. When generating 1024×1024 images, LSSGen can achieve up to a 1.5x speedup while maintaining or even improving image quality across various metrics. For example, on models like FLUX.1-dev, LSSGen improved image quality by 3-8% compared to the baseline, with only a minimal drop in text-image alignment.
The benefits become even more pronounced at higher resolutions, such as 2048×2048, where the computational cost grows at least quadratically with image side length. LSSGen significantly outperforms conventional scaling approaches like MegaFusion, which often suffer from noticeable quality loss and blur artifacts. LSSGen maintains high perceptual quality while offering substantial speed improvements, making it a practical and scalable solution for high-resolution image synthesis.
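A quick back-of-the-envelope calculation shows why resolution matters so much. The sketch below assumes a VAE that downsamples each side by a factor of 8, which is typical of Stable Diffusion-family models; the factor and the function name are assumptions for illustration, not details taken from the LSSGen paper.

```python
def latent_positions(image_size, vae_factor=8):
    """Number of spatial positions in the latent grid for a square image,
    assuming the VAE shrinks each side by `vae_factor`."""
    side = image_size // vae_factor
    return side * side


for size in (512, 1024, 2048):
    n = latent_positions(size)
    # Ratio is relative to the 1024x1024 case.
    print(size, n, n / latent_positions(1024))
# prints:
# 512 4096 0.25
# 1024 16384 1.0
# 2048 65536 4.0
```

Doubling the side length quadruples the number of latent positions the model has to process at every denoising step, and layers whose cost grows faster than linearly in the number of positions (such as self-attention) scale even worse. This is why running early steps on a smaller latent grid can save a meaningful share of the total compute.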
The framework is compatible with a wide range of state-of-the-art models, including Rectified Flow (RF) models like FLUX.1-dev and FLUX.1-schnell, and Diffusion Models (DM) such as SDXL, Playground-v2.5, SD1.5, and LCM-SDXL. This broad compatibility highlights LSSGen’s versatility and potential for widespread adoption.
Conclusion
LSSGen represents a significant step forward in efficient text-to-image generation. By intelligently leveraging latent space scaling, it addresses the long-standing trade-off between speed and image quality, allowing for faster creation of stunning, high-resolution AI-generated images without the common artifacts associated with traditional methods. This innovation opens doors for more accessible and powerful AI art and image creation tools. You can read the full research paper here: LSSGen Research Paper.


