TL;DR: Voost is a new AI framework that uses a single diffusion transformer to perform both virtual try-on (putting clothes on a person) and virtual try-off (reconstructing the original garment from a dressed person). By training these tasks jointly, Voost improves garment-person interaction, leading to more realistic and accurate results across various poses and garment types. It also introduces inference-time techniques for better robustness and consistency, achieving state-of-the-art performance in both tasks.
The world of online fashion is constantly evolving, and virtual try-on technology is at the forefront of this transformation. Imagine being able to see how a garment looks on you without physically trying it on. This is the promise of virtual try-on (VTON), a generative AI task that creates a realistic image of a person wearing a target garment. However, accurately modeling how clothes fit and drape on a person, especially with different poses and appearances, has always been a significant challenge.
A new research paper introduces a groundbreaking framework called Voost, which aims to overcome these hurdles. Voost is a unified and scalable diffusion transformer that not only handles virtual try-on but also its inverse: virtual try-off. Virtual try-off is the task of reconstructing the original appearance of a garment from an image of a person wearing it. By learning both tasks simultaneously, Voost allows each garment-person pair to supervise both directions, significantly enhancing the AI’s understanding of garment-body relationships without needing separate networks, extra losses, or additional labels.
How Voost Works
At its core, Voost uses a single diffusion transformer, a powerful type of AI model, to learn both try-on and try-off. Unlike previous methods that might struggle with precise garment-person correspondence, Voost adopts a unique token-level concatenation structure. This means that the garment image and the person image are placed side-by-side and fed into a shared embedding space. This design allows the model to reason bidirectionally across both try-on and try-off scenarios using a common conditioning layout.
The framework is also highly scalable, supporting dynamic input layouts. This means it can handle diverse poses, aspect ratios, and spatial arrangements of images, making it robust for real-world applications. A special ‘task token’ tells the model whether to perform a try-on or try-off, and also specifies the garment category (e.g., upper, lower, full-body).
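The conditioning layout described above can be sketched roughly as follows. Note that the token dimension, embedding table, and function names here are illustrative assumptions for the purpose of showing the idea, not the paper's actual code.

```python
import numpy as np

# Illustrative sketch: garment and person images are encoded into token
# sequences and concatenated side by side in a shared embedding space,
# with a task token prepended that encodes both the direction
# (try-on vs. try-off) and the garment category. Sizes are assumed.

D = 64  # token dimension (assumed)
rng = np.random.default_rng(0)

# learned embeddings, one per (task, category) pair: 2 tasks x 3 categories
task_table = rng.standard_normal((2 * 3, D))

def build_sequence(garment_tokens, person_tokens, task_id, category_id):
    """Prepend a task token and concatenate both token sets into one sequence."""
    task_token = task_table[task_id * 3 + category_id][None, :]
    return np.concatenate([task_token, garment_tokens, person_tokens], axis=0)

garment = rng.standard_normal((16, D))  # 16 garment tokens
person = rng.standard_normal((16, D))   # 16 person tokens
seq = build_sequence(garment, person, task_id=0, category_id=1)  # try-on, lower-body
print(seq.shape)  # (33, 64)
```

Because both tasks share this single sequence layout, the same transformer weights can attend across the garment-person boundary in either direction.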
Smart Enhancements for Better Results
Voost introduces two clever techniques that refine its performance during inference (when the model is generating images):
- Attention Temperature Scaling: This technique helps the model adapt its focus when the input image resolution or mask size differs from what it was trained on. It ensures that the AI’s ‘attention’ remains sharp and relevant, especially when dealing with challenging layouts where the masked region might be small.
- Self-Corrective Sampling: This is a unique mechanism that leverages the model’s dual capability. During the image generation process, Voost can predict a dressed person image (try-on) and then use that prediction to perform a reverse try-off pass, reconstructing the original garment. By comparing this reconstructed garment to the actual conditioning garment, the model can iteratively refine its output, ensuring consistency and improving visual fidelity.
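To make the first technique concrete, here is a minimal sketch of temperature-scaled attention. The specific scaling rule below (rescaling pre-softmax logits by the square root of the ratio of log sequence lengths, a common length-extrapolation recipe) is an assumption for illustration; Voost's exact schedule may differ.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax along the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_attention(q, k, v, temperature=1.0):
    d = q.shape[-1]
    logits = (q @ k.T) / np.sqrt(d)
    # temperature > 1 sharpens the attention distribution, countering the
    # entropy growth that comes with more tokens at higher resolutions
    return softmax(temperature * logits) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 32)) for _ in range(3))

# sharpen attention when the inference-time token count exceeds training
train_tokens, test_tokens = 256, 1024
temperature = np.sqrt(np.log(test_tokens) / np.log(train_tokens))
out = scaled_attention(q, k, v, temperature)
print(round(temperature, 3), out.shape)  # 1.118 (8, 32)
```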
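The self-corrective loop has a simple structure: generate, reverse-reconstruct, compare against the conditioning garment, and correct. The toy below illustrates only that structure; the `toy_try_on` and `toy_try_off` functions are stand-ins, not Voost's diffusion transformer, and the update rule is a simplified invention for this example.

```python
import numpy as np

def toy_try_on(x, garment, t, total):
    # pretend denoising step: blend the sample toward the garment signal
    alpha = 1.0 / (total - t)
    return (1 - alpha) * x + alpha * garment

def toy_try_off(x):
    # pretend reverse pass: recover the garment signal from the sample
    return x

def self_corrective_sample(garment, steps=10, gamma=0.5):
    x = np.zeros_like(garment)  # start from a blank "noise" state
    for t in range(steps):
        x = toy_try_on(x, garment, t, steps)      # forward try-on step
        garment_hat = toy_try_off(x)              # reconstruct the garment
        x = x + gamma * (garment - garment_hat)   # consistency correction
    return x

g = np.array([1.0, 2.0, 3.0])
out = self_corrective_sample(g)
print(np.abs(out - g).max())  # 0.0
```

The key point is the feedback signal: because the same model can run both directions, the mismatch between the reconstructed and conditioning garment is available at every sampling step, for free.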
Impressive Performance
Extensive experiments show that Voost achieves state-of-the-art results on both virtual try-on and try-off benchmarks. It consistently outperforms existing methods in terms of alignment accuracy, visual fidelity, and generalization. A user study further confirmed its superiority, with participants consistently preferring Voost’s outputs for photorealism, garment detail, and garment structure.
The research also highlights the benefits of its joint training approach; the dual-task model significantly outperforms single-task models, indicating that learning both directions creates a more generalized understanding of garment-person interaction. Furthermore, the study found that fine-tuning only the attention modules within the transformer, rather than the entire model, achieved the best performance while significantly reducing training costs.
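Attention-only fine-tuning amounts to freezing every parameter whose name falls outside the attention modules. The sketch below shows the selection logic with made-up parameter names; in a real framework you would toggle the gradient flag on the matching parameters rather than filter strings.

```python
# Hypothetical parameter names for a two-block transformer (illustrative only)
param_names = [
    "blocks.0.attn.to_q.weight", "blocks.0.attn.to_kv.weight",
    "blocks.0.mlp.fc1.weight", "blocks.0.mlp.fc2.weight",
    "blocks.1.attn.to_q.weight", "blocks.1.mlp.fc1.weight",
]

def is_trainable(name, pattern=".attn."):
    # fine-tune only attention modules; everything else stays frozen
    return pattern in name

trainable = [n for n in param_names if is_trainable(n)]
frozen = [n for n in param_names if not is_trainable(n)]
print(len(trainable), len(frozen))  # 3 3
```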
Looking Ahead
While Voost marks a significant leap forward in virtual try-on and try-off technology, the researchers acknowledge areas for future improvement. Currently, precise control over garment fit can be ambiguous due to the lack of explicit structural or sizing information. Future work plans to incorporate additional cues like body measurements or garment metadata to enhance controllability. The strong foundation of Voost also makes it well-suited for extensions into video-based or 3D synthesis, promising even more immersive virtual fashion experiences.
For more technical details, you can read the full research paper here.


