
MobilePicasso: Bringing High-Resolution Image Editing to Your Phone with Speed and Clarity

TLDR: MobilePicasso is a novel system for efficient 4K image editing on mobile devices. It uses a three-stage pipeline: standard-resolution editing trained with a hallucination-aware loss, a learnable latent projection, and upscaling with Adaptive Context-Preserving Tiling (ACPT). The approach improves image quality and reduces hallucinations. It is also up to 55.8x faster than baselines that rely on overlapping tiles, and on a Samsung Galaxy S23 it even runs 4.71x faster than a server-based model on an A100 GPU. It does all this with a memory footprint of only 1.15 GB, making high-resolution on-device image editing practical.

High-resolution image editing on mobile devices has long been a challenging task. Traditional diffusion models, while powerful for image-to-image synthesis, often struggle with memory limitations and computational demands when deployed on smartphones, tablets, or TVs. Furthermore, these models frequently produce ‘hallucinations’ – unrealistic or unintended objects – especially at higher resolutions, leading to a degraded user experience.

A new research paper titled ‘Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling’ introduces a novel system called MobilePicasso, designed to overcome these significant hurdles. Developed by Young D. Kwon, Abhinav Mehrotra, Malcolm Chadwick, Alberto Gil Ramos, and Sourav Bhattacharya from Samsung AI Center-Cambridge, MobilePicasso aims to bring efficient 4K image editing directly to mobile devices without compromising quality or speed.

The core innovation of MobilePicasso lies in its three-stage hybrid pipeline, which breaks down the complex task of high-resolution image editing into more manageable steps. This modular approach allows for efficient processing and addresses the limitations of mobile hardware.


The Three Stages of MobilePicasso:

The first stage involves performing image editing at a standard resolution, typically 512×512 pixels. This is where MobilePicasso introduces a ‘hallucination-aware loss’ mechanism. By training the model to detect and penalize unrealistic elements during this initial editing phase, it significantly reduces the occurrence of distorted faces, floating objects, or implausible scenes that are common in other diffusion models. This stage also incorporates data filtering to remove images with artifacts from the training dataset, further enhancing the model’s ability to produce realistic outputs.
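The summary does not spell out the exact form of the loss or the filtering criterion, so the sketch below is only one plausible reading, not the authors' method. It assumes a separately trained hallucination/artifact detector that returns a probability in [0, 1]. That score is used two ways: to drop noisy samples from the training set, and as a weighted penalty added to the usual denoising loss. The names `Sample`, `filter_training_set`, `hallucination_aware_loss`, `artifact_score`, `max_artifact_score` and `weight` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    path: str
    # Hypothetical score from an artifact detector: 0 = clean, 1 = clearly broken.
    artifact_score: float


def filter_training_set(samples: List[Sample], max_artifact_score: float = 0.5) -> List[Sample]:
    """Data filtering: keep only samples the (assumed) detector considers clean enough."""
    return [s for s in samples if s.artifact_score <= max_artifact_score]


def hallucination_aware_loss(
    pred_noise: Sequence[float],
    true_noise: Sequence[float],
    hallucination_prob: float,
    weight: float = 0.1,
) -> float:
    """Denoising MSE plus a weighted hallucination penalty (assumed form).

    In a real training loop, hallucination_prob would come from a detector
    applied to the model's predicted output, so gradients flow through it.
    Here it is a plain number, to show how the two terms combine.
    """
    if len(pred_noise) != len(true_noise) or not pred_noise:
        raise ValueError("noise vectors must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / len(pred_noise)
    return mse + weight * hallucination_prob


if __name__ == "__main__":
    data = [Sample("a.png", 0.1), Sample("b.png", 0.9), Sample("c.png", 0.5)]
    print([s.path for s in filter_training_set(data)])  # ['a.png', 'c.png']
    print(hallucination_aware_loss([1.0, 2.0], [1.0, 0.0], hallucination_prob=0.5))
```

The `weight` term trades off fidelity to the training target against how strongly likely hallucinations are punished. Its value here is arbitrary.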

The second stage is a ‘learnable latent projection.’ A conventional route would decode the edited latent back into pixels, upscale the image in pixel space, and re-encode it, which is computationally expensive. Instead, MobilePicasso projects the edited image’s latent representation (a compressed, abstract form of the image) directly into a higher-resolution latent space. This step uses a lightweight projection model that is significantly faster and requires less memory than the traditional encoding and decoding steps it replaces.
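The toy NumPy sketch below makes the idea concrete. Its architecture is an assumption, not the paper's. It upsamples the latent with nearest-neighbour interpolation, then applies a learnable per-position channel mix, which is equivalent to a 1x1 convolution. The class name `LatentProjection`, the 4-channel latent, the 8x downsampling factor in the example and the scale factor are all illustrative choices. The point is the cost. This projection has only `out_ch * in_ch + out_ch` parameters and never touches pixel space. By contrast, a decode, upscale and re-encode round trip in a typical latent diffusion model has to run a full decoder and encoder.

```python
import numpy as np


class LatentProjection:
    """Toy learnable latent projection (hypothetical architecture).

    Maps a latent of shape (in_ch, H, W) to (out_ch, H * scale, W * scale)
    via nearest-neighbour upsampling followed by a 1x1 channel-mixing layer.
    In practice weight and bias would be learned; here they are initialised
    with small random values.
    """

    def __init__(self, in_ch: int, out_ch: int, scale: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(0.0, 0.02, size=(out_ch, in_ch))
        self.bias = np.zeros(out_ch)
        self.scale = scale

    def num_params(self) -> int:
        return self.weight.size + self.bias.size

    def __call__(self, z: np.ndarray) -> np.ndarray:
        # Nearest-neighbour upsampling along the two spatial axes.
        up = z.repeat(self.scale, axis=1).repeat(self.scale, axis=2)
        # 1x1 "convolution": mix channels independently at every position.
        return np.einsum("oc,chw->ohw", self.weight, up) + self.bias[:, None, None]


if __name__ == "__main__":
    # Assumed: a 512x512 image encoded with 8x spatial downsampling -> 64x64 latent.
    z = np.random.default_rng(1).normal(size=(4, 64, 64))
    proj = LatentProjection(in_ch=4, out_ch=4, scale=2)
    print(proj(z).shape)      # (4, 128, 128)
    print(proj.num_params())  # 20
```

A real projection model would likely be deeper than a single linear layer. Even so, the sketch shows why staying in latent space is cheap: the work scales with the small latent grid rather than with full-resolution pixels.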

Finally, the third stage focuses on ‘upscaling’ the edited latent to the desired high resolution, such as 4K. This stage integrates ‘Adaptive Context-Preserving Tiling (ACPT)’ and a ‘model/system co-design’ approach. ACPT is a clever tiling strategy that processes images in smaller segments without the need for large, computationally intensive overlaps between tiles. It uses ‘adjacent padding,’ which leverages information from neighboring tiles to ensure smooth transitions and prevent glitches or seams, a common problem with other tiling methods. The model/system co-design further optimizes performance by identifying optimal tile sizes for mobile NPUs, leading to substantial latency reductions.
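The simplified NumPy sketch below illustrates the adjacent-padding idea. It is not the paper's ACPT: the tiles here have a fixed size rather than an adaptively chosen one, and a 3x3 box blur (`box_blur3`) stands in for the per-tile upscaling model. The hypothetical `tiled_apply` works in four steps:

- It splits the input into non-overlapping tiles.
- It extends each tile by `pad` pixels borrowed from its neighbours, using edge replication at the image border.
- It processes the padded tile.
- It crops the result back to the tile's own region before stitching.

Because the box blur only looks one pixel away, `pad=1` reproduces the full-image result exactly, while `pad=0` leaves seams at tile boundaries.

```python
import numpy as np


def box_blur3(x: np.ndarray) -> np.ndarray:
    """Stand-in for a per-tile model: 3x3 mean filter with edge padding."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0


def tiled_apply(x: np.ndarray, fn, tile: int = 32, pad: int = 1) -> np.ndarray:
    """Apply fn tile by tile, giving each tile `pad` pixels of neighbouring context.

    Tiles do not overlap in the output; the borrowed border is used only as
    input context and is cropped away after fn runs.
    """
    h, w = x.shape
    padded = np.pad(x, pad, mode="edge")  # context for tiles on the image border
    out = np.zeros((h, w), dtype=float)
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            r1, c1 = min(r + tile, h), min(c + tile, w)
            # Tile [r:r1, c:c1] plus `pad` pixels per side (indices shifted by pad in `padded`).
            window = padded[r:r1 + 2 * pad, c:c1 + 2 * pad]
            result = fn(window)
            out[r:r1, c:c1] = result[pad:pad + (r1 - r), pad:pad + (c1 - c)]
    return out


if __name__ == "__main__":
    img = np.random.default_rng(0).random((70, 50))
    reference = box_blur3(img)
    print(np.allclose(tiled_apply(img, box_blur3, tile=16, pad=1), reference))  # True
    print(np.allclose(tiled_apply(img, box_blur3, tile=16, pad=0), reference))  # False
```

A real diffusion upscaler has a much larger receptive field, so a thin border cannot make tiled and full-image outputs identical. It only supplies enough neighbouring context to hide seams, which is far cheaper than wide overlaps that reprocess the same pixels several times. In this sketch, `tile` is the knob that a model/system co-design step like the one described above would tune for a particular NPU.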

The results of MobilePicasso are quite impressive. A user study involving 46 participants revealed that MobilePicasso not only improves image quality by 18-48% but also reduces hallucinations by 14-51% compared to existing methods. In terms of performance, it achieves up to a 55.8x speed-up over baselines using tiling with overlaps. Surprisingly, MobilePicasso running on a Samsung Galaxy S23 is even 4.71x faster than a server-based high-resolution image editing model running on a powerful A100 GPU, all while maintaining a remarkably low memory footprint of 1.15 GB, well within mobile device constraints.

This breakthrough paves the way for practical, real-world high-resolution image editing applications directly on mobile devices, offering users enhanced privacy and a seamless experience. For more in-depth technical details, refer to the full research paper.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
