
Mercari’s New Visual Search System Drives User Engagement and Sales

TLDR: Mercari has deployed a scalable visual search system in its C2C marketplace, built on zero-shot vision-language models. The multilingual SigLIP model significantly outperformed existing baselines in offline evaluations and led to substantial increases in user engagement, conversion rates, and transactions during online A/B testing. The system uses dimensionality reduction for efficiency, and the work highlights the practicality of zero-shot models for real-world visual search, although precise identity matching remains a challenge.

In the dynamic world of consumer-to-consumer (C2C) marketplaces, where everyday individuals list a vast array of second-hand or surplus items, finding exactly what you’re looking for can be a challenge. Unlike traditional retail platforms with structured product catalogs, C2C listings often lack consistent naming conventions, category assignments, and uniform visual quality. This makes traditional text-based search engines less effective, especially for items that are primarily identified by their visual characteristics, such as fashion, character goods, or collectibles.

Mercari, a prominent C2C marketplace in Japan with over 20 million monthly active users, recognized this challenge. They sought to enhance product discovery for both buyers and sellers by implementing a scalable visual search system. This system allows users to upload an image and find visually similar items, offering an intuitive alternative to text-based searches. It also helps sellers research market values by looking up similar items before listing their own.

The core of Mercari’s new system lies in its adoption of advanced vision-language models, particularly those capable of ‘zero-shot’ retrieval. Zero-shot models are pre-trained on massive datasets of image-text pairs and can generalize well to new domains without requiring extensive fine-tuning on specific marketplace data. This is a significant advantage in a rapidly evolving C2C environment, where traditional fine-tuned models can be costly to maintain and less robust to changes in product listings.

Mercari evaluated several models: its existing fine-tuned ‘baseline’ model, a Japanese CLIP-based model, DINOv2, and the multilingual SigLIP model. In offline evaluations built from user interaction logs, multilingual SigLIP performed best. It achieved a 13.3% gain in nDCG@5, a standard ranking-quality metric, over the baseline and scored higher on precision and recall across all reported metrics. SigLIP's computational cost was also comparable to that of the other models, which made it well suited to production deployment.
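For readers unfamiliar with nDCG@5: it sums the relevance of each of the top 5 results, discounted by log2 of its position, and divides by the same score for the best possible ordering, so 1.0 means a perfect top 5. Below is a minimal sketch of the standard formula, assuming binary relevance labels (for example, 1 if the user clicked or bought the item). The article does not say exactly how Mercari derives relevance from its interaction logs, and the function names are illustrative, not taken from the paper.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k results (index 0 = rank 1)."""
    # Rank r (1-based) gets weight 1 / log2(r + 1): 1.0 at rank 1, ~0.63 at rank 2, 0.5 at rank 3.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: Sequence[float], k: int,
              all_relevant: Optional[Sequence[float]] = None) -> float:
    """nDCG@k = DCG of the system ranking / DCG of the ideal ranking.

    `ranked` holds relevance labels of the returned items, in rank order.
    `all_relevant` holds labels of every known relevant item for the query;
    if omitted, the ideal ranking is built from `ranked` alone (a common
    simplification that ignores relevant items the system failed to return).
    """
    pool = ranked if all_relevant is None else all_relevant
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


def mean_ndcg_at_k(queries: Sequence[Sequence[float]], k: int) -> float:
    """Average nDCG@k over a set of queries."""
    return sum(ndcg_at_k(q, k) for q in queries) / len(queries)


# Relevant items at ranks 1 and 3 out of 5.
print(round(ndcg_at_k([1, 0, 1, 0, 0], k=5), 4))  # 0.9197
```

Averaging per-query nDCG@5 for each model and comparing the means is one way a relative figure like the reported 13.3% gain could be obtained; the paper's exact evaluation protocol may differ.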

Qualitative assessments supported these results. SigLIP consistently returned more semantically relevant and contextually accurate results, even for noisy, user-uploaded photos. For instance, it could identify specific characters in images, whereas the baseline model often failed to tell similar-looking objects apart. This robustness to image noise and ability to handle nuanced visual context pointed to a more intuitive and effective image search experience.

The visual search system handles both real-time, image-based retrieval and continuous background indexing of the catalog. When a user uploads an image, an image embedding generator encodes it with SigLIP and compresses the original 768-dimensional embedding to a compact 128 dimensions using Principal Component Analysis (PCA). The reduced embedding is then used to find the most similar items in the catalog. This dimensionality reduction cut query latency by approximately 40% and reduced memory usage and index size by 83% (consistent with storing 128 rather than 768 values per item), without compromising search quality.
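The reduce-then-search step can be sketched with NumPy. This is a minimal illustration under stated assumptions, not Mercari's pipeline: it assumes embeddings are L2-normalized, fits PCA with an SVD on catalog vectors, projects catalog and query vectors to 128 dimensions, re-normalizes them so that a dot product equals cosine similarity, and runs an exact brute-force top-k search. The article does not describe the nearest-neighbor index Mercari uses; a large catalog would typically call for an approximate-nearest-neighbor index rather than the exhaustive scan shown here. Random vectors stand in for real SigLIP embeddings, and all function names are hypothetical.

```python
import numpy as np

EMBED_DIM = 768    # SigLIP embedding size reported in the article
REDUCED_DIM = 128  # size after PCA, as reported in the article


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors (rows) to unit length so dot product == cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fit_pca(embeddings: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Return the data mean and the top principal directions (n_components x d)."""
    mean = embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Center, project onto the principal directions, and re-normalize."""
    return l2_normalize((x - mean) @ components.T).astype(np.float32)


def top_k(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Exact brute-force search: indices of the k highest cosine scores, best first."""
    scores = index @ query
    candidates = np.argpartition(-scores, k - 1)[:k]  # unordered top k in O(n)
    return candidates[np.argsort(-scores[candidates])]


# Hypothetical stand-in for SigLIP catalog embeddings (random, unit length).
rng = np.random.default_rng(0)
catalog = l2_normalize(rng.standard_normal((2000, EMBED_DIM)))

mean, components = fit_pca(catalog, REDUCED_DIM)
index = project(catalog, mean, components)  # shape (2000, 128), float32

# Querying with a catalog item's own embedding should return that item first.
query = project(catalog[42], mean, components)
print(top_k(query, index, k=5))

# Per-item storage at float32: 3072 bytes -> 512 bytes.
saving = 1 - (REDUCED_DIM * 4) / (EMBED_DIM * 4)
print(f"per-vector memory saving: {saving:.1%}")  # 83.3%
```

The final lines show where the 83% figure comes from: at any fixed numeric precision, keeping 128 of 768 values per item cuts storage by 1 - 128/768 ≈ 83%. Random Gaussian vectors have no dominant directions, so they only exercise the code; real embeddings are typically far more structured, which is what allows a low-dimensional projection to preserve most of their similarity relationships.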

To validate the system’s real-world impact, Mercari ran a one-week online A/B test. The results were compelling: the group using the multilingual SigLIP model showed substantial gains in engagement and conversion. For activity through image search, average transactions per user rose by 40.9%, buyer conversion rate by 34.1%, and item views per user by 46.6%. These figures point to a significant positive effect on user behavior and purchases.
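Figures like these are most naturally read as relative lifts of the treatment (SigLIP) group over the control group. Assuming that reading, which the article does not spell out, the sketch below shows the arithmetic and how a relative lift differs from a change in percentage points for a rate such as buyer conversion. The rates in the example are hypothetical, not Mercari's, and the function names are illustrative.

```python
def relative_lift(control: float, treatment: float) -> float:
    """Relative change of a metric: (treatment - control) / control."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treatment - control) / control


def point_change(control_rate: float, treatment_rate: float) -> float:
    """Absolute change of a rate, in percentage points."""
    return (treatment_rate - control_rate) * 100


# Hypothetical buyer conversion rates for the control and treatment groups.
control_cr, treatment_cr = 0.050, 0.060
print(f"relative lift: {relative_lift(control_cr, treatment_cr):.1%}")      # 20.0%
print(f"absolute change: {point_change(control_cr, treatment_cr):.1f} pp")  # 1.0 pp
```

A large relative lift can correspond to a small absolute change when the baseline rate is low, so it helps to know which form a reported figure takes.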

Currently, Mercari’s image search is utilized by approximately 1.5 million users monthly, contributing to increased purchases and new matching experiences across categories like fashion, talent, and character goods. It also aids sellers in market price research. However, the system still faces challenges, particularly in precise identity matching, where customers expect exact matches for specific people or animated characters. Future work aims to address this by exploring finer-grained retrieval, personalization strategies, and deeper analysis of long-term user behavior.


This work demonstrates that zero-shot vision-language models can serve as a strong and practical foundation for deploying effective visual search systems in large-scale C2C marketplaces with minimal overhead, while retaining flexibility for future enhancements. You can read the full research paper here: Zero-Shot Retrieval for Scalable Visual Search in a Two-Sided Marketplace.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
