spot_img
HomeResearch & DevelopmentEvaluating AI Models for Detailed Fashion Product Tagging

Evaluating AI Models for Detailed Fashion Product Tagging

TLDR: A study evaluated GPT-4o-mini and Gemini 2.0 Flash for automatically identifying fine-grained fashion attributes from images in a zero-shot setting. Gemini 2.0 Flash showed superior accuracy (56.79% F1 score) and was also more cost-effective and faster than GPT-4o-mini. While promising for e-commerce product attribution, the models performed better on prominent attributes and struggled with subtle details, indicating a need for further domain-specific refinement or human-in-the-loop systems.

The world of online fashion retail thrives on understanding its products. Imagine browsing a clothing website and being able to filter by very specific details like sleeve length, fabric type, or even the style of a neckline. This ability to accurately categorize and tag products with detailed attributes is called product attribution, and it’s crucial for a smooth customer experience and efficient inventory management.

Traditionally, product attribution has been a labor-intensive process, often relying on human annotators or seller-provided information. As fashion marketplaces grow to include millions of items, this manual approach becomes slow, prone to errors, and difficult to scale. This challenge has led industry experts to explore whether advanced artificial intelligence, specifically large language models (LLMs) with multimodal capabilities (meaning they can understand both text and images), could automate this complex task.

A recent research paper, titled “Can GPT-4o mini and Gemini 2.0 Flash Predict Fine-Grained Fashion Product Attributes? A Zero-Shot Analysis,” delves into this very question. Authored by Shubbham Shukla and Kunal Sonalkar, the study evaluates the performance of two state-of-the-art, cost-efficient LLMs: GPT-4o-mini and Gemini 2.0 Flash. The goal was to see how well these models could identify detailed fashion attributes directly from images, without any prior specific training on fashion data – a method known as “zero-shot” analysis.

The researchers used the DeepFashion-MultiModal dataset, which contains high-quality, human-annotated labels for 18 different fashion attribute categories. These categories are grouped into three main classes: Shape Attributes (like sleeve length, neckline, and accessories), Fabric Type Attributes (such as denim, cotton, or leather), and Color Pattern Attributes (like floral, striped, or pure color). The models were given only an image as input, making it a pure test of their visual understanding.

The methodology involved feeding an input image to a “Prompt generation module” which created a query for the LLM. This query, along with the image, was sent to a “Prediction Engine” that interfaced with OpenRouter to call either Gemini 2.0 Flash or GPT-4o-mini. The model’s raw output was then parsed into a standardized format, and an “Evaluation Engine” compared these predictions against the true labels to calculate performance metrics like precision, recall, and F1-score.

The study conducted two main experiments. In the first, a “high-creativity” setting (with higher temperature and top p values), Gemini 2.0 Flash significantly outperformed GPT-4o-mini, achieving a macro F1-score of 49.72% compared to GPT-4o-mini’s 37.31%.

The second experiment used a more “deterministic” setting (with lower temperature and top p values), which encourages more focused and predictable outputs. In this setup, both models improved their performance. Gemini 2.0 Flash again demonstrated superior results with a macro F1-score of 56.79%, while GPT-4o-mini reached 43.28%. This highlights that for structured classification tasks like product attribution, reducing the model’s creative freedom leads to more accurate predictions.

Beyond just accuracy, the researchers also analyzed the practical aspects of using these models: cost and speed. Gemini 2.0 Flash proved to be not only more accurate but also more efficient. It was approximately 12.5% cheaper and 24% faster than GPT-4o-mini for processing 1000 images, making it a more economically viable option for large-scale deployment in e-commerce.

The findings indicate that while both models can perform zero-shot attribute extraction, Gemini 2.0 Flash is the clear leader in terms of accuracy, speed, and cost-efficiency. However, the study also revealed that both models performed better on visually prominent attributes like “Hat” and “Sleeve Length” but struggled with more subtle details such as “Neckline” and “Waist Accessories.” This suggests that while these LLMs have strong general visual recognition, they may still lack the specialized knowledge needed for consistently identifying very nuanced fashion details.

In conclusion, this research suggests that lightweight LLMs like Gemini 2.0 Flash can be powerful tools for e-commerce platforms. They can help reduce manual labor, speed up the process of adding new products, and enrich product catalogs, ultimately improving the customer’s shopping experience. While they might not yet fully replace human annotators for every subtle detail, they offer a promising solution for integrating AI-driven attribution into existing workflows. For more details, you can read the full research paper here.

Also Read:

Future work in this area includes exploring more advanced prompt engineering techniques, benchmarking these LLMs against specialized, fine-tuned computer vision models, and expanding the scope to include more subjective fashion attributes and diverse datasets.

Meera Iyer
Meera Iyerhttps://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her out at: [email protected]

- Advertisement -

spot_img

Gen AI News and Updates

spot_img

- Advertisement -