TL;DR: QuRe is a new method for Composed Image Retrieval (CIR) that improves user satisfaction by retrieving not just the exact target image, but also other highly relevant images. It achieves this by using a reward model and a unique “hard negative sampling” technique that identifies challenging, yet less relevant, images for training. The researchers also introduced a new dataset, HP-FashionIQ, to measure how well CIR models align with human preferences, on which QuRe showed superior performance.
In the rapidly evolving world of artificial intelligence, Composed Image Retrieval (CIR) stands out as a crucial technology. Imagine you’re online shopping: you see a shirt you like, but you want it in a different color or with a slight modification, like shorter sleeves. CIR lets you combine a reference image with a text description to find exactly what you’re looking for. However, current CIR systems often fall short: even when they do retrieve the exact product you searched for, the surrounding results can be littered with irrelevant items. This can be frustrating and reduce your satisfaction.
The core problem lies in how these systems learn. Many use a technique called contrastive learning, where they treat the target image as the ‘correct’ answer and everything else in a batch as ‘incorrect.’ The issue is that ‘everything else’ might include images that are actually quite relevant but just not the *exact* target. These are called ‘false negatives,’ and mistakenly treating them as irrelevant can lead to a less satisfying search experience.
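To make the false-negative issue concrete, here is a minimal sketch of a standard in-batch contrastive (InfoNCE-style) loss. The function name, shapes, and temperature value are illustrative assumptions, not the paper’s actual implementation; the key point is that the diagonal pairs are the only “correct” answers, so every other image in the batch is penalized as a negative, relevant or not.

```python
import numpy as np

def in_batch_contrastive_loss(query_emb, image_emb, temperature=0.07):
    """InfoNCE-style loss: the i-th image is the positive for the i-th
    query; every other image in the batch is treated as a negative,
    even if it is actually relevant (a 'false negative')."""
    # Normalize so the dot product is cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = q @ v.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries correspond to the 'correct' query-target pairs
    return -np.mean(np.diag(log_probs))
```

Because the loss only rewards the diagonal, an image that matches the query almost as well as the target still gets pushed away in embedding space, which is exactly the behavior QuRe sets out to avoid.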
Introducing QuRe: A Smarter Approach to Image Retrieval
A new research paper, titled ‘QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval,’ by Jaehyun Kwak, Ramahdani Muhammad Izaaz Inhar, Se-Young Yun, and Sung-Ju Lee, proposes an innovative solution called QuRe. QuRe aims to improve user satisfaction by not only retrieving the precise target image but also by ensuring that other highly relevant images are ranked prominently. This means you’re more likely to see a collection of results that truly match your intent, even if they aren’t the single ‘perfect’ match.
QuRe tackles the false negative problem head-on. Instead of simply treating all non-target images as negatives, it uses a ‘reward model objective.’ This model is trained to understand and prioritize images that are highly relevant to your query. A key innovation is its ‘hard negative sampling’ strategy. Think of ‘hard negatives’ as images that are tricky for the system to distinguish from the correct answer – they’re similar enough to be confusing but are genuinely less relevant. QuRe identifies these challenging examples by looking for images whose relevance scores drop sharply after the target image in a sorted list. By focusing on these specific ‘hard negatives’ during training, QuRe learns to make finer distinctions, leading to more accurate and satisfying results.
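The “sharp drop” idea above can be sketched roughly as follows. This is a hypothetical illustration, not QuRe’s actual selection rule: `select_hard_negatives`, its parameters, and the split-at-the-largest-gap heuristic are all assumptions made for clarity.

```python
import numpy as np

def select_hard_negatives(scores, target_idx, num_negatives=4):
    """Illustrative sketch: rank candidates by relevance score, locate
    the target, then find the sharpest score drop after the target's
    rank. Candidates before the drop are treated as likely-relevant
    (potential false negatives); candidates after it are returned as
    hard negatives."""
    order = np.argsort(scores)[::-1]        # indices sorted high -> low
    sorted_scores = scores[order]
    target_rank = int(np.where(order == target_idx)[0][0])
    # Gaps between consecutive scores at and after the target's rank
    gaps = sorted_scores[target_rank:-1] - sorted_scores[target_rank + 1:]
    # First index after the sharpest drop
    cut = target_rank + int(np.argmax(gaps)) + 1
    return order[cut:cut + num_negatives].tolist()
```

For example, with scores `[0.9, 0.85, 0.84, 0.4, 0.38, 0.1]` and the first image as the target, the biggest drop sits between 0.84 and 0.4, so the two high-scoring runners-up are spared and the lower-scoring items become the hard negatives.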
Measuring Human Satisfaction with HP-FashionIQ
One of the significant contributions of this research is the creation of a new dataset called Human-Preference FashionIQ (HP-FashionIQ). Traditional evaluation methods for CIR models often just check if the target image was retrieved. However, this doesn’t fully capture whether a user is truly happy with the overall set of results. HP-FashionIQ addresses this by explicitly capturing human preferences. Researchers asked human annotators to compare two sets of retrieved images for a given query and choose which set they preferred. This allows for a more direct measure of how well a CIR model aligns with actual human satisfaction.
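One simple way to turn such pairwise annotations into a number, shown below as a hypothetical sketch (not necessarily the metric used in the paper), is the fraction of queries on which the model’s preference between the two result sets agrees with the annotator’s choice.

```python
def preference_alignment(model_scores_a, model_scores_b, human_choices):
    """Hypothetical alignment metric: for each query, the model assigns
    a relevance score to result set A and set B; the human annotator
    picked 'A' or 'B'. Returns the fraction of queries where the
    model's higher-scored set matches the human choice."""
    agree = 0
    for score_a, score_b, choice in zip(model_scores_a, model_scores_b,
                                        human_choices):
        model_pick = 'A' if score_a >= score_b else 'B'
        agree += (model_pick == choice)
    return agree / len(human_choices)
```

A model that ranks result sets the way humans do would score close to 1.0 under this kind of measure, while a model indifferent to human preference would hover around 0.5.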
State-of-the-Art Performance and Human Alignment
The experiments conducted by the researchers demonstrate that QuRe achieves state-of-the-art performance on widely used CIR datasets like FashionIQ and CIRR. More importantly, QuRe showed the strongest alignment with human preferences on the newly introduced HP-FashionIQ dataset. This indicates that QuRe is not just good at finding the exact image, but it’s also excellent at understanding and delivering what users truly consider relevant and satisfying.
This breakthrough has significant implications for applications like e-commerce and visual search platforms. By providing more relevant and user-centric search results, QuRe can enhance the overall user experience and potentially boost engagement and satisfaction in online shopping and image exploration.
For more technical details, you can explore the full research paper here: QuRe Research Paper.


