Enhancing Recommendation Systems with Multi-Modal Indexing and Lifelong User Behavior

TL;DR: MISS (Multi-modal Indexing and Searching with lifelong Sequence) is a new retrieval model for large-scale recommendation systems. It integrates multi-modal information (such as images and text) with long-term user behavior, using a multi-modal index tree to represent item similarity more accurately and two specialized search units (Co-GSU and MM-GSU) to capture diverse user interests from historical interactions. Experiments, including online A/B tests at Kuaishou, show significant improvements in recommendation effectiveness and user engagement.

Large-scale recommendation systems, such as those on platforms hosting vast amounts of content, typically operate in two main stages: retrieval and ranking. The retrieval stage must sift through a massive item collection under a tight time budget and return a much smaller, relevant candidate set for the ranking stage. Recent advances in retrieval aim to incorporate more comprehensive information about users and items to improve performance.
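To make the two-stage split concrete, here is a minimal sketch assuming toy dense embeddings and a stand-in ranker; the function `two_stage_recommend` and its parameters are hypothetical and not taken from MISS. The brute-force inner-product scan in stage 1 is only illustrative: production retrieval replaces it with an index (for example, a tree) so the full corpus is never scanned.

```python
import heapq
from typing import Callable, Dict, List


def two_stage_recommend(
    user_vec: List[float],
    item_vecs: Dict[str, List[float]],
    ranker: Callable[[str], float],
    retrieve_k: int,
    final_k: int,
) -> List[str]:
    """Hypothetical sketch: cheap retrieval over all items, then a costlier ranker on the shortlist."""

    def dot(a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    # Stage 1 (retrieval): score every item with a cheap inner product and keep the top retrieve_k.
    shortlist = heapq.nlargest(retrieve_k, item_vecs, key=lambda i: dot(user_vec, item_vecs[i]))
    # Stage 2 (ranking): apply the expensive ranker only to the shortlist.
    return sorted(shortlist, key=ranker, reverse=True)[:final_k]
```

The ranker is called only on the `retrieve_k` shortlisted items, which is why retrieval quality caps what ranking can achieve: an item dropped in stage 1 can never be recommended.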

One significant challenge in current retrieval methods is effectively utilizing a user’s lifelong sequential behavior – their long history of interactions. While this data is valuable, it’s difficult to process efficiently in the retrieval stage due to the sheer volume of candidate items. Additionally, many existing retrieval methods primarily rely on interaction data, often overlooking the rich insights available from multi-modal information, such as images and text associated with items.

To address these challenges, researchers have introduced MISS: Multi-modal Indexing and Searching with lifelong Sequence. The model integrates multi-modal information and lifelong user behavior into a tree-based retrieval model and consists of two key components: a multi-modal index tree and a multi-modal lifelong sequence modeling module.

The multi-modal index tree is designed to create a more precise representation of item similarity. Unlike traditional methods that might rely solely on interaction data, this tree is built using multi-modal embeddings. These embeddings combine both content (like images and text) and interaction information, allowing the tree to group similar items more effectively. This hierarchical structure helps in efficiently narrowing down the search for relevant items.
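The sketch below illustrates the general idea of an index tree built over fused embeddings and searched with a beam. It is not the MISS construction: the fusion rule (concatenating L2-normalized content and interaction vectors), the split rule (a median split on the highest-variance dimension, swapped in for the clustering such trees are usually built with), and the node scoring (dot product with a node centroid instead of a learned model) are all assumptions, and every name is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

Vec = List[float]


def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def fuse(content: Vec, interaction: Vec) -> Vec:
    # Assumed fusion: concatenate the L2-normalized content and interaction embeddings.
    return normalize(content) + normalize(interaction)


@dataclass
class Node:
    centroid: Vec
    items: List[str] = field(default_factory=list)  # populated only at leaves
    children: List["Node"] = field(default_factory=list)


def centroid(vecs: List[Vec]) -> Vec:
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def build_tree(ids: List[str], emb: Dict[str, Vec], leaf_size: int = 2) -> Node:
    vecs = [emb[i] for i in ids]
    node = Node(centroid=centroid(vecs))
    if len(ids) <= leaf_size:
        node.items = list(ids)
        return node

    # Stand-in for clustering: split at the median of the highest-variance dimension.
    def variance(d: int) -> float:
        mean = sum(v[d] for v in vecs) / len(vecs)
        return sum((v[d] - mean) ** 2 for v in vecs)

    dim = max(range(len(vecs[0])), key=variance)
    ordered = sorted(ids, key=lambda i: emb[i][dim])
    mid = len(ordered) // 2
    node.children = [build_tree(ordered[:mid], emb, leaf_size),
                     build_tree(ordered[mid:], emb, leaf_size)]
    return node


def beam_search(root: Node, query: Vec, beam: int) -> List[str]:
    # Descend level by level, keeping the `beam` nodes whose centroids best match the query.
    def score(n: Node) -> float:
        return sum(q * c for q, c in zip(query, n.centroid))

    frontier = [root]
    while any(n.children for n in frontier):
        expanded = [c for n in frontier for c in (n.children or [n])]
        frontier = sorted(expanded, key=score, reverse=True)[:beam]
    return [i for n in frontier for i in n.items]
```

Because the search touches only `beam` nodes per level, its cost grows with the tree depth rather than with the number of items, which is what makes a tree index attractive in the time-limited retrieval stage.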

For capturing diverse user interests from their extensive historical interactions, MISS introduces a multi-modal lifelong sequence modeling module. This module features two specialized units: the collaborative general search unit (Co-GSU) and the multi-modal general search unit (MM-GSU). The Co-GSU retrieves relevant behaviors based on collaborative information, while the MM-GSU focuses on multi-modal information. These units work together to identify the most relevant parts of a user’s long behavior sequence, even if those interests are not recent. An exact search unit (ESU) then refines the relationship between candidate items and the retrieved behaviors.
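A minimal sketch of the search-then-attend pattern, assuming each past behavior has both a collaborative (ID-based) embedding and a multi-modal embedding. The top-k inner-product scoring in each GSU, the union of their results, and the choice of collaborative embeddings inside the ESU are assumptions for illustration; all function names are hypothetical rather than the paper's.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def gsu_top_k(query: Vec, behavior_embs: Dict[int, Vec], k: int) -> List[int]:
    """General search unit: positions of the k behaviors most similar to the candidate."""
    return sorted(behavior_embs, key=lambda i: dot(query, behavior_embs[i]), reverse=True)[:k]


def esu(query: Vec, values: List[Vec]) -> List[float]:
    """Exact search unit: softmax attention of the candidate over the retrieved behaviors."""
    scores = [dot(query, v) for v in values]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(len(query))]


def lifelong_interest(cand_co: Vec, cand_mm: Vec,
                      seq_co: Dict[int, Vec], seq_mm: Dict[int, Vec],
                      k: int) -> Tuple[List[int], List[float]]:
    co_hits = gsu_top_k(cand_co, seq_co, k)  # Co-GSU: search by collaborative embeddings
    mm_hits = gsu_top_k(cand_mm, seq_mm, k)  # MM-GSU: search by multi-modal embeddings
    merged = sorted(set(co_hits) | set(mm_hits))
    # Assumed: the ESU attends over the union using collaborative embeddings.
    return merged, esu(cand_co, [seq_co[i] for i in merged])
```

Both GSUs score every behavior in the sequence regardless of its position, so an old but relevant interaction can be retrieved as easily as a recent one; only the short retrieved subset reaches the more expensive ESU.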

The model also incorporates a Multi-gate Mixture-of-Experts (MMoE) module for multi-task learning, allowing it to optimize for various user feedback signals simultaneously, such as likes, comments, and video completion rates.
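The general MMoE structure (shared experts, one softmax gate per task, one tower per task) can be sketched in plain Python as below. The layer sizes, single-layer ReLU experts, random initialization, and task names such as "finish" (standing in for video completion) are illustrative assumptions, not MISS's configuration.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]
Vec = List[float]


def matvec(w: Matrix, x: Vec) -> Vec:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vec) -> Vec:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MMoE:
    """Shared experts; each task has its own softmax gate over experts and its own tower."""

    def __init__(self, in_dim: int, expert_dim: int, n_experts: int, tasks: List[str], seed: int = 0):
        rng = random.Random(seed)
        self.tasks = tasks
        self.experts = [rand_matrix(expert_dim, in_dim, rng) for _ in range(n_experts)]
        self.gates = {t: rand_matrix(n_experts, in_dim, rng) for t in tasks}
        self.towers = {t: rand_matrix(1, expert_dim, rng)[0] for t in tasks}

    def forward(self, x: Vec) -> Dict[str, float]:
        # Each expert is a single ReLU layer shared by all tasks.
        expert_out = [[max(0.0, h) for h in matvec(w, x)] for w in self.experts]
        preds = {}
        for t in self.tasks:
            g = softmax(matvec(self.gates[t], x))  # task-specific mixture weights
            mixed = [sum(g[e] * expert_out[e][d] for e in range(len(g)))
                     for d in range(len(expert_out[0]))]
            preds[t] = sigmoid(sum(w * h for w, h in zip(self.towers[t], mixed)))
        return preds


def multitask_loss(preds: Dict[str, float], labels: Dict[str, int]) -> float:
    # Sum of per-task binary cross-entropies; equal task weights are an assumption.
    eps = 1e-7
    return -sum(labels[t] * math.log(preds[t] + eps) + (1 - labels[t]) * math.log(1 - preds[t] + eps)
                for t in labels)
```

The per-task gates let a task such as comments lean on different experts than a task such as completion, while the experts themselves are still trained on every task's signal.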

MISS has been successfully deployed in Kuaishou’s recommendation system, serving hundreds of millions of daily active users. Offline experiments demonstrated that MISS significantly outperforms state-of-the-art baseline models in retrieval metrics like Recall@K, showing improvements of over 30% in various recall scenarios. An ablation study confirmed the individual effectiveness of each proposed module: the multi-modal index tree, MM-GSU, and Co-GSU.
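The exact Recall@K protocol used in the paper is not described here. A common definition, assumed below, is the fraction of a user's ground-truth items (for example, items the user later engaged with) that appear among the top K retrieved items, averaged over users; the helper names are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant items that appear in the first k retrieved items."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(results: Dict[str, List[str]], truth: Dict[str, Set[str]], k: int) -> float:
    # Average over users that have at least one ground-truth item.
    users = [u for u in truth if truth[u]]
    if not users:
        return 0.0
    return sum(recall_at_k(results.get(u, []), truth[u], k) for u in users) / len(users)
```

Some papers divide by min(K, number of relevant items) instead of the number of relevant items, so Recall@K figures are only comparable when the definition matches.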

Further analysis revealed interesting insights into how the model utilizes user behavior. While increasing the length of the user behavior sequence generally improves performance, there’s a trade-off with computational resources. The attention mechanism within MM-GSU was found to be particularly effective at identifying long-term user interests, unlike some traditional models that tend to focus only on recent interactions. The Co-GSU and MM-GSU also complement each other, with a low overlap rate in their search results, indicating they capture different facets of user interest.
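The overlap rate between the two search units can be read as the share of one unit's retrieved behaviors that the other unit also retrieves. The definition below (intersection size divided by the smaller result set, which equals k when both return k behaviors) is an assumption, as is the function name.

```python
from typing import Sequence


def overlap_rate(co_gsu_hits: Sequence[int], mm_gsu_hits: Sequence[int]) -> float:
    """Fraction of retrieved behaviors shared by Co-GSU and MM-GSU (assumed definition)."""
    a, b = set(co_gsu_hits), set(mm_gsu_hits)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

A value near 0 means the two units surface largely different behaviors, which is the sense in which a low overlap rate suggests they capture complementary facets of user interest.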

Online A/B tests conducted with real users at Kuaishou showed promising results. The proposed MISS model led to a noticeable increase in key user engagement metrics, including Total App Usage Time and App Usage Time Per User, confirming its effectiveness in a real-world industrial setting.

In conclusion, MISS represents a significant step forward in retrieval recommendation by effectively leveraging multi-modal information and lifelong sequential user behavior. This approach provides a more comprehensive understanding of user interests, leading to more accurate and engaging recommendations. For more details, you can refer to the research paper.

Nikhil Patel (https://blogs.edgentiq.com)
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
