TLDR: KuaiLive is the first real-time, interactive dataset for live streaming recommendation, collected from China’s Kuaishou platform. It features precise live room timestamps, multiple user interaction types (click, comment, like, gift), and rich user/streamer side information. The dataset, comprising 23,772 users and 452,621 streamers over 21 days, addresses the lack of public data reflecting live streaming’s dynamic nature. Analysis reveals insights into streamer cold-start issues, long-tail popularity, and temporal user behaviors, establishing a benchmark for future research in areas like top-K recommendation, CTR prediction, watch time prediction, and fairness-aware recommendation.
Live streaming has become a massive part of online content consumption, offering dynamic content and real-time interactions. However, developing effective recommendation systems for these platforms has been challenging due to a lack of publicly available datasets that truly capture the unique, fast-paced nature of live streaming environments.
To address this gap, researchers have introduced KuaiLive, a real-time, interactive dataset collected from Kuaishou, a leading live streaming platform in China with over 400 million daily active users. KuaiLive records the interaction logs of 23,772 users and 452,621 streamers over a 21-day period, giving a detailed public view of live streaming behavior.
What Makes KuaiLive Unique?
KuaiLive stands out from existing datasets in several key ways. It includes precise start and end timestamps for live rooms, which is crucial for simulating the dynamic availability of content in real-time. It also captures multiple types of real-time user interactions, such as clicks, comments, likes, and virtual gifts, offering a more comprehensive view of user engagement. Furthermore, the dataset provides rich side information for both users and streamers, including demographics and behavioral summaries, which allows for more realistic modeling of user and streamer behaviors.
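To make the role of start and end timestamps concrete, here is a minimal sketch of filtering candidates to rooms that are on air at request time. The `LiveRoom` class, its field names, and the second-based timestamps are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRoom:
    # Hypothetical record layout; the real KuaiLive column names may differ.
    room_id: str
    streamer_id: str
    start_ts: int  # room start, in seconds
    end_ts: int    # room end, in seconds


def available_rooms(rooms, now_ts):
    """Return rooms that are on air at now_ts (start inclusive, end exclusive)."""
    return [r for r in rooms if r.start_ts <= now_ts < r.end_ts]


rooms = [
    LiveRoom("r1", "s1", 0, 3600),
    LiveRoom("r2", "s2", 1800, 5400),
    LiveRoom("r3", "s1", 7200, 9000),
]

# At t=2000 only r1 and r2 are live; r3 has not started yet.
live_now = [r.room_id for r in available_rooms(rooms, 2000)]
print(live_now)
```

A recommender evaluated this way can only rank rooms from `available_rooms`, which is what separates live streaming from catalogs where every item is always available.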
The dataset also includes negative feedback, such as live rooms that were exposed to users but skipped, which is vital for tasks like click-through rate (CTR) prediction. All data has undergone strict anonymization procedures to protect user privacy, with IDs hashed and timestamps offset, while still preserving meaningful patterns for research.
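As a rough illustration of why skipped exposures matter, the sketch below turns an exposure log and a click log into labeled CTR samples. The `(user, room)` pair representation and the function name `build_ctr_samples` are assumptions made for this example, not part of the released data format.

```python
def build_ctr_samples(exposures, clicks):
    """Label each exposed (user, room) pair: 1 if clicked, 0 if skipped."""
    clicked = set(clicks)
    return [(user, room, 1 if (user, room) in clicked else 0)
            for user, room in exposures]


# Hypothetical toy logs: every click must first have been an exposure.
exposures = [("u1", "r1"), ("u1", "r2"), ("u2", "r1"), ("u2", "r3")]
clicks = [("u1", "r2")]

samples = build_ctr_samples(exposures, clicks)
empirical_ctr = sum(label for _, _, label in samples) / len(samples)
print(samples, empirical_ctr)
```

Without the exposure log, negatives would have to be sampled at random, which tends to produce easier negatives than rooms the user actually saw and skipped.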
Insights from the Data
A thorough analysis of KuaiLive reveals fascinating characteristics of the live streaming ecosystem. For instance, the data shows a significant imbalance in streamer engagement, with about 70% of streamers receiving fewer than 5 user interactions. This highlights a widespread ‘cold-start’ problem for new or less popular streamers, where gaining visibility is a major challenge. There’s also a clear ‘long-tail’ effect, where a small number of top streamers dominate user attention, accounting for over 1.5% of all interactions. This imbalance poses challenges for ensuring fair exposure for all content creators.
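Statistics of this kind can be computed from a flat interaction log. The following is a minimal sketch, assuming the log is simply a list of streamer IDs (one entry per interaction) plus a list of all streamers; the function name and the toy data are hypothetical and do not reproduce the paper's figures.

```python
from collections import Counter


def sparsity_stats(interactions, all_streamers, threshold=5, top_k=1):
    """Return (fraction of streamers below threshold, interaction share of top_k streamers)."""
    counts = Counter(interactions)
    # Include streamers with zero interactions, which a Counter alone would miss.
    per_streamer = [counts.get(s, 0) for s in all_streamers]
    cold_fraction = sum(c < threshold for c in per_streamer) / len(per_streamer)
    total = sum(per_streamer)
    top = sorted(per_streamer, reverse=True)[:top_k]
    top_share = sum(top) / total if total else 0.0
    return cold_fraction, top_share


all_streamers = ["a", "b", "c", "d"]
interactions = ["a"] * 8 + ["b"] + ["c"]  # streamer "d" never receives an interaction

cold_fraction, top_share = sparsity_stats(interactions, all_streamers)
print(cold_fraction, top_share)  # 0.75 0.8
```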
User activity also shows distinct temporal patterns, with interactions peaking during evening hours (6 PM to 12 AM), aligning with streamers’ most active broadcasting times. There’s also a smaller peak around noon, possibly indicating users watching during lunch breaks. These patterns suggest that recommendation systems need to adapt their strategies based on the time of day.
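An hourly profile like this can be sketched by bucketing interaction timestamps by hour of day. The example assumes Unix timestamps in seconds and China Standard Time (UTC+8); since the released timestamps are offset for anonymization, treating them this way is an assumption, and `interactions_by_hour` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # assumed local time zone


def interactions_by_hour(timestamps, tz=CST):
    """Return a 24-element list with the number of interactions per hour of day."""
    hours = Counter(datetime.fromtimestamp(ts, tz).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


# Toy data: two evening interactions and one at noon.
timestamps = [
    datetime(2025, 1, 1, 20, 15, tzinfo=CST).timestamp(),
    datetime(2025, 1, 2, 20, 40, tzinfo=CST).timestamp(),
    datetime(2025, 1, 2, 12, 5, tzinfo=CST).timestamp(),
]

histogram = interactions_by_hour(timestamps)
peak_hour = max(range(24), key=lambda h: histogram[h])
print(peak_hour)  # 20
```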
Interestingly, the dataset reveals that ‘repeat consumption’ is a common behavior in live streaming, with 27.1% of users interacting with the same streamers multiple times. This is different from other content platforms where users typically consume an item only once. This indicates the importance of modeling long-term user preferences and habits.
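One way to measure repeat consumption is to count users who interacted with at least one streamer more than once. This is a sketch under that assumed definition (the paper's exact definition may differ), with a hypothetical log of `(user, streamer)` pairs.

```python
from collections import Counter


def repeat_user_fraction(interactions):
    """Fraction of users with two or more interactions with the same streamer."""
    pair_counts = Counter(interactions)
    users = {user for user, _ in pair_counts}
    repeat_users = {user for (user, _), n in pair_counts.items() if n >= 2}
    return len(repeat_users) / len(users) if users else 0.0


interactions = [
    ("u1", "s1"), ("u1", "s1"),  # u1 returns to s1
    ("u2", "s1"), ("u2", "s2"),  # u2 visits two different streamers once each
    ("u3", "s3"),
    ("u4", "s2"), ("u4", "s2"),  # u4 returns to s2
]

fraction = repeat_user_fraction(interactions)
print(fraction)  # 0.5
```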
Benchmarking and Future Directions
The researchers evaluated several representative recommendation methods on KuaiLive for tasks like top-K recommendation and CTR prediction. The results showed that modeling temporal signals is important, as time-aware methods generally performed better. It was also observed that recommending live rooms is more challenging than recommending streamers, largely because live rooms are temporary and often lack historical interaction data.
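For readers unfamiliar with top-K evaluation, the sketch below implements Recall@K and NDCG@K with binary relevance, two metrics commonly reported for this task. Whether the paper uses exactly these definitions is not stated here, so treat this as a generic illustration rather than the authors' evaluation code.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of relevant items that appear in the top-k of the ranked list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


ranked = ["r3", "r1", "r7", "r2"]  # hypothetical model output, best first
relevant = {"r1", "r2"}            # rooms the user actually interacted with

recall = recall_at_k(ranked, relevant, 3)
ndcg = ndcg_at_k(ranked, relevant, 3)
print(recall, round(ndcg, 4))  # 0.5 0.3869
```

In a live streaming setting, `ranked` would be drawn only from rooms that are on air at request time, which is part of why room-level recommendation is harder than streamer-level recommendation.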
KuaiLive is expected to support a wide range of future research. This includes improving top-K recommendation, predicting various interaction rates (like gift-through rate), forecasting user watch time, and even predicting gift prices. Its rich multi-type user behavior data also makes it ideal for multi-behavior modeling, controllable learning, addressing cold-start problems, and developing fairness-aware recommendation systems to ensure equitable exposure for streamers. Furthermore, the dynamic nature of live rooms makes it well-suited for exploring end-to-end generative recommendation models.
The KuaiLive dataset and related resources are publicly available, providing a valuable resource to advance the development of intelligent live streaming services. For more details, you can refer to the original research paper.


