TL;DR: This research paper investigates the scaling behavior of two generative recommendation (GR) paradigms: Semantic ID (SID)-based GR and LLM-as-RS. It finds that SID-based GR models saturate quickly because SIDs have limited capacity to encode rich semantic information. In contrast, LLM-as-RS models exhibit superior scaling properties, with performance consistently improving as model size increases, and demonstrate a stronger ability to capture collaborative filtering signals. The study positions LLM-as-RS as a promising path toward foundation models for GR, despite current efficiency trade-offs.
Generative Recommendation (GR) is an exciting new approach in the world of recommender systems, which are the engines that suggest products, videos, or friends to us online. Unlike traditional systems that often involve a two-step process of retrieving and then ranking items, GR aims to directly generate recommendations in a single step, leveraging the power of advanced generative models.
This research paper, titled “Understanding Generative Recommendation with Semantic IDs from a Model-Scaling View” by Jingzhe Liu, Liam Collins, Jiliang Tang, Tong Zhao, Neil Shah, and Clark Mingxuan Ju, dives deep into how these generative recommendation models behave as they grow in size and complexity. The authors investigate two primary GR approaches: SID-based GR and LLM-as-RS.
SID-based Generative Recommendation: Hitting a Ceiling
One popular method, Semantic ID (SID)-based GR, works by first encoding each item’s content (such as its text or images) with a pretrained model, then quantizing those embeddings into compact, discrete codes called Semantic IDs. These SIDs are then used by a sequential recommender model to predict the next item a user might interact with. The idea is to combine the rich semantic understanding of powerful language or vision models with patterns of user behavior.
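To make the pipeline concrete, here is a minimal Python sketch of how an item embedding can be quantized into a Semantic ID. It assumes a residual quantization scheme in the spirit of RQ-VAE (a common choice for SID tokenizers); the embeddings and codebooks below are random placeholders rather than trained components.

```python
import numpy as np

# Minimal sketch of Semantic ID creation via residual quantization.
# `item_embeddings` stands in for the output of a pretrained text/vision
# encoder; both embeddings and codebooks are random for illustration only.
rng = np.random.default_rng(0)
num_items, dim = 1000, 64
item_embeddings = rng.normal(size=(num_items, dim))  # placeholder encoder output

num_levels, codebook_size = 3, 256  # 3 codes per item, 256 entries per level
codebooks = rng.normal(size=(num_levels, codebook_size, dim))  # untrained here

def to_semantic_id(embedding: np.ndarray) -> tuple[int, ...]:
    """Greedily quantize an embedding into a tuple of discrete codes.

    At each level, pick the nearest codebook entry, then quantize the
    residual at the next level -- the coarse-to-fine structure behind SIDs.
    """
    residual = embedding.copy()
    codes = []
    for level in range(num_levels):
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual -= codebooks[level][idx]  # keep only what the code missed
    return tuple(codes)

sid = to_semantic_id(item_embeddings[0])
print(sid)  # e.g. (17, 203, 95): the item's discrete Semantic ID
```

In a real system the encoder and codebooks are trained, and the resulting code tuples become the vocabulary of the downstream sequential recommender.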
However, the researchers found a significant limitation with this approach. While generative models in other fields often show predictable improvements as they scale up (known as scaling laws), SID-based GR models quickly hit a performance ceiling. Even when individual components like the underlying language model encoder, the quantization tokenizer (which creates the SIDs), or the recommender system itself were made larger, the overall recommendation performance didn’t improve much and often saturated rapidly.
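As a rough intuition for what saturation means here, scaling laws are commonly summarized by a power-law loss curve with an irreducible error floor. The sketch below uses that standard functional form with made-up constants; it is not a fit to the paper’s results, only an illustration of how a high floor caps the gains from scale.

```python
import numpy as np

# Illustrative scaling curve: L(N) = (N_c / N)**alpha + E, where N is model
# size and E is an irreducible error floor. Constants are invented, not
# fitted values from the paper.
N = np.logspace(6, 10, 5)              # model sizes: 1M .. 10B parameters
alpha, N_c = 0.08, 1e6                 # hypothetical exponent and scale constant

healthy = (N_c / N) ** alpha           # keeps improving as N grows
saturated = (N_c / N) ** alpha + 0.5   # a high floor E soon dominates

for n, h, s in zip(N, healthy, saturated):
    print(f"N={n:.0e}  healthy-loss={h:.3f}  saturated-loss={s:.3f}")
```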
The core issue identified was the limited capacity of SIDs to fully capture and transfer the rich semantic information from the powerful initial models to the recommender. Essentially, distilling complex information into these discrete SIDs acts as a bottleneck, preventing the system from fully benefiting from larger, more capable components.
LLM-as-RS: A Promising Path for Scaling
Motivated by the limitations of SID-based GR, the paper revisits another paradigm: using Large Language Models (LLMs) directly as recommenders (LLM-as-RS). In this setup, the LLM takes in plain text descriptions of a user’s interaction history and directly generates the title or description of the next recommended item.
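A minimal sketch of this setup is shown below, assuming a Hugging Face text-generation pipeline. The prompt template and item titles are invented for illustration, and "gpt2" merely stands in (so the sketch runs locally) for the far larger LLMs whose scaling the paper studies.

```python
# LLM-as-RS sketch: render the user's interaction history as plain text and
# ask the LLM to generate the next item's title directly.
from transformers import pipeline

history = [
    "Wireless Noise-Cancelling Headphones",
    "USB-C Charging Cable (2m)",
    "Portable Bluetooth Speaker",
]

prompt = (
    "A user purchased the following items, in order:\n"
    + "\n".join(f"{i + 1}. {title}" for i, title in enumerate(history))
    + "\nThe next item this user is most likely to buy is:"
)

generator = pipeline("text-generation", model="gpt2")
output = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(output[0]["generated_text"])
```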
The findings for LLM-as-RS were strikingly different. These models demonstrated superior scaling properties, meaning their performance consistently improved as the LLM’s size increased, showing no signs of saturation within the tested range. In fact, scaled-up LLM-as-RS models achieved up to a 20% improvement over the best performance of SID-based GR using the same training data.
Crucially, this research also challenges a common belief that LLMs struggle to capture collaborative filtering information – the patterns of user-item interactions. The study showed that LLMs’ ability to model these user behaviors actually improves as the LLMs themselves scale up, making them more effective at understanding and predicting user preferences.
The Road Ahead
In conclusion, this research highlights a fundamental architectural limitation in SID-based GR models regarding their ability to scale and effectively utilize semantic information. LLM-as-RS, on the other hand, emerges as a more promising direction for building powerful, scalable generative recommender systems, capable of learning both item semantics and collaborative filtering signals effectively. While LLM-as-RS currently faces efficiency challenges, especially in inference time compared to SID-based GR, its superior scaling behavior positions it as a strong candidate for future recommendation foundation models.
For a deeper dive into the methodology and detailed results, you can read the full research paper here.