TL;DR: This research paper investigates the scaling behavior of two generative recommendation (GR) paradigms: Semantic ID (SID)-based GR and LLM-as-RS. It finds that SID-based GR models saturate quickly because SIDs have limited capacity to encode rich semantic information. In contrast, LLM-as-RS models exhibit superior scaling properties, with performance consistently improving as model size increases, and demonstrate a stronger ability to capture collaborative filtering signals. The study positions LLM-as-RS as a promising path toward foundation models for GR, despite current efficiency trade-offs.
Generative Recommendation (GR) is an exciting new approach in the world of recommender systems, which are the engines that suggest products, videos, or friends to us online. Unlike traditional systems that often involve a two-step process of retrieving and then ranking items, GR aims to directly generate recommendations in a single step, leveraging the power of advanced generative models.
This research paper, titled “Understanding Generative Recommendation with Semantic IDs from a Model-Scaling View” by Jingzhe Liu, Liam Collins, Jiliang Tang, Tong Zhao, Neil Shah, and Clark Mingxuan Ju, dives deep into how these generative recommendation models behave as they grow in size and complexity. The authors investigate two primary GR approaches: SID-based GR and LLM-as-RS.
SID-based Generative Recommendation: Hitting a Ceiling
One popular method, Semantic ID (SID)-based GR, works by first encoding each item’s content (such as its text or images) with a pretrained model, then quantizing those embeddings into compact, discrete codes called Semantic IDs. These SIDs are then used by a sequential recommender model to predict the next item a user might interact with. The idea is to combine the rich semantic understanding of powerful language or vision models with patterns of user behavior.
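To make the pipeline concrete, here is a minimal Python sketch of how an item embedding can be quantized into a Semantic ID. It assumes a residual quantization scheme in the spirit of RQ-VAE (a common choice for SID tokenizers); the embeddings and codebooks below are random placeholders rather than trained components.

```python
import numpy as np

# Minimal sketch of Semantic ID creation via residual quantization.
# `item_embeddings` stands in for the output of a pretrained text/vision
# encoder; both embeddings and codebooks are random for illustration only.
rng = np.random.default_rng(0)
num_items, dim = 1000, 64
item_embeddings = rng.normal(size=(num_items, dim))  # placeholder encoder output

num_levels, codebook_size = 3, 256  # 3 codes per item, 256 entries per level
codebooks = rng.normal(size=(num_levels, codebook_size, dim))  # untrained here

def to_semantic_id(embedding: np.ndarray) -> tuple[int, ...]:
    """Greedily quantize an embedding into a tuple of discrete codes.

    At each level, pick the nearest codebook entry, then quantize the
    residual at the next level -- the coarse-to-fine structure behind SIDs.
    """
    residual = embedding.copy()
    codes = []
    for level in range(num_levels):
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual -= codebooks[level][idx]  # keep only what the code missed
    return tuple(codes)

sid = to_semantic_id(item_embeddings[0])
print(sid)  # e.g. (17, 203, 95): the item's discrete Semantic ID
```

In a real system the encoder and codebooks are trained, and the resulting code tuples become the vocabulary of the downstream sequential recommender.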
However, the researchers found a significant limitation with this approach. While generative models in other fields often show predictable improvements as they scale up (known as scaling laws), SID-based GR models quickly hit a performance ceiling. Even when individual components like the underlying language model encoder, the quantization tokenizer (which creates the SIDs), or the recommender system itself were made larger, the overall recommendation performance didn’t improve much and often saturated rapidly.
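As a rough intuition for what saturation means here, scaling laws are commonly summarized by a power-law loss curve with an irreducible error floor. The sketch below uses that standard functional form with made-up constants; it is not a fit to the paper’s results, only an illustration of how a high floor caps the gains from scale.

```python
import numpy as np

# Illustrative scaling curve: L(N) = (N_c / N)**alpha + E, where N is model
# size and E is an irreducible error floor. Constants are invented, not
# fitted values from the paper.
N = np.logspace(6, 10, 5)              # model sizes: 1M .. 10B parameters
alpha, N_c = 0.08, 1e6                 # hypothetical exponent and scale constant

healthy = (N_c / N) ** alpha           # keeps improving as N grows
saturated = (N_c / N) ** alpha + 0.5   # a high floor E soon dominates

for n, h, s in zip(N, healthy, saturated):
    print(f"N={n:.0e}  healthy-loss={h:.3f}  saturated-loss={s:.3f}")
```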
The core issue identified was the limited capacity of SIDs to fully capture and transfer the rich semantic information from the powerful initial models to the recommender. Essentially, distilling complex information into these discrete SIDs acts as a bottleneck, preventing the system from fully benefiting from larger, more capable components.
LLM-as-RS: A Promising Path for Scaling
Motivated by the limitations of SID-based GR, the paper revisits another paradigm: using Large Language Models (LLMs) directly as recommenders (LLM-as-RS). In this setup, the LLM takes in plain text descriptions of a user’s interaction history and directly generates the title or description of the next recommended item.
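A minimal sketch of this setup is shown below, assuming a Hugging Face text-generation pipeline. The prompt template and item titles are invented for illustration, and "gpt2" merely stands in (so the sketch runs locally) for the far larger LLMs whose scaling the paper studies.

```python
# LLM-as-RS sketch: render the user's interaction history as plain text and
# ask the LLM to generate the next item's title directly.
from transformers import pipeline

history = [
    "Wireless Noise-Cancelling Headphones",
    "USB-C Charging Cable (2m)",
    "Portable Bluetooth Speaker",
]

prompt = (
    "A user purchased the following items, in order:\n"
    + "\n".join(f"{i + 1}. {title}" for i, title in enumerate(history))
    + "\nThe next item this user is most likely to buy is:"
)

generator = pipeline("text-generation", model="gpt2")
output = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(output[0]["generated_text"])
```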
The findings for LLM-as-RS were strikingly different. These models demonstrated superior scaling properties, meaning their performance consistently improved as the LLM’s size increased, showing no signs of saturation within the tested range. In fact, scaled-up LLM-as-RS models achieved up to a 20% improvement over the best performance of SID-based GR using the same training data.
Crucially, this research also challenges a common belief that LLMs struggle to capture collaborative filtering information – the patterns of user-item interactions. The study showed that LLMs’ ability to model these user behaviors actually improves as the LLMs themselves scale up, making them more effective at understanding and predicting user preferences.
The Road Ahead
In conclusion, this research highlights a fundamental architectural limitation in SID-based GR models regarding their ability to scale and effectively utilize semantic information. LLM-as-RS, on the other hand, emerges as a more promising direction for building powerful, scalable generative recommender systems, capable of learning both item semantics and collaborative filtering signals effectively. While LLM-as-RS currently faces efficiency challenges, especially in inference time compared to SID-based GR, its superior scaling behavior positions it as a strong candidate for future recommendation foundation models.
For a deeper dive into the methodology and detailed results, you can read the full research paper here.