TLDR: A new training paradigm called Request-Only Optimization (ROO) improves both the efficiency and the quality of large-scale deep learning recommendation models. By treating the user request, rather than the individual impression, as the unit of training data, ROO eliminates redundant data storage and computation, substantially increasing training throughput and making more complex architectures like Generative Recommenders and HSTU practical to adopt. The approach has been deployed in major recommendation products, with measurable gains in both system efficiency and model performance.
Deep Learning Recommendation Models (DLRMs) are the backbone of today’s massive recommendation systems, serving billions of users daily. These systems process petabytes of data and perform trillions of operations per example, leading to significant challenges in data storage, training efficiency, and model complexity. A new approach, Request-Only Optimization (ROO), has been introduced to tackle these issues head-on, aiming to improve both efficiency and model quality simultaneously.
The core innovation of ROO lies in how training data is handled. Traditionally, recommendation systems treat each ‘impression’ (an item shown to a user) as a separate unit of training data. However, a single user request often generates multiple impressions, leading to extensive duplication of user-specific features across these impression-level samples. This redundancy wastes storage, network bandwidth, and computational resources.
ROO proposes a paradigm shift by treating a ‘user request’ as the fundamental unit of training data. This means that all impressions generated from a single user request are grouped into one training sample. This simple yet powerful change enables native feature deduplication at the data logging stage, significantly saving data storage. Furthermore, by eliminating redundant computations and communications across multiple impressions within a request, ROO allows for the development of more sophisticated neural network architectures, such as Generative Recommenders (GRs) and Hierarchical Sequential Transduction Units (HSTU), which can better capture complex user interest signals.
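To make the difference concrete, here is a minimal sketch of the two logging layouts. The feature names and values are hypothetical, chosen only to illustrate the deduplication; the paper's actual schema is richer.

```python
# Illustrative only: hypothetical feature names, not the paper's schema.
user_features = {"user_id": 42, "click_history": [101, 205, 309]}
impressions = [
    {"item_id": 7,  "label": 1},
    {"item_id": 13, "label": 0},
    {"item_id": 21, "label": 0},
]

# Impression-level logging: one row per impression, with the same user
# features copied into every row. A request with N impressions stores
# the user features N times.
impression_level = [{**user_features, **imp} for imp in impressions]

# Request-level logging (ROO): all impressions from the request share a
# single copy of the user features.
request_level = {"user": user_features, "impressions": impressions}

assert len(impression_level) == 3  # user features duplicated 3x above
```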
The ROO paradigm is a holistic co-design effort, integrating changes across data formats, infrastructure, and model architectures. The new request-level training data format distinctly separates ‘request-only’ (RO) data, which contains user features, from ‘non-request-only’ (NRO) data, which contains item features. This clean separation ensures that user sequence tensors are processed only once per request, drastically reducing redundant computation of user-side features.
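A minimal sketch of that RO/NRO split as a data schema, assuming hypothetical fields; the paper's concrete format will differ:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RequestOnly:
    """User-side (RO) features, logged once per request."""
    user_id: int
    click_history: List[int]

@dataclass
class NonRequestOnly:
    """Item-side (NRO) features and label, one entry per impression."""
    item_id: int
    label: float

@dataclass
class RequestSample:
    """One training sample = one request: a single RO block plus a
    variable-length list of NRO blocks."""
    ro: RequestOnly
    nro: List[NonRequestOnly]
```

Because the RO block appears exactly once per sample, any user-side computation downstream can also run exactly once per sample.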
This optimization has profound implications for system efficiency. For instance, user-side feature embedding lookups and associated communication overhead are reduced to just once per request-level sample. This leads to substantial increases in training throughput, with improvements ranging from 48% to 570% for retrieval and early-stage ranking models, and 32% to 100% for late-stage ranking models, even without modifying existing model architectures. The reduction in GPU training costs has made previously computationally prohibitive modeling technologies, like GRs, feasible for production use.
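The following PyTorch sketch shows the mechanics of that saving. Table sizes and shapes are illustrative assumptions, not the production configuration: the point is that the user-side lookup happens once and is broadcast across the request's impressions.

```python
import torch
import torch.nn as nn

user_table = nn.Embedding(10_000, 64)  # user-side embedding table
item_table = nn.Embedding(50_000, 64)  # item-side embedding table

user_id = torch.tensor([42])            # one request
item_ids = torch.tensor([7, 13, 21])    # three impressions in the request

user_emb = user_table(user_id)          # looked up once: shape (1, 64)
item_embs = item_table(item_ids)        # per impression: shape (3, 64)

# Broadcasting the single user embedding replaces three redundant
# user-side lookups (and the associated communication) with one.
logits = (user_emb * item_embs).sum(dim=-1)  # shape (3,), one per impression
```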
Beyond efficiency, ROO also enables significant model quality improvements. By amortizing the cost of processing user features across all impressions in a request, ROO makes room for more complex user-side architectures like UserArch, which compresses the dimensionality of user-side features before integrating them into the model. For sequential modeling, ROO dramatically reduces the cost of self-attention based architectures, enabling models like HSTU to capture nuanced user interests and interactions more effectively.
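A hedged sketch of why this matters for sequence models: plain multi-head self-attention stands in for architectures like HSTU here (this is not an HSTU implementation). The quadratic-cost pass over the user's history runs once per request, and its pooled output is reused for every impression.

```python
import torch
import torch.nn as nn

seq_len, dim = 256, 64  # illustrative history length and width
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

history = torch.randn(1, seq_len, dim)     # one request's user sequence
ctx, _ = attn(history, history, history)   # O(seq_len^2) cost, paid once
user_repr = ctx.mean(dim=1).squeeze(0)     # (dim,) pooled user interest

item_embs = torch.randn(3, dim)            # three candidate impressions
scores = item_embs @ user_repr             # reused across all impressions
```

Under impression-level training, the attention pass above would run once per impression; under ROO it runs once per request, which is what makes such architectures affordable at production scale.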
The effectiveness of ROO has been validated through its deployment to three major recommendation products, each serving billions of active users. The ROO data format has increased training sample volumes by 43% to 150% using the same storage capacity. Offline and online A/B tests have consistently shown enhanced metrics, including improvements in normalized entropy, recall, consumption, engagement, and topline metrics across various ranking and retrieval stages. This work offers practical and scalable solutions for engineers building large-scale, efficient, and effective recommendation systems. You can read the full paper here.


