TLDR: Pinterest has developed a novel multi-faceted pretraining scheme for large embedding tables to significantly improve its ads ranking models. By combining User-Pin Contrastive Learning and Heterogeneous Knowledge Graph Embedding, the company achieved substantial gains in Click-Through Rate (CTR) and Conversion Rate (CVR). A new CPU-GPU hybrid serving infrastructure was also implemented to overcome memory limitations and ensure scalability, resulting in improved online performance with no increase in serving latency.
In digital advertising, especially on platforms like Pinterest, accurately predicting which ads users will click (Click-Through Rate, or CTR) and what actions they will take after clicking (Conversion Rate, or CVR) is crucial. Modern recommendation systems rely heavily on ‘large embedding tables’ to capture the complex interactions between entities such as users, pins, and items. These tables map sparse, high-cardinality identifiers to dense, lower-dimensional vectors, making it easier for models to learn and make predictions.
Pinterest, a leading platform for inspiration and shopping, faced unique challenges when integrating these large embedding tables into its ads ranking models. Beyond common issues like data sparsity and scalability, initial attempts to train these tables from scratch yielded no significant performance improvements. This indicated a need for a more sophisticated approach to truly leverage the power of these large tables.
To overcome this hurdle, Pinterest researchers introduced a novel ‘multi-faceted pretraining scheme’. This innovative approach involves two key pretraining methods designed to enrich the embedding tables with valuable supplementary information before fine-tuning them within the main ads ranking models.
The first method is ‘User-Pin Contrastive Learning’. This technique focuses on independently capturing interactions between users and pins, free from interference from other features. It uses a vast amount of historical engagement and conversion data to pretrain user and pin embedding tables, employing a contrastive loss function to learn meaningful representations.
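As a rough illustration of what a contrastive objective over user and pin embeddings can look like, here is a minimal sketch. The article does not name the exact loss, so the InfoNCE form with in-batch negatives, the temperature value, and all function names below are assumptions made for illustration, not Pinterest's implementation.
```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale to unit length so the dot product becomes cosine similarity.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def in_batch_contrastive_loss(user_vecs, pin_vecs, temperature=0.1):
    """Mean InfoNCE loss (assumed form, not confirmed by the article).

    Row i of user_vecs and row i of pin_vecs form a positive pair;
    every other pin in the batch serves as a negative for user i.
    """
    users = [normalize(u) for u in user_vecs]
    pins = [normalize(p) for p in pin_vecs]
    total = 0.0
    for i, user in enumerate(users):
        logits = [dot(user, pin) / temperature for pin in pins]
        # Numerically stable log-sum-exp over all pins in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(users)

# Toy batch: each user vector matches its own pin (low loss) ...
aligned = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.0], [0.0, 1.0]])
# ... versus each user vector matching the other user's pin (high loss).
shuffled = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```
Minimizing a loss of this kind pulls each user embedding toward the pins that user engaged with and pushes it away from the other pins in the batch, which is how the tables can pick up user-pin structure without any other ranking features involved.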
The second pretraining method involves ‘Large-scale Heterogeneous Knowledge Graph Embedding’. This approach constructs a massive graph incorporating both onsite engagement and offsite conversion data. Node entity embeddings are trained by predicting the existence of connections between different entities. This method is distinct from other graph-based embedding techniques used at Pinterest, potentially capturing different, complementary information that enhances the overall understanding of user and item relationships.
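Training node embeddings by predicting whether an edge exists can be sketched as a small link-prediction loop. The article does not state the scoring function, so the DistMult-style score, the tail-corruption negative sampling, and the node and relation names (user_1, clicked, converted_on, and so on) are all hypothetical choices for illustration.
```python
import math
import random

def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def score(emb, rel, head, relation, tail):
    """DistMult-style score (an assumed choice): sum_k head_k * rel_k * tail_k."""
    return sum(h * r * t for h, r, t in zip(emb[head], rel[relation], emb[tail]))

def sgd_step(emb, rel, triple, label, lr=0.1):
    """One logistic-loss update; label 1.0 = real edge, 0.0 = sampled non-edge."""
    head, relation, tail = triple
    err = sigmoid(score(emb, rel, head, relation, tail)) - label  # dLoss/dScore
    h, r, t = emb[head][:], rel[relation][:], emb[tail][:]
    for k in range(len(h)):
        emb[head][k] -= lr * err * r[k] * t[k]
        rel[relation][k] -= lr * err * h[k] * t[k]
        emb[tail][k] -= lr * err * h[k] * r[k]

rng = random.Random(0)
dim = 8
nodes = ["user_1", "user_2", "pin_a", "pin_b", "item_x"]  # hypothetical entities
emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
rel = {r: [1.0] * dim for r in ("clicked", "converted_on")}
# Onsite engagement and offsite conversion edges in one heterogeneous graph.
edges = [("user_1", "clicked", "pin_a"),
         ("user_2", "clicked", "pin_b"),
         ("user_1", "converted_on", "item_x")]
edge_set = set(edges)

for _ in range(300):
    for head, relation, tail in edges:
        sgd_step(emb, rel, (head, relation, tail), 1.0)
        # Negative sample: swap in a random tail that is not a known edge.
        fake_tail = rng.choice(nodes)
        if fake_tail != head and (head, relation, fake_tail) not in edge_set:
            sgd_step(emb, rel, (head, relation, fake_tail), 0.0)

p_true = sigmoid(score(emb, rel, "user_1", "clicked", "pin_a"))
p_false = sigmoid(score(emb, rel, "user_1", "clicked", "pin_b"))
print(p_true > p_false)
```
After training, the node vectors in emb are the kind of artifact that could be used to initialize an embedding table before fine-tuning; a production system would of course operate on a far larger graph with distributed training.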
The results of this multi-faceted pretraining were significant. Offline experiments showed a substantial performance lift in both CTR and CVR prediction models, with the pretrained tables delivering a relative performance gain more than four times that of tables trained from scratch. Further analysis showed that the two pretraining strategies provided independent and additive improvements, indicating that their benefits are orthogonal.
Scaling these embedding tables, which can contain hundreds of millions of rows, also presented considerable technical challenges for both training and serving. For training, Pinterest utilized distributed model-parallel training to shard the large embedding tables across multiple GPUs. For serving, where memory constraints are even tighter, a scalable CPU-GPU hybrid serving infrastructure was designed. This innovative framework hosts the large embedding tables on external CPU clusters while the main ranking model operates on GPUs. This design allows the embedding tables to scale independently of GPU capacity and helps overcome GPU memory limitations.
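The split between sharded embedding hosts and a separate ranking model can be sketched as follows. This is a toy, in-process stand-in: the modulo sharding rule, the zero-vector fallback for unknown ids, and every class and function name are assumptions for illustration, and the real system would use remote CPU hosts and a GPU model rather than Python objects.
```python
from collections import defaultdict

class EmbeddingShard:
    """Stands in for one CPU host holding a slice of the embedding table."""
    def __init__(self):
        self.rows = {}

    def lookup(self, ids):
        # Return only the rows this shard actually holds.
        return {i: self.rows[i] for i in ids if i in self.rows}

class ShardedEmbeddingTable:
    """Routes row ids to shards; modulo routing is an assumed, simple rule."""
    def __init__(self, num_shards, dim):
        self.dim = dim
        self.shards = [EmbeddingShard() for _ in range(num_shards)]

    def shard_index(self, row_id):
        return row_id % len(self.shards)

    def put(self, row_id, vector):
        self.shards[self.shard_index(row_id)].rows[row_id] = vector

    def batch_lookup(self, ids):
        # Group ids by shard so each shard is queried once per batch.
        by_shard = defaultdict(list)
        for i in ids:
            by_shard[self.shard_index(i)].append(i)
        found = {}
        for shard_id, shard_ids in by_shard.items():
            found.update(self.shards[shard_id].lookup(shard_ids))
        # Unknown ids fall back to a zero vector (an assumption).
        return [found.get(i, [0.0] * self.dim) for i in ids]

def rank_score(user_vec, pin_vec):
    """Placeholder for the ranking model that would run on the GPU."""
    return sum(u * p for u, p in zip(user_vec, pin_vec))

table = ShardedEmbeddingTable(num_shards=4, dim=2)
for i in range(10):
    table.put(i, [float(i), 2.0 * i])

vectors = table.batch_lookup([7, 2, 7, 99])
score = rank_score(vectors[1], vectors[0])
print(vectors, score)
```
Because the ranking function only sees the fetched vectors, the table can grow by adding shards without touching the model side, which is the property the hybrid design relies on.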
To further enhance efficiency, Pinterest applied post-training INT4 quantization, compressing the embedding tables to approximately 40% of their original size. Remarkably, this compression not only maintained but slightly improved model performance, possibly due to the quantization acting as a form of regularization that mitigates overfitting on sparse data.
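To make the idea of post-training INT4 quantization concrete, here is a minimal sketch for a single embedding row. The article does not describe the exact scheme, so the row-wise min/max (asymmetric) quantization, the two-codes-per-byte packing, and the function names are assumptions for illustration only.
```python
def quantize_row_int4(row):
    """Quantize one embedding row to 4-bit codes (assumed row-wise min/max scheme).

    Returns the packed bytes plus the per-row scale and offset needed to decode.
    """
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 15 or 1.0  # 16 levels: codes 0..15; avoid zero scale
    codes = [min(15, max(0, round((x - lo) / scale))) for x in row]
    if len(codes) % 2:
        codes.append(0)  # pad so codes pair up evenly
    # Pack two 4-bit codes into each byte: low nibble first, then high nibble.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, scale, lo

def dequantize_row_int4(packed, scale, lo, length):
    """Unpack the nibbles and map codes back to approximate float values."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return [lo + c * scale for c in codes[:length]]

row = [-1.0, -0.5, 0.0, 0.25, 1.0]
packed, scale, lo = quantize_row_int4(row)
restored = dequantize_row_int4(packed, scale, lo, len(row))
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(packed), max_err <= scale / 2 + 1e-9)
```
In a scheme like this, each row also carries its own scale and offset, so the achieved compression ratio depends on the embedding dimension and the original data type rather than on the bit width alone.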
Online experiments confirmed the success of this approach. The large embedding table CTR model demonstrated significant improvements across core online metrics, including a 1.34% reduction in online Cost Per Click (CPC) and a 2.60% increase in Click-Through Rate (CTR). Crucially, these gains were achieved with neutral end-to-end serving latency and only a negligible rise in serving cost, thanks to the efficient hybrid serving infrastructure.
Further studies revealed important insights: fine-tuning the pretrained embedding tables within downstream tasks is essential for optimal performance, as freezing them led to a performance decrease. Additionally, the freshness of the pretraining data is vital; older pretraining data (staleness) significantly reduced the performance gains, highlighting the importance of timely updates.
In conclusion, this work provides a robust and scalable foundation for high-performing ad recommendation systems capable of handling the ever-increasing data volume and complexity of a platform like Pinterest. While significant accuracy improvements have been achieved, future work will explore further optimizations, such as shared embedding tables, more efficient sharding and caching strategies, and adaptive embedding structures, to continue minimizing latency and maintaining strict version synchronization. You can read the full research paper here: Multi-Faceted Large Embedding Tables for Pinterest Ads Ranking.


