TLDR: Tesserae is a novel GPU cluster scheduler for deep learning workloads that addresses the limitations of existing placement policies. It formulates job placement and migration as graph matching problems, enabling scalable and efficient solutions. Tesserae introduces new algorithms for minimizing job migrations and maximizing job packing throughput, including optimizing parallelism strategies for large language models. Experimental results show significant improvements in job completion time, makespan, and fairness, while demonstrating strong adaptability and scalability for large-scale deep learning clusters.
Deep learning (DL) models are at the heart of modern data centers, and ensuring their efficient training is a top priority. A critical aspect of this efficiency lies in how jobs are placed on powerful GPU clusters. Traditionally, schedulers have relied on either simple, ad-hoc rules or complex optimization problems to decide where to run these demanding workloads. However, both approaches have significant drawbacks: ad-hoc rules often lead to suboptimal performance, while complex optimizations struggle to scale as clusters grow larger and the number of jobs increases.
Enter Tesserae, a novel approach designed to overcome these limitations. Researchers at the University of Wisconsin-Madison, Song Bian, Saurabh Agarwal, Md. Tareq Mahmood, and Shivaram Venkataraman, developed Tesserae based on a key insight: many deep learning job placement challenges can be elegantly framed as graph matching problems. This mathematical formulation allows for efficient solutions using well-established algorithms, leading to a more scalable and effective GPU cluster scheduler.
Minimizing Job Migrations for Smoother Operations
One of the hidden costs in GPU cluster management is job migration. When a job moves from one set of GPUs to another between scheduling rounds, it incurs overhead that can slow down overall progress. Tesserae introduces an innovative migration algorithm that significantly reduces these disruptions. By modeling the current and future placement plans as a graph, Tesserae can identify the optimal way to reassign jobs to GPUs, minimizing unnecessary movements. This intelligent approach helps maintain high throughput and reduces the time jobs spend waiting or relocating.
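To make the idea concrete, here is a minimal sketch of migration minimization as an assignment problem. It is an illustration under stated assumptions, not Tesserae's actual algorithm: it assumes each job runs on a single node, that the next round's plan only fixes which jobs share a logical node, and that logical nodes can be mapped freely onto physical ones. The function name `min_migration_assignment` and the sample data are hypothetical, and the exhaustive search stands in for a polynomial-time matching algorithm such as the Hungarian method, which a real scheduler would need at scale.

```python
from itertools import permutations

def min_migration_assignment(prev, plan):
    """Map logical nodes of the next plan onto physical nodes.

    prev[p] is the set of job ids currently on physical node p.
    plan[l] is the set of job ids the next round puts on logical node l.
    Returns (mapping, migrations), where mapping[l] is the physical node
    chosen for logical node l. Assumes single-node jobs (hypothetical).
    """
    n = len(prev)
    assert len(plan) == n, "sketch assumes equally many logical and physical nodes"

    # overlap[l][p]: jobs that stay in place if logical node l lands on physical node p
    overlap = [[len(plan[l] & prev[p]) for p in range(n)] for l in range(n)]

    # Exhaustive search over one-to-one assignments (fine for tiny examples only)
    best_perm, best_stay = None, -1
    for perm in permutations(range(n)):
        stay = sum(overlap[l][perm[l]] for l in range(n))
        if stay > best_stay:
            best_perm, best_stay = perm, stay

    # Jobs that were running before and continue in the next round
    previously_running = set().union(*prev)
    continuing = sum(len(jobs & previously_running) for jobs in plan)
    return list(best_perm), continuing - best_stay

# Hypothetical example: job "e" is new, all other jobs were already running
prev = [{"a", "b"}, {"c"}, {"d"}]
plan = [{"c", "e"}, {"d"}, {"a", "b"}]
mapping, migrations = min_migration_assignment(prev, plan)
print(mapping, migrations)
```

In this example, mapping logical node 0 to physical node 0 (and so on) would move all four continuing jobs, whereas the matched assignment keeps every one of them in place.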
Efficient Packing for Maximized GPU Utilization
Another core component of Tesserae is its efficient job packing policy. Packing involves running multiple deep learning jobs concurrently on the same GPUs to maximize resource utilization. Tesserae transforms this into a maximum weighted bipartite graph matching problem. In this graph, jobs already running are matched with jobs waiting to be placed, and the ‘weight’ of a potential match represents the combined throughput (performance) of those jobs when packed together. By solving this problem, Tesserae ensures that jobs are packed in a way that maximizes the total cluster throughput.
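The matching formulation described above can be sketched in a few lines. This is a toy illustration rather than Tesserae's implementation: the function name `best_packing`, the job names, and the throughput numbers are all hypothetical, a missing weight is assumed to mean the pair cannot be packed, and unpacked pending jobs are simply given zero weight. The brute-force search is used only to keep the example dependency-free; a production scheduler would use a polynomial-time maximum weighted bipartite matching algorithm.

```python
from itertools import permutations

def best_packing(running, pending, weight):
    """Choose which pending job to pack with which running job.

    weight[(r, p)] is the combined throughput when pending job p is packed
    with running job r; a missing key marks the pair as infeasible.
    Returns (pairs, total), where pairs maps each packed pending job to a
    running job and total is the summed weight of the chosen pairs.
    """
    # Pad with None so that any pending job may be left unpacked
    slots = list(running) + [None] * len(pending)
    best_pairs, best_total = {}, 0.0

    for choice in permutations(slots, len(pending)):
        total, feasible = 0.0, True
        for p, r in zip(pending, choice):
            if r is None:
                continue  # p stays unpacked in this candidate
            w = weight.get((r, p))
            if w is None:
                feasible = False  # this pair cannot share a GPU
                break
            total += w
        if feasible and total > best_total:
            best_pairs = {p: r for p, r in zip(pending, choice) if r is not None}
            best_total = total
    return best_pairs, best_total

# Hypothetical combined-throughput profile
weight = {
    ("r1", "p1"): 1.5, ("r1", "p2"): 1.4,
    ("r2", "p1"): 1.3, ("r2", "p2"): 0.8,
}
pairs, total = best_packing(["r1", "r2"], ["p1", "p2"], weight)
print(pairs, round(total, 2))
```

The example also shows why a global matching matters: greedily taking the heaviest edge first (r1 with p1 at 1.5) forces the weak pairing of r2 with p2 for a total of 2.3, while the optimal matching pairs r1 with p2 and r2 with p1 for a total of 2.7.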
A notable feature of Tesserae’s packing policy is its ability to consider different parallelism strategies for large language models. These models can be trained using various techniques (like data parallelism or pipeline parallelism), and the choice can significantly impact performance, especially when packed with other jobs. Tesserae intelligently selects the best parallelism strategy to further boost combined throughput and prevent issues like out-of-memory errors.
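One way to picture strategy selection is as a filter-then-maximize step that produces the weight of a single edge in the packing graph. The sketch below is an assumption-laden illustration, not the paper's method: the function `pick_strategy`, the profile format, and all numbers are hypothetical, and memory feasibility is reduced to a simple sum of peak usages against GPU capacity.

```python
def pick_strategy(profiles, co_job_mem_gb, gpu_mem_gb):
    """Pick the parallelism strategy for an LLM job packed with another job.

    profiles maps a strategy name to (combined_throughput, llm_peak_mem_gb),
    both assumed to come from offline profiling (hypothetical format).
    Strategies whose memory would not fit next to the co-located job are
    discarded to avoid out-of-memory errors. Returns (strategy, throughput),
    or (None, 0.0) when no strategy fits.
    """
    feasible = {
        name: tput
        for name, (tput, llm_mem_gb) in profiles.items()
        if llm_mem_gb + co_job_mem_gb <= gpu_mem_gb
    }
    if not feasible:
        return None, 0.0
    best = max(feasible, key=feasible.get)
    return best, feasible[best]

# Hypothetical profile for one LLM job on an 80 GB GPU
profiles = {
    "data_parallel": (1.6, 70.0),
    "pipeline_parallel": (1.4, 40.0),
}
print(pick_strategy(profiles, co_job_mem_gb=20.0, gpu_mem_gb=80.0))
print(pick_strategy(profiles, co_job_mem_gb=5.0, gpu_mem_gb=80.0))
```

With a 20 GB co-located job, data parallelism would exceed the GPU's memory, so the sketch falls back to pipeline parallelism; with a 5 GB co-located job, the higher-throughput data-parallel option fits and is chosen.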
Real-World Impact and Scalability
The effectiveness of Tesserae has been demonstrated through extensive experiments on both physical GPU clusters and large-scale simulations. Compared with existing schedulers such as Tiresias and Gavel, Tesserae improves average job completion time (JCT) by up to 1.62x and makespan (the total time to complete all jobs) by up to 1.15x. It also improves fairness metrics, ensuring a more equitable distribution of resources among jobs.
Crucially, Tesserae is designed for adaptability and scalability. It can seamlessly adjust to different hardware configurations, such as varying GPU types, without requiring manual tuning. Its modular design allows it to be integrated with various existing scheduling policies, making it a versatile solution for diverse cluster environments. Furthermore, Tesserae proves highly scalable, capable of making placement decisions for clusters with thousands of GPUs and thousands of active jobs within seconds, a significant improvement over prior optimization-based methods that struggle with increasing scale.
This research marks a significant step forward in deep learning cluster scheduling, offering a principled, efficient, and scalable framework for managing complex workloads. For more in-depth information, see the full research paper.


