TLDR: DALI-PD is a new framework that uses diffusion models to quickly generate diverse, realistic synthetic layout heatmaps for machine learning in physical chip design. It addresses the scarcity of high-quality training data by producing more than 20,000 samples in hours, a job that would otherwise take weeks. The generator is pre-trained on satellite imagery, which improves the realism and stability of its output. The resulting synthetic data improves ML model accuracy on tasks such as IR drop and congestion prediction, especially when real-world data is limited.
Machine learning (ML) has shown great promise in various aspects of physical design (PD) tasks, which are crucial steps in creating integrated circuits. However, a significant hurdle for ML models in this field is the lack of high-quality, large-scale training datasets. Generating these datasets is often very expensive computationally and is restricted by intellectual property concerns. Existing public datasets are typically static, slow to update, and limited in their diversity.
Addressing these challenges, researchers Bing-Yue Wu and Vidya A. Chhabria from Arizona State University have introduced DALI-PD. This innovative framework is designed to generate synthetic layout heatmaps, aiming to accelerate ML research in physical design. DALI-PD utilizes a diffusion model, a type of generative AI, to create a wide variety of layout heatmaps quickly, often in just seconds. These heatmaps represent critical aspects of chip design, including power distribution, IR (voltage) drop, congestion, macro placement, and cell density.
Using DALI-PD, the researchers successfully created a massive dataset containing over 20,000 different layout configurations. These configurations vary in macro counts and placements, and the generated heatmaps closely resemble real-world layouts. Crucially, training ML models with these synthetic heatmaps has been shown to improve accuracy in downstream tasks like predicting IR drop or congestion.
The Need for Synthetic Data in Physical Design
Unlike fields such as computer vision or natural language processing, where vast labeled datasets are readily available, PD datasets are often proprietary, lack diversity, and are costly to produce. This scarcity makes it difficult to develop ML models that can generalize well across different chip designs. While some benchmarks and ML-specific datasets exist, they are often static and require substantial manual and computational effort for updates. Industry datasets, to protect intellectual property, are frequently obfuscated, which can distort data and reduce their usefulness.
How DALI-PD Works
DALI-PD operates through a two-stage generative process. First, it uses a Variational Autoencoder (VAE) to learn how to represent layouts in a lower-dimensional space, which helps reduce the computational load for the next stage. Second, a UNet-based diffusion model generates high-fidelity heatmaps. The framework is guided by specific circuit layout conditions, such as clock period, layout dimensions (height and width), utilization, and the bounding boxes of macros. This allows DALI-PD to generate diverse layouts with varying characteristics.
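As a rough illustration of the conditioning inputs listed above, the sketch below bundles them into a small Python structure and rasterizes the macro bounding boxes onto a grid. The class name `LayoutCondition`, the field names, the units, and the rasterization scheme are all assumptions made for illustration; the source does not describe how DALI-PD actually encodes its conditions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical bounding-box format: (x_min, y_min, x_max, y_max) in microns.
BBox = Tuple[float, float, float, float]


@dataclass
class LayoutCondition:
    """Illustrative container for the layout conditions named in the text."""

    clock_period_ns: float
    height_um: float
    width_um: float
    utilization: float  # assumed to be a fraction in (0, 1]
    macro_bboxes: List[BBox] = field(default_factory=list)

    def validate(self) -> None:
        """Reject obviously inconsistent conditions before generation."""
        if self.clock_period_ns <= 0:
            raise ValueError("clock period must be positive")
        if self.height_um <= 0 or self.width_um <= 0:
            raise ValueError("layout dimensions must be positive")
        if not 0 < self.utilization <= 1:
            raise ValueError("utilization must lie in (0, 1]")
        for x0, y0, x1, y1 in self.macro_bboxes:
            inside_x = 0 <= x0 < x1 <= self.width_um
            inside_y = 0 <= y0 < y1 <= self.height_um
            if not (inside_x and inside_y):
                raise ValueError("macro bounding box lies outside the layout")

    def macro_mask(self, rows: int, cols: int) -> List[List[int]]:
        """Rasterize macros onto a rows x cols grid (1 = cell center in a macro)."""
        mask = [[0] * cols for _ in range(rows)]
        for x0, y0, x1, y1 in self.macro_bboxes:
            for r in range(rows):
                for c in range(cols):
                    # Center of grid cell (r, c) in layout coordinates.
                    cx = (c + 0.5) * self.width_um / cols
                    cy = (r + 0.5) * self.height_um / rows
                    if x0 <= cx < x1 and y0 <= cy < y1:
                        mask[r][c] = 1
        return mask
```

A condition with a single macro in one corner could then be built as `LayoutCondition(2.0, 100.0, 100.0, 0.7, [(0.0, 0.0, 50.0, 50.0)])`, checked with `validate()`, and turned into a coarse mask with `macro_mask(4, 4)`. Varying these fields is one plausible way to request layouts with different macro counts and placements.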
A key innovation in DALI-PD is its use of transfer learning. The diffusion model is pre-trained on satellite images, leveraging the visual similarities between chip layouts and urban satellite imagery. This pre-training significantly enhances the realism and stability of the generated heatmaps, allowing the model to converge faster during training compared to models trained from scratch.
During the generation process, DALI-PD starts with random noise and iteratively refines it using the diffusion model, guided by the circuit parameters. The VAE then decodes this refined representation into the final six categories of layout heatmaps. The framework also includes a post-processing and sanity checker module to ensure the quality of the generated layouts, discarding any that don’t meet design constraints.
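The control flow described above (start from noise, refine iteratively, decode, then discard invalid results) can be sketched as a generate-and-filter loop. This is only a toy under stated assumptions: `generate`, `toy_denoise`, `toy_decode`, and `toy_is_valid` are hypothetical names, and the stand-in functions replace the real conditioned UNet, VAE decoder, and sanity checker, none of which are specified in the source.

```python
import random
from typing import Callable, List, Optional

Latent = List[float]
Heatmap = List[float]


def generate(
    denoise_step: Callable[[Latent, int], Latent],
    decode: Callable[[Latent], Heatmap],
    is_valid: Callable[[Heatmap], bool],
    latent_size: int,
    num_steps: int,
    max_attempts: int,
    seed: int = 0,
) -> Optional[Heatmap]:
    """Return the first decoded sample that passes the sanity check, else None."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        # 1. Start from pure Gaussian noise in the latent space.
        latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
        # 2. Refine the latent step by step, from the noisiest step down to 0.
        for step in reversed(range(num_steps)):
            latent = denoise_step(latent, step)
        # 3. Decode the refined latent into a heatmap.
        heatmap = decode(latent)
        # 4. Keep the sample only if it passes the sanity check.
        if is_valid(heatmap):
            return heatmap
    return None


# Toy stand-ins (assumptions, not the real models).
def toy_denoise(latent: Latent, step: int) -> Latent:
    """Pretend denoiser: halve the remaining noise at every step."""
    return [0.5 * v for v in latent]


def toy_decode(latent: Latent) -> Heatmap:
    """Pretend decoder: shift latents so a clean sample sits near 0.5."""
    return [0.5 + v for v in latent]


def toy_is_valid(heatmap: Heatmap) -> bool:
    """Pretend sanity check: every normalized value must lie in [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in heatmap)
```

Calling `generate(toy_denoise, toy_decode, toy_is_valid, latent_size=16, num_steps=8, max_attempts=5)` returns a list of 16 values. Passing the three stages in as callables keeps the loop independent of any particular model, which is a design choice of this sketch rather than something stated about DALI-PD.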
Evaluating DALI-PD’s Effectiveness
The researchers evaluated DALI-PD’s generated heatmaps in two ways: with computer vision metrics and by testing their utility in downstream ML tasks. Statistically, DALI-PD’s heatmaps showed distributions similar to those of real CircuitNet data, with low Fréchet Inception Distance (FID) scores, indicating high realism. The framework was also more stable and required significantly fewer iterations to generate valid heatmaps than a baseline model without satellite image pre-training.
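For intuition about what FID measures: it fits a Gaussian to feature vectors of real images and another to feature vectors of generated images, then reports the Fréchet distance between the two. The full metric uses Inception-network features with mean vectors and covariance matrices. The sketch below shows only the univariate special case, where the distance reduces to (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2. The function name `frechet_distance_1d` is hypothetical, and this is a simplification for illustration, not the evaluation code used by the researchers.

```python
from statistics import fmean, pstdev
from typing import Sequence


def frechet_distance_1d(real: Sequence[float], generated: Sequence[float]) -> float:
    """Fréchet distance between Gaussians fitted to two 1-D feature samples.

    Univariate special case of the formula behind FID:
        d^2 = (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2
    Lower values mean the two feature distributions are more alike.
    """
    if not real or not generated:
        raise ValueError("both samples must be non-empty")
    mu_r, mu_g = fmean(real), fmean(generated)
    sigma_r, sigma_g = pstdev(real), pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sigma_r - sigma_g) ** 2
```

Two identical samples give a distance of zero, and shifting every value of one sample by a constant raises the distance by the square of that shift, since the spread is unchanged.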
For diversity, DALI-PD generated a dataset with low pairwise Structural Similarity Index Measure (SSIM) scores, confirming that the generated maps are distinct from one another, unlike some existing datasets that show higher similarity. The entire DALI-PD dataset of over 23,000 samples was generated in just 4 hours, a stark contrast to the weeks it can take to generate comparable real datasets.
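A pairwise SSIM diversity check can be sketched as follows. The version below is deliberately simplified: it computes a single global SSIM over each pair of flattened heatmaps rather than the usual sliding-window average, and it assumes values normalized to [0, 1]. The function names `global_ssim` and `mean_pairwise_ssim` are hypothetical, and the source does not say which SSIM implementation or settings were used.

```python
from itertools import combinations
from statistics import fmean
from typing import List, Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM between two equal-length flattened heatmaps."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("heatmaps must be non-empty and the same size")
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_pairwise_ssim(heatmaps: List[Sequence[float]]) -> float:
    """Average SSIM over all unordered pairs; lower means a more diverse set."""
    pairs = list(combinations(heatmaps, 2))
    if not pairs:
        raise ValueError("need at least two heatmaps")
    return fmean([global_ssim(a, b) for a, b in pairs])
```

A set of near-duplicate heatmaps scores close to 1, while a set of structurally different heatmaps scores lower, which is the sense in which low pairwise SSIM indicates diversity.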
In downstream ML tasks, models trained solely on the DALI-PD synthetic dataset achieved comparable, though slightly lower, accuracy for IR drop and RUDY (routing demand) prediction than models trained on real CircuitNet data. More importantly, when real-circuit heatmaps were limited, models pre-trained on DALI-PD’s synthetic data significantly outperformed those trained only on small amounts of real data. This highlights the value of synthetic data in scenarios where real data is scarce.
Conclusion
DALI-PD represents a significant advancement in generating high-quality, diverse, and realistic synthetic layout heatmaps for machine learning in physical design. Its ability to generate data rapidly and at scale, while maintaining fidelity to real-world characteristics, promises to accelerate research and development in ML-driven chip design, especially when access to proprietary real-world data is limited. For more technical details, see the full research paper.


