DALI-PD: Accelerating ML in Chip Design with Synthetic Layout Heatmaps

TLDR: DALI-PD is a new framework that uses diffusion models to quickly generate diverse and realistic synthetic layout heatmaps for machine learning in physical chip design. It addresses the scarcity of high-quality training data by creating over 20,000 data points in hours, a task that would otherwise take weeks. The synthetic data improves ML model accuracy for tasks like IR drop and congestion prediction, especially when real-world data is limited. The diffusion model itself is pre-trained on satellite imagery, which improves the realism and stability of the generated heatmaps.

Machine learning (ML) has shown great promise across physical design (PD) tasks, which are crucial steps in creating integrated circuits. However, a significant hurdle for ML models in this field is the lack of high-quality, large-scale training datasets. Generating these datasets is often computationally expensive and is restricted by intellectual property concerns. Existing public datasets are typically static, slow to update, and limited in their diversity.

Addressing these challenges, researchers Bing-Yue Wu and Vidya A. Chhabria from Arizona State University have introduced DALI-PD. This innovative framework is designed to generate synthetic layout heatmaps, aiming to accelerate ML research in physical design. DALI-PD utilizes a diffusion model, a type of generative AI, to create a wide variety of layout heatmaps quickly, often in just seconds. These heatmaps represent critical aspects of chip design, including power distribution, IR (voltage) drop, congestion, macro placement, and cell density.

Using DALI-PD, the researchers successfully created a massive dataset containing over 20,000 different layout configurations. These configurations vary in macro counts and placements, and the generated heatmaps closely resemble real-world layouts. Crucially, training ML models with these synthetic heatmaps has been shown to improve accuracy in downstream tasks like predicting IR drop or congestion.

The Need for Synthetic Data in Physical Design

Unlike fields such as computer vision or natural language processing, where vast labeled datasets are readily available, PD datasets are often proprietary, lack diversity, and are costly to produce. This scarcity makes it difficult to develop ML models that can generalize well across different chip designs. While some benchmarks and ML-specific datasets exist, they are often static and require substantial manual and computational effort for updates. Industry datasets, to protect intellectual property, are frequently obfuscated, which can distort data and reduce their usefulness.

How DALI-PD Works

DALI-PD operates through a two-stage generative process. First, it uses a Variational Autoencoder (VAE) to learn how to represent layouts in a lower-dimensional space, which helps reduce the computational load for the next stage. Second, a UNet-based diffusion model generates high-fidelity heatmaps. The framework is guided by specific circuit layout conditions, such as clock period, layout dimensions (height and width), utilization, and the bounding boxes of macros. This allows DALI-PD to generate diverse layouts with varying characteristics.
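To make the conditioning inputs concrete, here is a minimal Python sketch of how the conditions named above might be packaged before being handed to a generative model. Everything in it is an assumption made for illustration: the `LayoutCondition` class, its field names and units, the 64 x 64 grid, and the idea of rasterizing macro bounding boxes into a binary mask are not taken from the DALI-PD paper.

```python
from dataclasses import dataclass, field
from math import ceil, floor


@dataclass
class LayoutCondition:
    """Hypothetical container for the conditions named in the article."""
    clock_period_ns: float
    height_um: float
    width_um: float
    utilization: float  # fraction of the core area in use, 0..1
    macro_bboxes: list = field(default_factory=list)  # (x0, y0, x1, y1) in um


def macro_mask(cond, grid=64):
    """Rasterize macro bounding boxes onto a grid x grid binary mask."""
    mask = [[0.0] * grid for _ in range(grid)]
    for x0, y0, x1, y1 in cond.macro_bboxes:
        # Map layout coordinates to grid cells and clamp to the grid.
        c0 = max(floor(x0 * grid / cond.width_um), 0)
        c1 = min(ceil(x1 * grid / cond.width_um), grid)
        r0 = max(floor(y0 * grid / cond.height_um), 0)
        r1 = min(ceil(y1 * grid / cond.height_um), grid)
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = 1.0
    return mask


def scalar_conditions(cond):
    """Pack the scalar conditions into a flat list (assumed ordering)."""
    return [cond.clock_period_ns, cond.height_um, cond.width_um, cond.utilization]


# One macro covering the lower-left quarter of a 100 um x 100 um layout.
cond = LayoutCondition(2.0, 100.0, 100.0, 0.7, [(0.0, 0.0, 50.0, 50.0)])
mask = macro_mask(cond)
print(sum(map(sum, mask)) / 64 ** 2)  # fraction of grid cells covered by macros
```

Varying the scalar values and the list of bounding boxes is one plausible way to request layouts with different macro counts and placements.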

A key innovation in DALI-PD is its use of transfer learning. The diffusion model is pre-trained on satellite images, leveraging the visual similarities between chip layouts and urban satellite imagery. This pre-training significantly enhances the realism and stability of the generated heatmaps, allowing the model to converge faster during training compared to models trained from scratch.

During the generation process, DALI-PD starts with random noise and iteratively refines it using the diffusion model, guided by the circuit parameters. The VAE then decodes this refined representation into the final six categories of layout heatmaps. The framework also includes a post-processing and sanity checker module to ensure the quality of the generated layouts, discarding any that don’t meet design constraints.
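The control flow described above (noise in, iterative refinement, decode, sanity check) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation: the article does not name the sampler, so a DDPM-style update with a linear beta schedule is assumed; `oracle_eps` is a stand-in for the trained, condition-guided UNet and simply knows the clean latent it should reach; `decode` and `passes_sanity` are hypothetical placeholders for the VAE decoder and the sanity checker.

```python
import math
import random


def make_schedule(steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus cumulative products of alpha = 1 - beta."""
    betas = [beta_start + (beta_end - beta_start) * i / (steps - 1)
             for i in range(steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def oracle_eps(x_t, t, alpha_bars, target):
    """Stand-in noise predictor (NOT a trained model): returns the noise
    implied by x_t = sqrt(ab) * target + sqrt(1 - ab) * eps."""
    ab = alpha_bars[t]
    return [(x - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab)
            for x, x0 in zip(x_t, target)]


def sample_latent(target, steps=50, seed=0):
    """DDPM-style reverse process: start from Gaussian noise, refine step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for t in reversed(range(steps)):
        eps = oracle_eps(x, t, alpha_bars, target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x


def decode(latent, n_maps=6):
    """Placeholder decoder: split the latent into n_maps equal chunks."""
    size = len(latent) // n_maps
    return [latent[i * size:(i + 1) * size] for i in range(n_maps)]


def passes_sanity(maps, lo=0.0, hi=1.0):
    """Hypothetical constraint: every heatmap value must lie in [lo, hi]."""
    return all(lo <= v <= hi for m in maps for v in m)


# The target latent stands in for "what the circuit conditions ask for".
target = [0.1 + 0.8 * i / 23 for i in range(24)]
latent = sample_latent(target)
maps = decode(latent)
kept = maps if passes_sanity(maps) else None  # discard layouts that fail the check
print(len(maps), kept is not None)
```

Because the stand-in predictor is exact, the final step lands on the target latent up to rounding error. A trained network would only approximate the noise, which is why a checker that discards invalid samples is useful.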

Evaluating DALI-PD’s Effectiveness

The researchers evaluated DALI-PD’s generated heatmaps in two ways: with computer vision metrics and by testing their utility in downstream ML tasks. Statistically, DALI-PD’s heatmaps showed distributions similar to real CircuitNet data, with low Fréchet Inception Distance (FID) scores, indicating high realism. The framework also proved more stable and required significantly fewer iterations to generate valid heatmaps than a baseline model without satellite image pre-training.
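For reference, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of the real and the generated images. The symbols below follow the standard definition rather than notation from the DALI-PD paper: μ_r and Σ_r are the mean and covariance of the real features, and μ_g and Σ_g are those of the generated features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Identical feature statistics give a score of 0, so lower values indicate generated heatmaps that are statistically closer to the real ones.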

For diversity, the DALI-PD dataset showed low pairwise Structural Similarity Index Measure (SSIM) scores, confirming that the generated maps are distinct from each other, unlike some existing datasets that show higher similarity. The entire DALI-PD dataset of over 23,000 samples was generated in just 4 hours, a stark contrast to the weeks it can take to generate comparable real datasets.
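As a rough illustration of the pairwise diversity check, the sketch below computes a simplified SSIM between every pair of heatmaps and averages the result. It uses a single global window over flattened maps, whereas standard SSIM averages the same expression over local sliding windows. The function names and the toy heatmaps are assumptions for illustration, and the article does not say which SSIM variant the authors used.

```python
from itertools import combinations
from statistics import fmean


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized, flattened heatmaps."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def mean_pairwise_ssim(heatmaps):
    """Average SSIM over all unordered pairs; needs at least two heatmaps."""
    return fmean(global_ssim(x, y) for x, y in combinations(heatmaps, 2))


# Toy data: two identical maps and one inverted map.
a = [0.0, 0.5, 1.0]
b = [1.0, 0.5, 0.0]
print(global_ssim(a, a), global_ssim(a, b), mean_pairwise_ssim([a, a, b]))
```

A dataset of near-duplicates would push the mean toward 1, while a diverse dataset yields a lower mean, which is the property the low pairwise scores point to.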

In downstream ML tasks, models trained solely on the DALI-PD synthetic dataset achieved comparable, though slightly lower, accuracy for IR drop and RUDY (routing demand) prediction compared to models trained on real CircuitNet data. More importantly, when real-circuit heatmaps are limited, models pre-trained on DALI-PD’s synthetic data significantly outperformed those trained only on small amounts of real data. This highlights the value of synthetic data in scenarios where real data is scarce.

Conclusion

DALI-PD represents a significant advancement in generating high-quality, diverse, and realistic synthetic layout heatmaps for machine learning in physical design. Its ability to generate data rapidly and at scale, while maintaining fidelity to real-world characteristics, promises to accelerate research and development in ML-driven chip design, especially when access to proprietary real-world data is limited. For more technical details, refer to the full research paper.
