TLDR: MMCircuitEval is the first comprehensive multimodal benchmark designed to evaluate multimodal large language models (MLLMs) on electronic circuit design. It comprises 3,614 expert-reviewed question-answer pairs covering digital and analog circuits across all EDA stages, from general knowledge to front-end and back-end design. Evaluations reveal that current models struggle significantly with back-end design and complex computations, highlighting a critical need for specialized training data and improved modeling approaches to advance AI integration into real-world circuit design.
The world of artificial intelligence, particularly large language models (LLMs), is rapidly expanding, showing immense potential across various industries. One such area where LLMs are beginning to make a significant impact is Electronic Design Automation (EDA), the software tools used to design electronic systems like microchips. However, truly understanding how well these advanced AI models perform in the intricate field of circuit design has been a challenge due to the limited scope of existing evaluation methods.
To address this critical gap, a team of researchers from institutions including The Chinese University of Hong Kong, Nanjing University, and Peking University, along with the National Center of Technology Innovation for EDA, has introduced a groundbreaking new benchmark called MMCircuitEval. This benchmark is the first of its kind, specifically designed to comprehensively evaluate multimodal large language models (MLLMs) across the diverse and complex tasks involved in EDA.
What is MMCircuitEval?
MMCircuitEval is a meticulously curated collection of 3,614 question-answer (QA) pairs. These questions cover a wide spectrum of circuit design, encompassing both digital and analog circuits, and span the crucial stages of the EDA workflow: general knowledge about circuits, design specifications, front-end design (such as writing hardware description code that captures circuit behavior), and back-end design (the physical layout and interconnections on a chip).
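To make the format concrete, the snippet below sketches what a single QA record in this style might look like. The field names and values are illustrative assumptions for this article, not the benchmark's actual schema.

```python
# A hypothetical sketch of one MMCircuitEval-style QA record.
# Field names and values are illustrative, not the benchmark's actual schema.
example_record = {
    "question": "For the CMOS inverter shown, what is the output logic level "
                "when the input is held high?",
    "image": "figures/cmos_inverter.png",  # optional; text-only items omit it
    "choices": ["Logic high", "Logic low", "High impedance", "Undefined"],
    "answer": "Logic low",
    "circuit_type": "digital",    # digital | analog
    "design_stage": "front-end",  # general | spec | front-end | back-end
    "ability": "comprehension",   # knowledge | comprehension | reasoning | computation
    "difficulty": "easy",         # easy | medium | hard
}
```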
The questions in MMCircuitEval are not just theoretical; they are derived from a variety of reliable sources, including textbooks, technical question banks, product datasheets, and real-world design documentation. Each question and its corresponding answer undergo a rigorous review by domain experts to ensure accuracy and relevance to practical circuit design challenges.
What makes MMCircuitEval unique is its detailed categorization system. Questions are classified by the type of circuit (digital or analog), the specific design stage they relate to, the abilities they test in the LLM (such as knowledge recall, comprehension, logical reasoning, or numerical computation), and their difficulty level (easy, medium, or hard). This granular approach allows for a much deeper analysis of an AI model’s strengths and weaknesses in circuit design.
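A categorization scheme like this lends itself to simple tallies and filters. Building on the hypothetical record format above, the helper below is a minimal sketch, not part of the official benchmark tooling.

```python
from collections import Counter

def category_breakdown(records, key):
    """Tally how many questions fall under each value of a category field
    (e.g., 'design_stage' or 'difficulty'), assuming the hypothetical
    record format sketched earlier."""
    return Counter(r[key] for r in records)

# Example usage:
#   category_breakdown(dataset, "design_stage")
#   returns a Counter mapping each design stage to its question count
```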
Key Findings from Evaluations
Extensive evaluations using MMCircuitEval have revealed significant performance differences among existing LLMs. While some models show promise, many struggle to achieve satisfactory accuracy in circuit-focused question answering. A particularly notable finding is the performance gap in back-end design tasks and complex computations. This suggests that current LLMs often lack sufficient specialized training data for these highly specific and intricate aspects of circuit design.
For instance, even top-performing models like GPT-4V, while generally strong, showed a considerable drop in accuracy when tackling back-end design questions. This highlights that while LLMs excel at general knowledge and comprehension, applying circuit principles and design methodologies to complex, context-dependent problems remains a significant hurdle.
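To see how a per-stage accuracy gap like this would be measured, here is a minimal scoring sketch. It assumes the hypothetical record format from earlier and predictions aligned one-to-one with the records; it is not the benchmark's official evaluation script.

```python
from collections import defaultdict

def accuracy_by_stage(records, predictions):
    """Compute accuracy per design stage, given predicted answers aligned
    one-to-one with the records. A sketch over the hypothetical record
    format above, not the benchmark's official scoring code."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec, pred in zip(records, predictions):
        stage = rec["design_stage"]
        total[stage] += 1
        correct[stage] += int(pred == rec["answer"])
    return {stage: correct[stage] / total[stage] for stage in total}
```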
The benchmark also shed light on how LLMs handle different modalities. Although multimodal LLMs are designed to process visual information, many models actually performed worse on questions involving circuit-related images than on text-only questions. This indicates that the visual encoders in these models may not be adequately trained on circuit-specific imagery, and the visual input can sometimes even mislead the model. Interestingly, the GPT model family, which often converts images into textual representations, performed better on multimodal questions, suggesting this approach can be effective when paired with a powerful language model backbone.
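The image-to-text strategy described above can be sketched as a small prompt-building step. In the snippet below, describe_image is a hypothetical stand-in for whatever OCR or captioning component a pipeline might use; it is not an API from the paper.

```python
def prepare_prompt(record, describe_image=None):
    """Build a text prompt from a QA record. If the target model cannot
    consume images directly, an optional describe_image callable (a
    hypothetical OCR/captioning stand-in) converts the figure to text
    first, mirroring the image-to-string approach discussed above."""
    parts = []
    if record.get("image") and describe_image is not None:
        parts.append("Figure description: " + describe_image(record["image"]))
    parts.append(record["question"])
    parts.append("Options: " + "; ".join(record["choices"]))
    return "\n".join(parts)
```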
Looking Ahead
MMCircuitEval serves as a foundational resource for advancing the capabilities of MLLMs in EDA. The insights gained from the benchmark are crucial for guiding the development of more targeted training datasets and modeling approaches. The research paper discusses potential remedies, such as the need for more high-quality, open-source circuit-related training data, and the effectiveness of test-time techniques like Chain-of-Thought (CoT) reasoning, which encourages a model to break a complex problem into smaller steps.
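As a rough illustration of the CoT idea, here is what a step-by-step prompt for a circuit question could look like. The template wording is an assumption for this article, not the prompt used in the paper.

```python
COT_TEMPLATE = (
    "You are an expert circuit designer.\n"
    "{question}\n"
    "Options: {options}\n"
    "Think step by step: restate the given quantities, apply the relevant "
    "circuit laws, work out intermediate values, then state the final answer."
)

def cot_prompt(record):
    """Wrap a QA record in a simple Chain-of-Thought instruction, assuming
    the hypothetical record format used earlier in this article."""
    return COT_TEMPLATE.format(
        question=record["question"],
        options="; ".join(record["choices"]),
    )
```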
The benchmark is openly available for researchers and developers to use, fostering collaboration between the AI and hardware communities. You can find the benchmark and learn more about this research at the project’s GitHub repository.
In conclusion, MMCircuitEval is a vital step forward in evaluating and improving the performance of AI in the complex domain of electronic circuit design, paving the way for more automated and enhanced EDA workflows in the future.


