
AI Agents Reshaping Conceptual Engineering Design with Structured Language Models

TLDR: This research evaluates multi-agent (MAS) and two-agent (2AS) systems powered by Large Language Models (LLMs) for early-stage engineering design. Using a Design-State Graph (DSG) to represent design knowledge, the study found that while reasoning-distilled LLMs and MAS improve design granularity and workflow completion for a solar-powered water filtration system, challenges remain in comprehensive requirement coverage and generating physics-correct, production-ready simulation code. The MAS produced more detailed designs but was slower, while the 2AS was faster but less granular.

Engineering design, especially in its early stages, is a highly complex and iterative process. It involves defining problems, exploring concepts, and integrating various systems, often requiring designers to manage evolving requirements and balance conflicting constraints. While computational tools and even recent generative AI models have helped, they often fall short in orchestrating the entire design process from initial requirements to final implementation.

A new research paper explores a promising approach: using agentic Large Language Models (LLMs) for conceptual systems engineering and design. Unlike traditional LLMs that act as static assistants, agentic LLMs are designed to be autonomous, capable of planning, remembering information, using external tools, and executing actions to achieve specific goals.

The Design-State Graph: A Blueprint for AI Design

Central to this research is the introduction of the Design-State Graph (DSG). Imagine a dynamic blueprint that bundles all aspects of a design – from initial requirements to physical components and even executable Python code for simulations – into interconnected nodes. This JSON-serializable representation allows the AI agents to understand, build, and refine the design iteratively, making it easier for them to interact with external systems and tools.
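To make the idea concrete, here is a minimal sketch of what a JSON-serializable DSG might look like. The field names (`nodes`, `edges`, `type`, `relation`) are illustrative assumptions, not the paper's actual schema:

```python
import json

# Illustrative sketch of a Design-State Graph (DSG).
# Field names are assumptions for illustration; the paper's schema may differ.
dsg = {
    "nodes": [
        {
            "id": "req-1",
            "type": "requirement",
            "text": "Filter water daily using only solar power",
        },
        {
            "id": "emb-1",
            "type": "embodiment",
            "text": "Photovoltaic panel driving a pump through a ceramic filter",
            "code": "def flow_rate(power_w, head_m): ...",  # executable simulation stub
        },
    ],
    "edges": [
        {"source": "req-1", "target": "emb-1", "relation": "satisfied_by"},
    ],
}

# JSON-serializability is what lets agents and external tools
# exchange and refine the design state iteratively.
serialized = json.dumps(dsg, indent=2)
restored = json.loads(serialized)
```

Because every design artifact, including code, lives in one serializable structure, each agent can read the current state, modify its slice, and hand the whole graph to the next agent.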

Two Approaches: Multi-Agent vs. Two-Agent Systems

The researchers evaluated two distinct AI system configurations:

  • Multi-Agent System (MAS): This is a sophisticated setup featuring nine specialized AI agents. Each agent has a unique role, such as an Extractor for gathering requirements, a Generator for proposing design solutions, a Coder for refining simulation scripts, a Reflector for critiquing designs, and a Supervisor to oversee the entire workflow. This structured collaboration aims to manage complex design tasks more effectively.

  • Two-Agent System (2AS): As a simpler baseline, this system consists of just two agents: a Generator and a Reflector. They work in a continuous feedback loop, with the Reflector critiquing the Generator’s proposals and guiding further iterations. This setup helps determine if the complexity of the MAS is truly necessary for superior performance.
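The 2AS feedback loop can be sketched in a few lines. The `llm_call` helper and the prompts below are hypothetical stand-ins for real LLM API calls, shown only to illustrate the Generator-Reflector control flow:

```python
# Minimal sketch of the two-agent (2AS) Generator/Reflector loop.
# llm_call is a hypothetical placeholder for an actual LLM inference call.

def llm_call(role: str, prompt: str) -> str:
    """Stub for a call to the underlying LLM (e.g., DeepSeek R1 70B)."""
    return f"[{role} output for: {prompt[:40]}...]"

def two_agent_loop(spec: str, max_iters: int = 3) -> str:
    # The Generator proposes an initial design from the specification.
    design = llm_call("Generator", f"Propose a design for: {spec}")
    for _ in range(max_iters):
        # The Reflector critiques the current proposal...
        critique = llm_call("Reflector", f"Critique this design: {design}")
        # ...and the Generator revises in light of that critique.
        design = llm_call(
            "Generator",
            f"Revise the design.\nDesign: {design}\nCritique: {critique}",
        )
    return design

final = two_agent_loop("solar-powered water filtration system")
```

The MAS follows the same propose-critique-revise rhythm but distributes the work across nine specialized roles under a Supervisor, which is where its extra granularity and extra latency both come from.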

Both systems were tasked with designing a solar-powered water filtration system, based on a detailed set of technical specifications. The experiments involved varying the underlying LLM (Llama 3.3 70B versus the more reasoning-focused DeepSeek R1 70B), different levels of model creativity (sampling temperatures), and the two agent configurations.

Key Findings and Insights

The study yielded several important insights:

  • Robustness: Both the MAS and 2AS consistently produced valid JSON outputs and correctly identified physical components (embodiments) within the DSG, demonstrating the reliability of their structured output capabilities.

  • Reasoning Power: The DeepSeek R1 70B model, which is fine-tuned for reasoning, generally outperformed Llama 3.3 70B. It was more reliable in completing design workflows and, when used with the MAS, generated more detailed design graphs with more nodes, suggesting a finer breakdown of the system.

  • Granularity vs. Speed: The MAS, with its multi-agent orchestration, produced more detailed DSGs (around 5-6 nodes) but took significantly longer to complete a design (hundreds of seconds). In contrast, the simpler 2AS was much faster (under 40 seconds) but often produced less detailed designs, sometimes with only a single node.

  • Code Quality: While the 2AS sometimes achieved 100% code executability in specific settings, the MAS averaged below 50%. However, the MAS, particularly with the Coder agent, generated more comprehensive Python scripts for simulations, including features like command-line interfaces, logging, and unit tests. The 2AS, lacking a dedicated Coder, produced simpler, single-function code stubs.

  • Requirement Coverage: A significant challenge for both systems was comprehensively mapping user-specified requirements into the DSG, with coverage peaking at only 20%. This highlights a persistent gap in how LLMs translate high-level needs into detailed design elements.
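A requirement-coverage metric of this kind can be computed as the fraction of user-specified requirements that end up represented in the DSG. The matching rule below (membership by requirement id) is an assumption for illustration; the paper's exact scoring may differ:

```python
# Illustrative sketch of a requirement-coverage metric: the fraction of
# user-specified requirements that appear as requirement nodes in the DSG.
# The id-matching rule here is an assumption, not the paper's method.

def requirement_coverage(spec_requirements, dsg_nodes):
    covered = {
        n["requirement_id"]
        for n in dsg_nodes
        if n.get("type") == "requirement"
    }
    return len(covered & set(spec_requirements)) / len(spec_requirements)

spec = ["req-1", "req-2", "req-3", "req-4", "req-5"]
nodes = [{"type": "requirement", "requirement_id": "req-2"}]
print(requirement_coverage(spec, nodes))  # 1 of 5 requirements mapped -> 0.2
```

Under this toy scoring, mapping one of five requirements yields the 20% ceiling the study reports, which shows how much of the specification never makes it into the generated design graph.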

The research concludes that while specialized LLM agents and structured multi-agent architectures show great promise in deepening design exploration and improving workflow completion, there are still fundamental limitations. The generated simulation scripts, though runnable, often contained physics errors and unit inconsistencies, indicating a need for more rigorous mathematical and domain-grounded reasoning from the LLMs.

The Path Forward

Future work aims to enhance these AI design assistants by integrating more tools like web search and interactive Python environments, fine-tuning agents specifically for simulation code generation, and developing stricter validation methods to ensure physical accuracy and requirement satisfaction. The researchers also emphasize the importance of human oversight and transparent AI decision-making to prevent potential issues like the deskilling of early-career engineers as these powerful tools evolve.

For a deeper dive into the methodology and results, you can read the full research paper: Agentic Large Language Models for Conceptual Systems Engineering and Design.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach out to her at: [email protected]
