TLDR: The Agentic Design Review System (Agentic-DRS) is an AI framework in which multiple specialized agents, orchestrated by a meta-agent, evaluate graphic designs. It improves the agents’ design understanding through graph-based exemplar selection (GRAD) and structured design descriptions (SDD). The system produces scores and actionable feedback, outperforms existing methods, and is evaluated on the new DRS-BENCH benchmark. The work aims to support novice designers and improve generative AI design tools.
Evaluating graphic designs, whether they are flyers, posters, or invitation cards, is a complex task. It involves looking at many different aspects like how elements are aligned, the overall composition, aesthetic appeal, and color choices. Traditionally, this process relies on feedback from multiple human experts, which can be subjective and time-consuming. With the increasing popularity of do-it-yourself design tools and AI-generated designs, there’s a growing need for automated systems that can provide objective and actionable feedback.
Current methods for design evaluation often fall short. Heuristic approaches, which use predefined rules, struggle to capture the overall harmony of a design. Learning-based methods require extensive datasets and can treat evaluation as a simple scoring task, missing the nuanced feedback designers need. Even advanced Multi-modal Large Language Models (MLLMs) like GPT-4o, while showing promise in correlating with human judgment, often lack a deep, inherent understanding of design principles and struggle to provide comprehensive, actionable critiques across various dimensions.
Introducing the Agentic Design Review System (Agentic-DRS)
Researchers from Adobe Research have proposed a novel solution called the Agentic Design Review System (Agentic-DRS). This system is inspired by the human peer-review process, where multiple experts with diverse specializations collaboratively analyze a subject. In Agentic-DRS, multiple AI agents work together to analyze a design, all orchestrated by a central ‘meta-agent’. This framework aims to provide a holistic evaluation and generate actionable feedback for designers.
How Agentic-DRS Becomes ‘Design-Aware’
A key challenge for AI in design evaluation is making it truly ‘design-aware’. Agentic-DRS addresses this with two main innovations:
- GRAD (Graph-based Design Exemplar Selection): Unlike traditional methods that might just look at overall image similarity, GRAD creates a graph representation of a design. This graph captures not only the semantic meaning of elements (like text or images) but also their spatial and structural relationships (e.g., how close elements are, their alignment, and grouping). When evaluating a new design, GRAD retrieves the most relevant example designs from a library by matching these structural and semantic relationships. The retrieved exemplars give the AI agents concrete, structurally comparable reference points for judging what makes a design good or bad.
- SDD (Structured Design Description): For each input design, the system generates a detailed textual description. This description includes information about design elements (images, text, icons) and their hierarchical structure, often with bounding box coordinates. For example, it might describe a title at the top, an image below it, and then a block of text. This structured description helps anchor the MLLM’s responses, combining visual perception with explicit structural and semantic details. It makes the AI’s understanding of design attributes more robust and reduces the chances of it ‘hallucinating’ or providing irrelevant feedback.
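The two ideas above can be illustrated with a small sketch. It is not the paper's implementation: the `Element` type, the distance and alignment thresholds, and the Jaccard overlap used as a stand-in for graph matching are all assumptions made for illustration, and the library of designs is toy data.

```python
from dataclasses import dataclass
from math import hypot

@dataclass(frozen=True)
class Element:
    kind: str    # hypothetical element type, e.g. "title", "image", "text"
    box: tuple   # (x, y, width, height), assumed normalized to [0, 1]

def center(e):
    x, y, w, h = e.box
    return (x + w / 2, y + h / 2)

def structured_description(elements):
    """SDD-style text: elements listed top-to-bottom with their boxes."""
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return "\n".join(f"{i + 1}. {e.kind} at box={e.box}"
                     for i, e in enumerate(ordered))

def design_graph(elements, near=0.5):
    """Labeled edges between elements whose centers are closer than `near`.
    Each edge records the two element kinds and a coarse spatial relation."""
    edges = set()
    for i, a in enumerate(elements):
        for b in elements[i + 1:]:
            (ax, ay), (bx, by) = center(a), center(b)
            if hypot(ax - bx, ay - by) < near:
                aligned = abs(a.box[0] - b.box[0]) < 0.02
                relation = "left-aligned" if aligned else "near"
                edges.add((*sorted((a.kind, b.kind)), relation))
    return edges

def graph_similarity(g1, g2):
    """Jaccard overlap of labeled edges (a simple stand-in for graph matching)."""
    if not g1 and not g2:
        return 1.0
    return len(g1 & g2) / len(g1 | g2)

def retrieve_exemplars(query, library, k=1):
    """Return the names of the k library designs most similar to the query."""
    q = design_graph(query)
    ranked = sorted(library,
                    key=lambda name: graph_similarity(q, design_graph(library[name])),
                    reverse=True)
    return ranked[:k]

# Toy data: a query design and a two-item exemplar library.
query = [Element("title", (0.1, 0.05, 0.8, 0.1)),
         Element("image", (0.1, 0.2, 0.8, 0.4)),
         Element("text", (0.1, 0.65, 0.8, 0.2))]
library = {
    "poster_a": [Element("title", (0.1, 0.05, 0.8, 0.1)),
                 Element("image", (0.1, 0.2, 0.8, 0.45)),
                 Element("text", (0.1, 0.7, 0.8, 0.2))],
    "flyer_b": [Element("image", (0.0, 0.0, 1.0, 1.0)),
                Element("text", (0.6, 0.8, 0.3, 0.1))],
}

print(structured_description(query))
print(retrieve_exemplars(query, library, k=1))
```

With this toy data, `poster_a` shares both labeled edges with the query while `flyer_b` shares none, so `poster_a` is ranked first. A pure image-similarity lookup would not see these relationships explicitly, which is the contrast the GRAD description draws.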
The Collaborative Review Process
The core of Agentic-DRS is its multi-agent architecture, which operates in three phases:
- Planning: The meta-agent acts as a router. It analyzes the input design and decides which specialized ‘static’ and ‘dynamic’ agents are needed for the review. Static agents are predefined to assess universal design principles like alignment, overlap, and spacing. Dynamic agents, on the other hand, are more flexible; they are spawned by the meta-agent to evaluate context-dependent attributes such as stylistic coherence, grouping, or the semantic effectiveness of communication, which can vary greatly between designs.
- Reviewing: Once activated, each static and dynamic agent independently assesses the design based on its assigned attributes. They provide both quantitative ratings (scores) and qualitative feedback (actionable suggestions).
- Summarization: Finally, the meta-agent collects all the individual ratings and feedback from the static and dynamic agents. It then consolidates this information, removes redundancies, resolves inconsistencies, and generates a unified, holistic evaluation report with actionable feedback for design improvements.
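The three phases can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the system's actual code: the agents are stubs that read a hypothetical `mock_ratings` dictionary instead of prompting an MLLM, the score threshold of 6 is arbitrary, and the function and field names are invented for the example.

```python
from statistics import mean

# Static agents cover universal principles named in the article.
STATIC_ATTRIBUTES = ["alignment", "overlap", "spacing"]

def make_agent(attribute):
    """Build a stub reviewer for one attribute.
    A real agent would query an MLLM; here `mock_ratings` stands in for it."""
    def agent(design):
        score = design["mock_ratings"].get(attribute, 5)
        if score < 6:
            note = f"Improve {attribute}."
        else:
            note = f"{attribute.capitalize()} looks fine."
        return {"attribute": attribute, "score": score, "feedback": note}
    return agent

def plan(design):
    """Planning: always use static agents, spawn dynamic ones per design."""
    dynamic = list(design.get("context_attributes", []))
    return [make_agent(a) for a in STATIC_ATTRIBUTES + dynamic]

def run_reviews(design, agents):
    """Reviewing: each agent assesses the design independently."""
    return [agent(design) for agent in agents]

def summarize(reviews):
    """Summarization: merge scores and keep unique, actionable feedback only."""
    feedback = []
    for r in reviews:
        if r["score"] < 6 and r["feedback"] not in feedback:
            feedback.append(r["feedback"])
    return {
        "overall": round(mean([r["score"] for r in reviews]), 2),
        "scores": {r["attribute"]: r["score"] for r in reviews},
        "feedback": feedback,
    }

# Toy input: one context-dependent attribute triggers one dynamic agent.
design = {
    "name": "flyer",
    "context_attributes": ["stylistic coherence"],
    "mock_ratings": {"alignment": 8, "overlap": 9, "spacing": 4,
                     "stylistic coherence": 7},
}

report = summarize(run_reviews(design, plan(design)))
print(report)
```

In this sketch the meta-agent's role is split across `plan` and `summarize`, while the reviewers never see each other's output, mirroring the independent assessment described in the Reviewing phase.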
DRS-BENCH: A New Benchmark for Design Evaluation
To thoroughly evaluate Agentic-DRS and provide a standardized way to compare future design evaluation systems, the researchers introduced DRS-BENCH. This comprehensive benchmark includes 15 detailed design attribute definitions and four diverse datasets: GDE, Afixa, Infographic, and an Internal Design Dataset (IDD). These datasets cover various design types and include both continuous (1-10 scale) and discrete (yes/no) labels for design attributes. DRS-BENCH also defines new evaluation metrics for both attribute assessment and feedback quality, allowing for a more objective and detailed comparison of different evaluation methods.
Promising Results and Future Directions
Extensive experiments show that Agentic-DRS significantly outperforms state-of-the-art baselines, including single-agent MLLM systems and heuristic methods, across all metrics and datasets in DRS-BENCH. The integration of GRAD and SDD proved crucial, enhancing the system’s ability to understand designs and generate more accurate and actionable feedback. For instance, the system can pinpoint inconsistent spacing in specific sections of a design or correctly assess stylistic aspects like elegance and minimality.
This work represents a significant step forward in automated graphic design evaluation. The Agentic-DRS framework, detailed in the research paper available at arXiv:2508.10745, not only empowers novice designers by providing constructive critiques but also offers a robust yardstick for evaluating designs generated by advanced generative AI models. A key next step for this research is to extend the framework to automatically apply the generated feedback to the input design, moving towards a self-improving system for graphic design creation.


