New Multi-Agent AI System Enhances Code Grading in Education

TLDR: AGACCI is a novel multi-agent AI system designed to improve the grading of coding assignments in educational contexts. By distributing evaluation tasks among specialized AI agents (e.g., for code execution, visualization, and interpretation), AGACCI provides more accurate, consistent, and pedagogically valuable feedback compared to single AI models. The system was validated on graduate-level coding assignments, demonstrating significant performance improvements and highlighting its potential for scalable, context-aware educational assessment.

A new multi-agent system named AGACCI is set to transform how coding assignments are assessed in educational environments. This innovative framework addresses the limitations of existing AI-assisted grading tools, which often struggle with the complexity of programming tasks that require both quantitative and qualitative evaluation.

Traditional AI models can produce feedback that is too generic, inconsistent, or misaligned with specific grading criteria. AGACCI tackles these challenges by distributing specialized evaluation roles across multiple collaborative AI agents. Each agent focuses on a distinct aspect of the assessment process, leading to more accurate, interpretable, and consistent feedback.

The AGACCI system operates through a structured pipeline of modular agents. For instance, the Rubric Interpreter parses grading guidelines into actionable criteria, while the Submission Analyzer reviews the student’s work holistically. Dedicated agents like the Execution Evaluator check code functionality, the Result Evaluator assesses quantitative performance, and the Visualization Evaluator examines the clarity of visual outputs. The Interpretation Evaluator delves into the student’s reasoning and analytical depth. A crucial component, the Meta Evaluator, ensures internal consistency across all agents’ findings, flagging any contradictions. Finally, the Final Judge aggregates these evaluations to make a conclusive decision and generate comprehensive feedback, which is then summarized by the Summarizer agent for clear delivery to the student.
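The pipeline described above can be pictured with a small Python sketch. This is only an illustration under loud assumptions, not AGACCI's actual implementation: every function name, the `Finding` record, and the keyword-based routing are hypothetical, and simple deterministic checks stand in for the language-model calls the real agents would make. Only two evaluators are included; the Submission Analyzer, Result Evaluator, and Interpretation Evaluator are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Finding:
    agent: str       # which evaluator produced the finding (hypothetical field)
    criterion: str   # rubric item the finding refers to
    passed: bool
    note: str


def rubric_interpreter(rubric_text: str) -> List[str]:
    """Turn a plain-text rubric into criteria, one per non-empty line."""
    return [line.strip() for line in rubric_text.splitlines() if line.strip()]


def execution_evaluator(code: str, criteria: List[str]) -> List[Finding]:
    """Stand-in check: does the submission at least compile as Python?"""
    try:
        compile(code, "<submission>", "exec")
        ok, note = True, "code compiles"
    except SyntaxError as exc:
        ok, note = False, f"syntax error: {exc.msg}"
    # Assumption: criteria mentioning "run" are routed to this agent.
    return [Finding("execution", c, ok, note) for c in criteria if "run" in c.lower()]


def visualization_evaluator(code: str, criteria: List[str]) -> List[Finding]:
    """Stand-in check: does the submission label its plot axes?"""
    ok = "xlabel" in code and "ylabel" in code
    note = "axes labeled" if ok else "axis labels missing"
    # Assumption: criteria mentioning "plot" are routed to this agent.
    return [Finding("visualization", c, ok, note) for c in criteria if "plot" in c.lower()]


def meta_evaluator(findings: List[Finding]) -> List[str]:
    """Flag criteria on which the agents' verdicts contradict each other."""
    verdicts: Dict[str, Set[bool]] = {}
    for f in findings:
        verdicts.setdefault(f.criterion, set()).add(f.passed)
    return [c for c, seen in verdicts.items() if len(seen) > 1]


def final_judge(criteria: List[str], findings: List[Finding],
                contradictions: List[str]) -> Dict[str, bool]:
    """A criterion is met only if it has findings, all passed, and none conflict."""
    decisions = {}
    for c in criteria:
        relevant = [f for f in findings if f.criterion == c]
        decisions[c] = (bool(relevant)
                        and all(f.passed for f in relevant)
                        and c not in contradictions)
    return decisions


def summarizer(decisions: Dict[str, bool]) -> str:
    """Condense the final decisions into one line of student-facing feedback."""
    return f"{sum(decisions.values())}/{len(decisions)} rubric criteria met"


def grade(rubric_text: str, code: str) -> str:
    """Run the agents in sequence: interpret, evaluate, cross-check, judge, summarize."""
    criteria = rubric_interpreter(rubric_text)
    evaluators = [execution_evaluator, visualization_evaluator]
    findings = [f for evaluate in evaluators for f in evaluate(code, criteria)]
    contradictions = meta_evaluator(findings)
    return summarizer(final_judge(criteria, findings, contradictions))


rubric = "Code must run without errors\nPlot must have labeled axes"
submission = "import math\nprint(math.sqrt(16))\n"
print(grade(rubric, submission))  # prints "1/2 rubric criteria met"
```

Keeping each role as a separate function mirrors the modular design the article describes: an evaluator can be swapped out or added without touching the consistency check or the final aggregation step.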

To validate its effectiveness, AGACCI was tested on a dataset of 360 graduate-level code-based assignments. These submissions were also annotated by human domain experts, providing a robust benchmark. The experimental results demonstrated that AGACCI significantly outperformed a single GPT-based baseline in terms of rubric and feedback accuracy, relevance, consistency, and coherence. This indicates that AGACCI’s assessments are more aligned with expert judgments, more stable, and easier for students to understand.

The system showed particularly strong performance gains in Natural Language Processing (NLP) tasks, which often demand nuanced interpretation and structured reasoning. AGACCI leverages GPT-4o mini as its core language model, chosen for its balance of strong reasoning capabilities and operational efficiency, making it a practical choice for scalable classroom integration.

This research highlights the potential of multi-agent systems to augment teacher capacity and improve the scalability of formative feedback in real-world educational settings. While AGACCI marks a significant step forward, future work will explore its ability to handle ambiguous rubric items and its applicability to more open-ended, text-based assignments. For more detailed information, you can refer to the full research paper.
