TLDR: AGACCI is a novel multi-agent AI system designed to improve the grading of coding assignments in educational contexts. By distributing evaluation tasks among specialized AI agents (e.g., for code execution, visualization, and interpretation), AGACCI provides more accurate, consistent, and pedagogically valuable feedback compared to single AI models. The system was validated on graduate-level coding assignments, demonstrating significant performance improvements and highlighting its potential for scalable, context-aware educational assessment.
A new multi-agent system named AGACCI aims to improve how coding assignments are assessed in educational environments. The framework addresses the limitations of existing AI-assisted grading tools, which often struggle with programming tasks that require both quantitative and qualitative evaluation.
Traditional AI models can sometimes provide feedback that is too generic, inconsistent, or poorly aligned with specific grading criteria. AGACCI tackles these challenges by distributing specialized evaluation roles across multiple collaborative AI agents. Each agent focuses on a distinct aspect of the assessment process, leading to more accurate, interpretable, and consistent feedback.
The AGACCI system operates through a structured pipeline of modular agents. For instance, the Rubric Interpreter parses grading guidelines into actionable criteria, while the Submission Analyzer reviews the student’s work holistically. Dedicated agents like the Execution Evaluator check code functionality, the Result Evaluator assesses quantitative performance, and the Visualization Evaluator examines the clarity of visual outputs. The Interpretation Evaluator delves into the student’s reasoning and analytical depth. A crucial component, the Meta Evaluator, ensures internal consistency across all agents’ findings, flagging any contradictions. Finally, the Final Judge aggregates these evaluations to make a conclusive decision and generate comprehensive feedback, which is then summarized by the Summarizer agent for clear delivery to the student.
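To make the flow of that pipeline concrete, here is a minimal Python sketch. It is not AGACCI's implementation: the real agents are driven by a language model, while every function here is a hypothetical rule-based stand-in, and the names (`Finding`, `interpret_rubric`, `meta_evaluate`, `final_judge`, `grade`) are assumptions chosen for illustration. Only two evaluators are shown, and the Summarizer step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    agent: str      # name of the evaluator that produced the finding
    criterion: str  # rubric criterion being judged
    passed: bool
    note: str


# Hypothetical evaluator interface: (criteria, submission) -> findings.
Evaluator = Callable[[List[str], str], List[Finding]]


def interpret_rubric(rubric: str) -> List[str]:
    """Rubric Interpreter stand-in: one criterion per non-empty line."""
    return [line.strip() for line in rubric.splitlines() if line.strip()]


def execution_evaluator(criteria: List[str], submission: str) -> List[Finding]:
    """Execution Evaluator stand-in: only checks that the code compiles."""
    try:
        compile(submission, "<submission>", "exec")
        passed, note = True, "code compiles"
    except SyntaxError as err:
        passed, note = False, f"syntax error: {err.msg}"
    return [Finding("execution", c, passed, note)
            for c in criteria if "runs" in c.lower()]


def visualization_evaluator(criteria: List[str], submission: str) -> List[Finding]:
    """Visualization Evaluator stand-in: looks for a matplotlib-style call."""
    passed = "plt." in submission
    note = "plot call found" if passed else "no plot call found"
    return [Finding("visualization", c, passed, note)
            for c in criteria if "plot" in c.lower()]


def meta_evaluate(findings: List[Finding]) -> List[str]:
    """Meta Evaluator stand-in: flag criteria with contradictory verdicts."""
    verdicts: Dict[str, set] = {}
    for f in findings:
        verdicts.setdefault(f.criterion, set()).add(f.passed)
    return [c for c, seen in verdicts.items() if len(seen) > 1]


def final_judge(criteria: List[str], findings: List[Finding],
                conflicts: List[str]) -> Dict[str, str]:
    """Final Judge stand-in: aggregate findings into one verdict per criterion."""
    report = {}
    for c in criteria:
        relevant = [f for f in findings if f.criterion == c]
        if c in conflicts:
            report[c] = "needs review"
        elif not relevant:
            report[c] = "not evaluated"
        else:
            report[c] = "pass" if all(f.passed for f in relevant) else "fail"
    return report


def grade(rubric: str, submission: str,
          evaluators: List[Evaluator]) -> Dict[str, str]:
    """Run the pipeline: interpret rubric, evaluate, cross-check, judge."""
    criteria = interpret_rubric(rubric)
    findings = [f for ev in evaluators for f in ev(criteria, submission)]
    conflicts = meta_evaluate(findings)
    return final_judge(criteria, findings, conflicts)


# Illustrative inputs, not taken from the paper's dataset.
rubric = "Code runs without errors\nIncludes a plot of the results"
submission = "total = sum(range(10))\nprint(total)"
report = grade(rubric, submission, [execution_evaluator, visualization_evaluator])
print(report)
```

The point of the sketch is the separation of concerns: each evaluator only reports findings for the criteria it understands, the consistency check runs before any verdict is issued, and the final step is the only place where findings are combined. Swapping a stand-in for a model-backed agent would not change the shape of `grade`.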
To validate its effectiveness, AGACCI was tested on a dataset of 360 graduate-level code-based assignments. These submissions were also annotated by human domain experts, providing a robust benchmark. The experimental results demonstrated that AGACCI significantly outperformed a single GPT-based baseline in terms of rubric and feedback accuracy, relevance, consistency, and coherence. This indicates that AGACCI’s assessments are more aligned with expert judgments, more stable, and easier for students to understand.
The system showed particularly strong performance gains in Natural Language Processing (NLP) tasks, which often demand nuanced interpretation and structured reasoning. AGACCI leverages GPT-4o mini as its core language model, chosen for its balance of strong reasoning capabilities and operational efficiency, making it a practical choice for scalable classroom integration.
This research highlights the potential of multi-agent systems to augment teacher capacity and improve the scalability of formative feedback in real-world educational settings. While AGACCI marks a significant step forward, future work will explore its ability to handle ambiguous rubric items and its applicability to more open-ended, text-based assignments. For more detailed information, you can refer to the full research paper.


