Enhancing Programming Education with Autograder+: An AI Framework for Rich Feedback and Visual Analytics

TLDR: Autograder+ is an AI framework that transforms programming autograding into a formative learning platform. It uses a fine-tuned Large Language Model (LLM) for generating rich, context-aware pedagogical feedback and contrastively learned embeddings for visualizing student code submissions. The system also features a “Prompt Pooling” mechanism, allowing instructors to dynamically customize feedback style. Empirical evaluations show strong semantic alignment with expert feedback, and the framework provides instructors with actionable insights into student learning patterns.

Programming courses are growing quickly, and traditional assessment methods often struggle to keep pace. Conventional autograders are efficient, but they typically report only a “pass/fail” status and give little insight into a student’s thought process or conceptual errors. This leaves educators with a hard problem: how to deliver meaningful feedback at scale to a growing number of students.

To address this gap, researchers from the Indian Institute of Technology Bhilai have introduced Autograder+, an AI framework designed to turn autograding from a purely evaluative tool into a formative learning platform. The system has two core features: feedback generation powered by a fine-tuned Large Language Model (LLM), and visualization of student code submissions.

Intelligent Feedback Generation

At the heart of Autograder+’s feedback mechanism is an LLM that is fine-tuned for the domain on a curated dataset of student code and expert annotations. This training is intended to make the generated feedback accurate, pedagogically sound, and aware of the context of each programming problem. In an empirical evaluation on 600 student submissions across several programming problems, Autograder+ produced feedback with an average BERTScore F1 of 0.7658, indicating strong semantic alignment with feedback written by human experts.
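For readers unfamiliar with the metric, the following is a minimal sketch of how a BERTScore-style F1 is computed: each token embedding in the generated feedback is greedily matched to its most similar token in the expert feedback (precision), the same is done in reverse (recall), and the two are combined. The function names and the toy 2-D vectors are assumptions for illustration; a real evaluation would use contextual embeddings from a pretrained model, and this is not the paper's evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bertscore_f1(candidate_vecs, reference_vecs):
    """Greedy-matching F1 over token embeddings (hypothetical helper)."""
    # Precision: every candidate token is matched to its best reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: every reference token is matched to its best candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Toy vectors standing in for token embeddings of two feedback texts.
    generated = [[1.0, 0.0]]
    expert = [[1.0, 0.0], [0.0, 1.0]]
    print(round(bertscore_f1(generated, expert), 4))
```

In this toy case the generated text covers only one of the two expert tokens, so precision is perfect while recall is halved, which is the kind of partial overlap an F1 below 1.0 reflects.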

A distinctive part of the feedback system is the “Prompt Pooling” mechanism, which lets instructors steer the LLM’s feedback style. Educators build a repository of specialized prompts, each designed to focus the model’s analysis on specific programming concepts, error types, or pedagogical strategies. When a student’s code is processed, the system identifies the most semantically relevant prompt and injects it into the LLM’s context, so that the guidance targets the issue at hand. Instructors can therefore adapt the system’s behavior to different course levels or topics without complex technical adjustments.
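The selection step described above can be sketched as a nearest-neighbor lookup: embed the submission, embed each pooled prompt, and pick the prompt with the highest cosine similarity. Everything in this sketch is hypothetical, including the pool contents, the function names, and the toy bag-of-words embedding, which stands in for whatever learned encoder the actual system uses.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a learned encoder."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical instructor-authored prompt pool.
PROMPT_POOL = {
    "recursion": "Focus on recursion: check the base case and whether "
                 "each recursive call moves toward it.",
    "loops": "Focus on loop bounds: check the range limits and any "
             "off-by-one error in the for or while loop.",
    "style": "Focus on readability: comment on naming, indentation, "
             "and function length.",
}

def select_prompt(student_code, pool=PROMPT_POOL):
    """Return the name of the pooled prompt most similar to the submission."""
    code_vec = embed(student_code)
    return max(pool, key=lambda name: cosine(code_vec, embed(pool[name])))

def build_llm_context(student_code, pool=PROMPT_POOL):
    """Inject the selected prompt ahead of the student's code."""
    name = select_prompt(student_code, pool)
    return f"{pool[name]}\n\nStudent submission:\n{student_code}"

if __name__ == "__main__":
    code = "for i in range(n): total += i  # loop over the range"
    print(build_llm_context(code))
```

Because the pool is plain data, an instructor could add or reword entries without touching the model, which matches the flexibility the mechanism is meant to provide.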

Visualizing Student Learning

Beyond textual feedback, Autograder+ gives instructors visualization tools built on contrastively learned embeddings. These embeddings are trained on a large dataset of annotated submissions and organize student solutions into a “performance-aware semantic space.” In this space, functionally similar approaches cluster together, so instructors can quickly identify groups of correct, partially correct, or incorrect solutions. The visualizations, often presented as interactive UMAP scatter plots, help educators spot recurring error patterns, common strategies, and outlier cases at a glance.
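To make the idea of a contrastive objective concrete, here is a minimal sketch of a supervised contrastive loss in plain Python. It is an assumption that the framework uses a loss of this general family; the article does not specify the exact formulation, and the function names, labels, and temperature value are illustrative. The loss is low when submissions with the same label sit close together and far from the others, which is the property that makes the resulting clusters readable in a scatter plot.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average loss over anchors that have at least one same-label partner."""
    z = [normalize(v) for v in embeddings]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        positives = [j for j in range(len(z))
                     if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other submission.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(len(z)) if j != i}
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        # Negative log-probability of picking a same-label neighbor.
        total += -sum(sims[p] - log_denominator
                      for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

if __name__ == "__main__":
    points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    well_grouped = ["correct", "correct", "incorrect", "incorrect"]
    badly_grouped = ["correct", "incorrect", "correct", "incorrect"]
    print(supervised_contrastive_loss(points, well_grouped)
          < supervised_contrastive_loss(points, badly_grouped))
```

Training an encoder to minimize such a loss pulls same-outcome submissions together; a projection method such as UMAP would then be applied to the learned vectors for plotting.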

The Autograder+ Architecture

The framework operates as a modular, multi-stage pipeline. It begins with a Code Ingestor, handling various submission formats and linking them to assignment configurations. Next, a Static Analysis Engine quickly checks for syntax errors, coding style violations, and structural anti-patterns without executing the code. Structurally sound code then moves to the Dynamic Execution Engine, which runs each test case in an isolated Docker container, ensuring safety, security, and reproducibility while capturing detailed runtime behavior.
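The hand-off between the static and dynamic stages might look roughly like the sketch below. It uses Python's standard `ast` module for a syntax check and one example anti-pattern (a bare `except:`), then builds, but does not run, a `docker run` command for an isolated container. The function names, the chosen anti-pattern, the resource limits, and the `python:3.12-slim` image are all assumptions for illustration, not details from the paper.

```python
import ast

def static_check(source):
    """Return a list of issues found without executing the code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]
    issues = []
    for node in ast.walk(tree):
        # Example structural anti-pattern: catching every exception silently.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append(f"bare 'except:' on line {node.lineno}")
    return issues

def docker_command(host_dir, script="solution.py", image="python:3.12-slim"):
    """Build (not run) a sandboxed command; limits here are assumed values."""
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no network access for student code
        "--memory", "256m",           # cap memory
        "--cpus", "1",                # cap CPU
        "-v", f"{host_dir}:/work:ro", # mount the submission read-only
        image, "python", f"/work/{script}",
    ]

def grade_stage(source, host_dir):
    """Stop at the static stage on syntax errors; otherwise plan execution."""
    issues = static_check(source)
    if any(issue.startswith("syntax error") for issue in issues):
        return {"stage": "static", "issues": issues, "command": None}
    return {"stage": "dynamic", "issues": issues,
            "command": docker_command(host_dir)}

if __name__ == "__main__":
    print(grade_stage("def f(:\n    pass", "/tmp/submission"))
    print(grade_stage("print(sum(range(5)))", "/tmp/submission")["command"])
```

In a working pipeline the returned command list would be passed to something like `subprocess.run` with a timeout, once per test case, so that a crash or infinite loop in one submission cannot affect the grader or other runs.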

The Semantic Core then takes over, converting the student’s code into a high-dimensional vector embedding that captures its deeper algorithmic intent. This core also houses the Feedback Engine, which orchestrates the generation of pedagogical feedback, leveraging the Prompt Pooling mechanism. Finally, the Reporting and Analytics module compiles comprehensive reports for individual students and generates class-wide summaries, including the interactive UMAP visualizations for instructors.

Empirical Validation and Future Directions

Autograder+ was developed and tested on both external code corpora and internally collected student submissions. In baseline evaluations of several large language models, models such as falcon3:10b and llama3.2:3b combined strong semantic alignment with human feedback and practical inference times. Adding the Prompt Pooling mechanism consistently improved the quality of AI-generated feedback, raising both lexical and semantic similarity scores. Direct fine-tuning showed mixed results, but the framework as a whole generated relevant and accurate feedback.

Looking ahead, the researchers plan to deploy Autograder+ in actual programming courses to assess its real-world impact on learner experience and instructional workflows. Future work also includes longitudinal learning analytics to track student progress over time, large-scale evaluations across diverse institutions, and extending the framework to programming domains beyond introductory courses. The research paper, Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education, describes the system in full.

By combining advanced AI feedback generation, semantic organization, and visualization, Autograder+ aims to significantly reduce the evaluation workload for educators while empowering them to deliver targeted instruction and foster more resilient learning outcomes in programming education.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society.
