AI Framework Offers Detailed Feedback for Undergraduate Theses

TLDR: PEMUTA is a new AI framework that uses large language models (LLMs) to assess undergraduate theses with detailed, multi-granular feedback. Unlike traditional holistic scoring, PEMUTA evaluates theses across six dimensions (Structure, Logic, Originality, Writing, Proficiency, Rigor) based on Vygotsky’s theory and Bloom’s Taxonomy. It employs hierarchical prompting, few-shot learning, and role-play prompting to align with expert judgments, providing more interpretable and pedagogically relevant evaluations.

Undergraduate theses are a cornerstone of academic assessment, serving as a comprehensive measure of a student’s cumulative academic development. However, the traditional methods of evaluating these lengthy and complex documents, whether manual or automated, often fall short. Manual assessment is time-consuming and labor-intensive, while existing automated systems, typically powered by large language models (LLMs), tend to offer only a single, holistic score. This broad evaluation often overlooks the intricate details across various criteria, limiting the depth of feedback students receive and failing to align with established pedagogical objectives.

Addressing this critical gap, researchers have pioneered a novel framework called PEMUTA: Pedagogically-Enriched Multi-Granular Undergraduate Thesis Assessment. This innovative approach aims to activate the domain-specific knowledge within LLMs to provide a more nuanced and detailed evaluation of undergraduate theses.

A Foundation in Educational Theory

PEMUTA is built upon two foundational pedagogical theories widely used in manual thesis evaluation: Vygotsky’s sociocultural theory and Bloom’s Taxonomy. Vygotsky’s theory emphasizes the developmental and learning potential aspects of a thesis, focusing on how students progress towards independent academic competence. Bloom’s Taxonomy, on the other hand, provides a structured hierarchy of cognitive skills, from remembering to creating, which is crucial for instructional design and assessment.

By integrating insights from both theories, PEMUTA defines six pedagogically grounded dimensions for assessment, collectively abbreviated as SLOWPR:

Structure: Evaluates the organization, coherence, and logical flow of the thesis chapters.
Logic: Assesses the clarity and consistency of arguments, ensuring alignment between research questions, methodology, evidence, and conclusions.
Originality: Examines the novelty and insightfulness of the thesis, including new perspectives or solutions.
Writing: Focuses on linguistic clarity, grammatical accuracy, academic tone, and adherence to disciplinary writing conventions.
Proficiency: Measures the student’s mastery of disciplinary knowledge, including their understanding, application, and analysis of concepts and methods.
Rigor: Evaluates adherence to academic conventions, citation accuracy, source reliability, and ethical compliance.

How PEMUTA Works

The framework employs a hierarchical prompting strategy that guides the LLM through a two-stage evaluation process. In the first stage, the model performs dimension-specific assessments for each of the six SLOWPR criteria, generating individual scores and justifications. This decomposition helps reduce interference between different criteria and allows for more targeted activation of relevant knowledge within the LLM. Once these fine-grained assessments are complete, the model proceeds to the second stage, synthesizing them into a coherent holistic evaluation, which includes an overall score and practical suggestions for improvement.

To further enhance alignment with expert judgments without requiring extensive fine-tuning, PEMUTA incorporates two in-context learning techniques:

Few-shot prompting: The model is provided with a few formatted examples of multi-granular thesis evaluations, helping it internalize the desired structure and format of rubric-aligned assessments.
Role-play prompting: The LLM is instructed to assume the persona of an experienced university professor or thesis committee member. This role conditioning encourages the model to adopt a formal academic tone, use discipline-appropriate vocabulary, and apply expert evaluative reasoning.

The MUTA Dataset and Experimental Validation

To support this multi-granular assessment task, a new dataset called MUTA (Multi-granular Undergraduate Thesis Assessment) was curated. It comprises 60 authentic undergraduate theses from Computer Science students, each manually annotated with 0-10 scale ratings across the SLOWPR dimensions and a holistic score. The theses, originally in PDF format, undergo a meticulous pre-processing pipeline to convert them into a clean, semantically consistent, and logically structured plain-text representation suitable for LLM processing.

Extensive experiments demonstrate that PEMUTA consistently outperforms standard holistic prompting strategies across various state-of-the-art LLMs. It achieves significantly lower Mean Absolute Error (MAE) and Mean Squared Error (MSE), and higher Pearson Correlation Coefficient (PCC) with expert ratings, indicating a stronger agreement with human evaluations. Ablation studies further confirm that each component—hierarchical prompting, few-shot exemplars, and role-play instructions—contributes meaningfully and synergistically to the framework’s enhanced performance.

Also Read:

Looking Ahead

PEMUTA represents a significant advancement in automated undergraduate thesis assessment, offering a scalable, interpretable, and pedagogically grounded solution. By providing detailed, criterion-specific feedback alongside a holistic judgment, it empowers students with actionable insights for improvement and alleviates the workload of educators. Future work aims to extend PEMUTA into multimodal assessment frameworks, incorporating code artifacts, presentation recordings, and other learning outputs for an even more comprehensive evaluation of students’ competencies. You can read the full research paper here.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

AI Framework Offers Detailed Feedback for Undergraduate Theses

A Foundation in Educational Theory

How PEMUTA Works

The MUTA Dataset and Experimental Validation

Looking Ahead

Gen AI News and Updates

Vesl AI Recognized for AI Infrastructure Innovation with ASOCIO Digital Summit Award

AT&T Unleashes Agentic AI Across Business Operations for Enhanced Efficiency and Innovation

Microsoft Research Unveils Project Gecko to Advance Equitable Multilingual AI for Global Communities

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates