Assessing Creativity: A Look at AI vs. Human Grading in Design Thinking Education

TLDR: An exploratory study compared AI and Teaching Assistant (TA) assessments of student design thinking posters. While AI offered efficiency and consistency, it showed low agreement with instructor scores for empathy and pain points, and teachers generally preferred TA scores. The research suggests a need for hybrid assessment models that combine AI’s efficiency with human insight, emphasizing AI’s potential for formative feedback rather than summative grading.

Evaluating student work is a significant challenge in education, particularly in creative fields like design thinking. As design thinking programs expand in secondary and tertiary education, educators must assess creative artifacts that combine visual and textual elements. Traditional assessment methods, which often rely on rubrics and on Teaching Assistants (TAs) in large classes, are frequently described as laborious, time-consuming, and inconsistent.

A recent exploratory study delves into this challenge by comparing the reliability and perceived accuracy of AI-assisted assessment against TA-assisted assessment for student posters in design thinking education. The research, conducted with 33 school teachers from Singapore’s Ministry of Education, aimed to understand how AI-generated scores align with human grading and to gauge teacher preferences for different scoring methods.

The Study’s Approach

The study involved two main activities. In the first task, teachers manually evaluated ten anonymized student journey map samples using a structured rubric. These journey maps are visual representations of a user’s experience, highlighting actions, emotions, pain points, and opportunities for improvement, and are assessed on three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication.

For the second task, teachers reviewed a different set of ten samples, each presented with three anonymized scores: one from an AI grader, one from a Teaching Assistant, and a hybrid score (an average of AI and TA scores). Teachers were asked to indicate which score they agreed with most, without knowing the source of each score.
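The hybrid score described above is a plain average of the two graders. A minimal sketch of that arithmetic follows; the function name and the example values are hypothetical and are not taken from the study.

```python
def hybrid_score(ai_score: float, ta_score: float) -> float:
    """Return the simple average of an AI score and a TA score."""
    return (ai_score + ta_score) / 2


# Hypothetical rubric scores for one sample (illustrative, not study data).
print(hybrid_score(4.0, 3.0))  # 3.5
```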

The AI grader used in the study was built on OpenAI’s GPT-4o and calibrated through prompt engineering and exemplar referencing to evaluate student journey maps against the structured rubric. Teaching Assistants at the Singapore University of Technology and Design (SUTD) also scored the samples, with instructor scores serving as the benchmark.

Key Findings on AI vs. Human Assessment

The quantitative results revealed discrepancies between AI and instructor scoring. For the dimensions of “Empathy and User Understanding” and “Identification of Pain Points and Opportunities,” there was low statistical agreement between instructor scores and AI scores. AI scores clustered in the mid-to-high range, showed minimal variance, and often over-scored student work, which suggests the AI may not recognize deficiencies or interpret context with enough nuance. For “Visual Communication,” however, alignment between instructor and AI scores was slightly higher.
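The article does not name the agreement statistic the study used. As a rough illustration only, the patterns it describes (over-scoring and low variance in AI scores) can be checked with a few simple descriptive measures. The function name `agreement_summary` and all scores below are hypothetical and are not data from the study.

```python
from statistics import mean, pvariance


def agreement_summary(instructor, ai):
    """Compare AI scores against instructor benchmark scores.

    Returns the mean signed difference (positive means the AI over-scores),
    the population variance of each score set, and the share of samples
    where both scores match exactly.
    """
    if not instructor or len(instructor) != len(ai):
        raise ValueError("score lists must be non-empty and of equal length")
    diffs = [a - i for a, i in zip(ai, instructor)]
    return {
        "mean_bias": mean(diffs),
        "ai_variance": pvariance(ai),
        "instructor_variance": pvariance(instructor),
        "exact_agreement": sum(d == 0 for d in diffs) / len(diffs),
    }


# Hypothetical scores on a 1-5 rubric scale (illustrative, not study data).
instructor_scores = [2, 4, 1, 5, 3]
ai_scores = [4, 4, 3, 4, 4]
print(agreement_summary(instructor_scores, ai_scores))
```

In this made-up example the AI scores have a positive mean bias and a much smaller variance than the instructor scores, which is the shape of result the article reports. A real analysis would likely use a formal inter-rater statistic rather than these descriptive measures.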

On teacher preferences, the study found that teachers preferred TA-assigned scores in six out of ten samples and AI-assigned scores in the other four. Hybrid scores (simple averages) were not the top choice for any sample. This indicates continued trust in human judgment, particularly in subjective assessments.
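The preference result is a tally of which score source came out on top for each sample. A minimal sketch of that tally follows, assuming one winning source per sample; the function name and the ordering of the list are hypothetical, and only the six, four, and zero counts come from the article.

```python
from collections import Counter


def tally_preferences(preferred_by_sample):
    """Count how many samples each score source won."""
    counts = Counter(preferred_by_sample)
    return {source: counts.get(source, 0) for source in ("TA", "AI", "Hybrid")}


# Six samples favored the TA score and four favored the AI score.
preferred = ["TA"] * 6 + ["AI"] * 4
print(tally_preferences(preferred))  # {'TA': 6, 'AI': 4, 'Hybrid': 0}
```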

Teacher Perspectives: Advantages and Concerns

Qualitative feedback from teachers highlighted several advantages of AI-based scoring, including its efficiency, speed, and potential for providing immediate, consistent formative feedback. Teachers saw value in AI supporting student self-directed learning by offering rationales for scores without constant teacher intervention. Some even noted that AI scoring could be more consistent for large cohorts and potentially less biased than manual scoring.

However, significant concerns were also raised. Teachers expressed doubts about AI’s accuracy and its ability to capture nuanced aspects of student work, especially empathy and contextual insight. They noted that AI scores sometimes diverged significantly from their own, particularly in cases requiring subtle visual interpretation. Manual scoring, despite being slow and labor-intensive, was appreciated for fostering deeper engagement, discussion, and a more authentic understanding of student work.


Moving Towards Hybrid Models

The study underscores the need for hybrid assessment models that integrate the computational efficiency of AI with the invaluable insights of human judgment. The researchers recommend that AI be primarily positioned as a tool for formative assessment, allowing students to iteratively improve their work with immediate feedback, rather than for final, high-stakes grading. Further research is needed to develop more sophisticated hybrid approaches that go beyond simple averaging and to align AI training more closely with individual instructor scoring patterns.

This research contributes to the ongoing conversation about responsible AI adoption in creative disciplines, emphasizing the balance between automation and human insight needed for scalable and pedagogically sound assessment practices. For more details, refer to the full paper.
