Assessing Creativity: A Look at AI vs. Human Grading in Design Thinking Education

TLDR: An exploratory study compared AI and Teaching Assistant (TA) assessments of student design thinking posters. While AI offered efficiency and consistency, it showed low agreement with instructor scores for empathy and pain points, and teachers generally preferred TA scores. The research suggests a need for hybrid assessment models that combine AI’s efficiency with human insight, emphasizing AI’s potential for formative feedback rather than summative grading.

In the evolving landscape of education, particularly in creative fields like design thinking, evaluating student work presents a significant challenge. As design thinking programs expand in secondary and tertiary education, educators face the daunting task of assessing creative artifacts that combine both visual and textual elements. Traditional assessment methods, often relying on rubrics and Teaching Assistants (TAs) in large classes, are frequently described as laborious, time-consuming, and inconsistent.

A recent exploratory study examines this challenge by comparing the reliability and perceived accuracy of AI-assisted assessment against TA-assisted assessment for student posters in design thinking education. The research, conducted with 33 school teachers from Singapore’s Ministry of Education, aimed to understand how AI-generated scores align with human grading and to gauge which scoring method teachers prefer.

The Study’s Approach

The study involved two main tasks. In the first task, teachers manually evaluated ten anonymized student journey map samples using a structured rubric. A journey map is a visual representation of a user’s experience that highlights actions, emotions, pain points, and opportunities for improvement. The maps were assessed on three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication.

For the second task, teachers reviewed a different set of ten samples, each presented with three anonymized scores: one from an AI grader, one from a Teaching Assistant, and a hybrid score (an average of AI and TA scores). Teachers were asked to indicate which score they agreed with most, without knowing the source of each score.
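
The hybrid score described above is a plain average of the two sources. A minimal sketch of that calculation follows, assuming scores are recorded per rubric dimension on a 1 to 5 scale. The scale, the dictionary keys, and the sample values are illustrative assumptions and are not taken from the study.

```python
# Hypothetical keys for the three rubric dimensions named in the article.
RUBRIC_DIMENSIONS = ("empathy", "pain_points", "visual_communication")


def hybrid_score(ai_scores, ta_scores):
    """Return the simple average of AI and TA scores for each dimension."""
    return {
        dim: (ai_scores[dim] + ta_scores[dim]) / 2
        for dim in RUBRIC_DIMENSIONS
    }


# Made-up scores on an assumed 1-5 scale, for illustration only.
ai = {"empathy": 4, "pain_points": 4, "visual_communication": 3}
ta = {"empathy": 2, "pain_points": 3, "visual_communication": 3}

hybrid = hybrid_score(ai, ta)
print(hybrid)
```

A simple average weights both graders equally, so it always lands between the two scores and cannot side with either one when they disagree.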

The AI grader used in the study was developed on OpenAI’s GPT-4o, calibrated through prompt engineering and exemplar referencing to evaluate student journey maps based on the structured rubric. Teaching Assistants at the Singapore University of Technology and Design (SUTD) also scored the samples, with instructor scores serving as the benchmark.
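
To make "prompt engineering and exemplar referencing" concrete, the sketch below assembles a text prompt from a rubric and a few scored exemplars. It is not the study's actual prompt: the function name, the rubric descriptors, the exemplar fields, and the 1 to 5 scale are all hypothetical, and the sketch stops at building the prompt string rather than calling any model API.

```python
# Dimension names come from the article; the descriptors are placeholders.
RUBRIC = {
    "Empathy and user understanding": "placeholder descriptor",
    "Identification of pain points and opportunities": "placeholder descriptor",
    "Visual communication": "placeholder descriptor",
}


def build_grading_prompt(rubric, exemplars, submission_description):
    """Assemble a grading prompt from a rubric, scored exemplars, and one submission."""
    lines = [
        "Grade the student journey map below against each rubric dimension.",
        "Give a score from 1 to 5 per dimension and a one-sentence rationale.",
        "",
        "Rubric:",
    ]
    for dimension, descriptor in rubric.items():
        lines.append(f"- {dimension}: {descriptor}")
    lines.append("")
    lines.append("Scored exemplars for calibration:")
    for number, exemplar in enumerate(exemplars, start=1):
        lines.append(f"{number}. {exemplar['description']} -> scores {exemplar['scores']}")
    lines.append("")
    lines.append("Submission to grade:")
    lines.append(submission_description)
    return "\n".join(lines)


# Made-up exemplar and submission text, for illustration only.
exemplars = [
    {"description": "Map with detailed user quotes and emotions", "scores": [5, 4, 4]},
    {"description": "Map listing steps with no user perspective", "scores": [2, 2, 3]},
]
prompt = build_grading_prompt(RUBRIC, exemplars, "Journey map of a commuter's morning")
print(prompt)
```

Including both a strong and a weak exemplar gives the model reference points at both ends of the scale, which is one common way to calibrate rubric-based scoring.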

Key Findings on AI vs. Human Assessment

The quantitative results revealed clear discrepancies between AI and instructor scoring. For the dimensions of “Empathy and User Understanding” and “Identification of Pain Points and Opportunities,” there was low statistical agreement between instructor scores and AI scores. AI scores clustered around mid-to-high values with minimal variance and often over-scored student work, suggesting that the AI grader may not recognize deficiencies or interpret context with the same nuance as instructors. For “Visual Communication,” alignment between instructor and AI scores was slightly higher.
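
The article does not name the agreement statistic the researchers used. As one hedged illustration, the sketch below compares two raters on a single dimension using Pearson correlation, mean signed difference (a positive value means the AI scores higher), and score variance. The choice of metrics, the function names, and the ten score pairs are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation; returns None if either rater's scores never vary."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def compare_raters(instructor, ai):
    """Summarize agreement, over-scoring, and spread for two score lists."""
    return {
        "pearson_r": pearson(instructor, ai),
        "mean_bias": mean([a - i for i, a in zip(instructor, ai)]),
        "instructor_variance": pvariance(instructor),
        "ai_variance": pvariance(ai),
    }


# Made-up scores for ten samples on an assumed 1-5 scale.
instructor = [2, 5, 3, 1, 4, 2, 5, 3, 1, 4]
ai = [4, 4, 4, 3, 4, 4, 4, 3, 4, 4]

report = compare_raters(instructor, ai)
print(report)
```

In this invented data the AI scores sit in a narrow band while the instructor uses the full scale, which mirrors the pattern the study describes: low variance and a positive bias on the AI side.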

On preferences, teachers favored the TA-assigned score for six of the ten samples and the AI-assigned score for the remaining four. The hybrid score (a simple average) was not the top choice for any sample. This indicates continued trust in human judgment, particularly in subjective assessments.
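
A per-sample tally like the one reported above could be produced as sketched below, assuming each teacher's choice is recorded as a label and the most frequently chosen source wins each sample. The labels, the plurality rule, and the votes are assumptions for illustration; the article does not describe how the researchers aggregated responses.

```python
from collections import Counter


def preferred_source(votes):
    """Return the score source chosen most often for one sample.

    Ties go to whichever label appears first in the vote list.
    """
    return Counter(votes).most_common(1)[0][0]


def tally_preferences(votes_by_sample):
    """Count how many samples each score source won."""
    winners = [preferred_source(votes) for votes in votes_by_sample.values()]
    return Counter(winners)


# Made-up votes from five teachers on three samples.
votes_by_sample = {
    "sample_1": ["TA", "TA", "AI", "hybrid", "TA"],
    "sample_2": ["AI", "AI", "TA", "AI", "hybrid"],
    "sample_3": ["TA", "hybrid", "TA", "AI", "TA"],
}

summary = tally_preferences(votes_by_sample)
print(summary)
```

A plurality count hides how close each contest was, so a fuller analysis would also report the vote share for each source per sample.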

Teacher Perspectives: Advantages and Concerns

Qualitative feedback from teachers highlighted several advantages of AI-based scoring, including its efficiency, speed, and potential for providing immediate, consistent formative feedback. Teachers saw value in AI supporting student self-directed learning by offering rationales for scores without constant teacher intervention. Some even noted that AI scoring could be more consistent for large cohorts and potentially less biased than manual scoring.

However, significant concerns were also raised. Teachers expressed doubts about AI’s accuracy and its ability to capture nuanced aspects of student work, especially empathy and contextual insight. They noted that AI scores sometimes diverged significantly from their own, particularly in cases requiring subtle visual interpretation. Manual scoring, despite being slow and labor-intensive, was appreciated for fostering deeper engagement, discussion, and a more authentic understanding of student work.

Moving Towards Hybrid Models

The study underscores the need for hybrid assessment models that integrate the computational efficiency of AI with the invaluable insights of human judgment. The researchers recommend that AI be primarily positioned as a tool for formative assessment, allowing students to iteratively improve their work with immediate feedback, rather than for final, high-stakes grading. Further research is needed to develop more sophisticated hybrid approaches that go beyond simple averaging and to align AI training more closely with individual instructor scoring patterns.

This research contributes to the ongoing conversation about responsible AI adoption in creative disciplines, emphasizing the critical balance between automation and human insight for scalable and pedagogically sound assessment practices. For more details, refer to the full paper.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
