Comparing AI and Human Approaches to Thematic Analysis in Digital Mental Health Research

TLDR: A new study compares human-based thematic analysis with LLM-based methods (GPT-4o) for digital mental health research. While LLMs offer significant cost and time efficiencies, especially in code development and saturation, human analysis provides superior depth, nuance, and contextual understanding in child code development, excerpt identification, and theme synthesis. The research suggests a hybrid human-AI approach could optimize qualitative analysis in healthcare.

Thematic analysis is a crucial method in qualitative research, especially in digital mental health studies. It helps researchers understand people’s experiences by identifying patterns and themes in text, like interview transcripts. While it offers rich insights, traditional human-based thematic analysis is very time-consuming and resource-intensive, which can limit its use in larger healthcare studies.

Recently, large language models (LLMs) have emerged as a promising tool for analyzing text at scale and automatically identifying key content. This has led to the question of whether LLMs can effectively perform thematic analysis, particularly in sensitive areas like mental health interviews, and how their performance compares to human experts.

A recent proof-of-concept study, “Human vs. LLM-Based Thematic Analysis for Digital Mental Health Research: Proof-of-Concept Comparative Study” (2025), explored this question. The research, conducted by Karisa Parkington, Bazen G. Teferra, Marianne Rouleau-Tang, Argyrios Perivolaris, Alice Rueda, Adam Dubrowski, Bill Kapralos, Reza Samavi, Andrew Greenshaw, Yanbo Zhang, Bo Cao, Yuqi Wu, Sirisha Rambhatla, Sridhar Krishnan, and Venkat Bhat, directly compared LLM-based thematic analysis with traditional human methods.

The study used OpenAI’s GPT-4o model, applying a structured prompt engineering framework called RISEN (Role, Instructions, Steps, End-Goal, Narrowing). Two LLM approaches were tested: an ‘out-of-the-box’ model and a ‘knowledge-based’ model that incorporated established qualitative research principles (Braun and Clarke’s framework). These were compared against human analysis performed using Dedoose software.
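To make the RISEN structure concrete, here is a minimal sketch of how a prompt with those five parts might be assembled. The class name, field names, and all prompt wording are hypothetical illustrations; the article does not reproduce the study's actual prompts, and no API call is made here.

```python
from dataclasses import dataclass


@dataclass
class RisenPrompt:
    """Holds the five RISEN parts. All wording passed in is illustrative only."""
    role: str
    instructions: str
    steps: list[str]
    end_goal: str
    narrowing: str

    def render(self) -> str:
        # Number the steps so the model sees an explicit order of operations.
        numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(self.steps, start=1))
        return (
            f"Role: {self.role}\n"
            f"Instructions: {self.instructions}\n"
            f"Steps:\n{numbered}\n"
            f"End-Goal: {self.end_goal}\n"
            f"Narrowing: {self.narrowing}"
        )


# Hypothetical example content, not taken from the study.
prompt = RisenPrompt(
    role="You are a qualitative researcher experienced in thematic analysis.",
    instructions="Develop parent and child codes for the interview transcript below.",
    steps=[
        "Read the transcript in full.",
        "Propose codes with short definitions.",
        "Quote the excerpt that supports each code.",
    ],
    end_goal="A codebook that can be applied consistently across transcripts.",
    narrowing="Use only the transcript text; do not infer details that are not stated.",
)

prompt_text = prompt.render()
print(prompt_text)
```

In a sketch like this, the 'knowledge-based' variant would differ mainly in what goes into the instructions and narrowing fields, for example a summary of Braun and Clarke's phases, while the 'out-of-the-box' variant would leave that guidance out.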

The researchers used semi-structured interview transcripts from a stress-reduction trial involving healthcare workers. Both human and LLM approaches independently developed codes (labels for text excerpts), identified when new codes stopped appearing (saturation points), applied codes to excerpts from a subset of 20 participants, and synthesized data into overarching themes. Their outputs and performance metrics were then directly compared.

One of the most striking findings was the difference in resources. Human-led thematic analysis took approximately 110 hours and cost around $3,537 CAD (including software). In contrast, the LLM-based analyses were completed by a single researcher in about 40 hours, with a personnel cost of approximately $1,260 CAD and a technical cost of only $12.10 CAD for OpenAI API usage. On these figures, the LLM-based route needed a little over a third of the time and money of the human-led analysis.
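The arithmetic behind that comparison is simple enough to check directly. The sketch below uses only the approximate figures quoted above; the variable names are ours, and the human cost already includes software, as reported.

```python
# Approximate figures as reported in the article (CAD).
human_hours = 110
human_cost_cad = 3537.00      # includes software
llm_hours = 40
llm_personnel_cad = 1260.00
llm_api_cad = 12.10           # OpenAI API usage

# Total LLM cost is personnel plus API usage.
llm_total_cad = llm_personnel_cad + llm_api_cad

# Relative savings compared with the human-led analysis.
cost_saving = 1 - llm_total_cad / human_cost_cad
time_saving = 1 - llm_hours / human_hours

print(f"LLM total: ${llm_total_cad:,.2f} CAD")   # LLM total: $1,272.10 CAD
print(f"Cost reduction: {cost_saving:.0%}")      # Cost reduction: 64%
print(f"Time reduction: {time_saving:.0%}")      # Time reduction: 64%
```

Note that the API charge is about 1% of the LLM total, so almost all of the remaining cost is the researcher's time.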

In terms of code development, the GPT-4o models, especially when guided by the RISEN framework, developed deductive parent codes that were comparable to those created by humans. However, human researchers provided much more detail in developing inductive child codes and in synthesizing themes. The knowledge-based LLM achieved coding saturation (meaning no new codes emerged) with fewer transcripts (10-15) than the out-of-the-box model (15-20) and human analysis (90-99), indicating greater efficiency in this specific aspect.
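One simple way to picture saturation is to track which codes each new transcript adds to the codebook. The sketch below assumes saturation means "the last transcript that introduced a previously unseen code"; the article does not state the study's exact criterion, and the function name and toy codes are hypothetical.

```python
def saturation_point(codes_per_transcript):
    """Return the 1-based index of the last transcript that added a new code.

    Returns 0 if no transcript contains any codes.
    """
    seen = set()
    last_new = 0
    for index, codes in enumerate(codes_per_transcript, start=1):
        new_codes = set(codes) - seen
        if new_codes:
            last_new = index
            seen |= new_codes
    return last_new


# Toy data: codes assigned to five transcripts, in analysis order (made up).
transcripts = [
    {"workload", "burnout"},
    {"burnout", "moral distress"},
    {"workload", "peer support"},
    {"burnout", "peer support"},
    {"moral distress", "workload"},
]

point = saturation_point(transcripts)
print(f"No new codes after transcript {point}")  # No new codes after transcript 3
```

In practice, analysts usually want several consecutive transcripts without new codes before declaring saturation, and the result depends on the order in which transcripts are read, so a single number like this is only a rough indicator.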

When it came to identifying and applying codes to specific text excerpts, the out-of-the-box LLM identified a similar number of excerpts as human researchers, showing strong agreement (Kappa = 0.84). However, the knowledge-based LLM produced significantly fewer excerpts. Human-selected excerpts tended to be longer and often had multiple codes applied to them, capturing more context. LLM-generated excerpts were typically shorter and usually assigned only one code, suggesting a less nuanced interpretation.
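For readers unfamiliar with kappa, the statistic corrects raw agreement for the agreement two raters would reach by chance. The sketch below assumes the reported value is Cohen's kappa computed over paired labels; the article does not say which variant or unit of comparison the study used, and the toy labels are made up and do not reproduce the 0.84 result.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of labels.")
    n = len(rater_a)

    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1:
        return 1.0  # Both raters used one identical label throughout.
    return (observed - expected) / (1 - expected)


# Toy data: 1 = code applied to the segment, 0 = not applied (hypothetical).
human = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
llm = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(human, llm)
print(f"kappa = {kappa:.2f}")  # kappa = 0.60
```

Here the raters agree on 8 of 10 segments (0.80), chance agreement is 0.50, and kappa is (0.80 - 0.50) / (1 - 0.50) = 0.60, which shows why a kappa of 0.84 implies agreement well beyond chance.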

For theme synthesis, human researchers identified nine overarching themes, while both LLM approaches each produced six themes. There was substantial overlap between the themes generated by the two LLM models. While LLMs consistently identified key themes like the impact of systemic factors, emotional burden, and ethical reasoning, human analysis provided more specific details, such as emotional triggers linked to avatars or events in the VR simulation. Human coders also identified additional themes not captured by the LLMs and offered more precise distinctions between overlapping concepts.

Overall, the study concluded that while LLM-based thematic analysis is more cost-effective and efficient in certain aspects, it currently lacks the specificity, depth, and nuanced understanding that human analysis provides, especially in capturing individual differences and contextual factors crucial for mental health insights. The findings suggest that a hybrid model, combining the efficiency and scalability of LLMs for initial code development and preliminary theme construction with human oversight for detailed excerpt identification, coding application, and theme refinement, could be the most balanced and effective approach. This human-AI collaboration holds significant promise for accelerating qualitative research in digital mental healthcare, enabling faster analysis of patient experiences and informing intervention refinements, while maintaining methodological rigor.
