Comparing AI and Human Approaches to Thematic Analysis in Digital Mental Health Research

TLDR: A new study compares human-based thematic analysis with LLM-based methods (GPT-4o) for digital mental health research. While LLMs offer significant cost and time efficiencies, especially in code development and saturation, human analysis provides superior depth, nuance, and contextual understanding in child code development, excerpt identification, and theme synthesis. The research suggests a hybrid human-AI approach could optimize qualitative analysis in healthcare.

Thematic analysis is a crucial method in qualitative research, especially in digital mental health studies. It helps researchers understand people’s experiences by identifying patterns and themes in text, like interview transcripts. While it offers rich insights, traditional human-based thematic analysis is very time-consuming and resource-intensive, which can limit its use in larger healthcare studies.

Recently, large language models (LLMs) have emerged as a promising tool for analyzing text at scale and automatically identifying key content. This has led to the question of whether LLMs can effectively perform thematic analysis, particularly in sensitive areas like mental health interviews, and how their performance compares to human experts.

A recent proof-of-concept study, titled “Human vs. LLM-Based Thematic Analysis for Digital Mental Health Research: Proof-of-Concept Comparative Study (2025)”, explored this very question. The research, conducted by Karisa Parkington, Bazen G. Teferra, Marianne Rouleau-Tang, Argyrios Perivolaris, Alice Rueda, Adam Dubrowski, Bill Kapralos, Reza Samavi, Andrew Greenshaw, Yanbo Zhang, Bo Cao, Yuqi Wu, Sirisha Rambhatla, Sridhar Krishnan, and Venkat Bhat, directly compared LLM-based thematic analysis with traditional human methods. You can find the full paper here: Research Paper.

The study used OpenAI’s GPT-4o model, applying a structured prompt engineering framework called RISEN (Role, Instructions, Steps, End-Goal, Narrowing). Two LLM approaches were tested: an ‘out-of-the-box’ model and a ‘knowledge-based’ model that incorporated established qualitative research principles (Braun and Clarke’s framework). These were compared against human analysis performed using Dedoose software.
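To make the RISEN structure concrete, here is a minimal Python sketch of how a prompt with those five parts might be assembled. The field contents, the `RisenPrompt` class, and the rendered layout are all hypothetical illustrations; the article does not reproduce the study's actual prompts.

```python
from dataclasses import dataclass


@dataclass
class RisenPrompt:
    """Hypothetical container for the five RISEN components."""
    role: str
    instructions: str
    steps: list[str]
    end_goal: str
    narrowing: str

    def render(self) -> str:
        # Number the steps so the model sees an explicit order.
        numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(self.steps, 1))
        return (
            f"Role: {self.role}\n"
            f"Instructions: {self.instructions}\n"
            f"Steps:\n{numbered}\n"
            f"End goal: {self.end_goal}\n"
            f"Narrowing: {self.narrowing}"
        )


# Illustrative wording only, not taken from the study.
prompt = RisenPrompt(
    role="You are a qualitative researcher experienced in thematic analysis.",
    instructions="Develop codes for the interview transcript provided.",
    steps=[
        "Read the full transcript.",
        "Propose parent codes, then child codes.",
        "Quote the excerpt that supports each code.",
    ],
    end_goal="A codebook that can be applied to further transcripts.",
    narrowing="Use only the transcript text; do not infer unstated details.",
)

prompt_text = prompt.render()
print(prompt_text)
```

A 'knowledge-based' variant could, under the same assumptions, extend the instructions field with a summary of Braun and Clarke's phases, while the 'out-of-the-box' variant would leave it as is.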

The researchers used semi-structured interview transcripts from a stress-reduction trial involving healthcare workers. Both human and LLM approaches independently developed codes (labels for text excerpts), identified when new codes stopped appearing (saturation points), applied codes to excerpts from a subset of 20 participants, and synthesized data into overarching themes. Their outputs and performance metrics were then directly compared.

One of the most striking findings was the difference in resources. Human-led thematic analysis took approximately 110 hours and cost around $3,537 CAD (including software). In contrast, the LLM-based analyses were completed by a single researcher in about 40 hours, with a personnel cost of approximately $1,260 CAD and a technical cost of only $12.10 CAD for OpenAI API usage. On these figures, the LLM-based workflow needed roughly a third of the hours and about 36% of the cost of the human-led analysis.
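The relative savings follow directly from the reported figures. The short Python calculation below uses only the numbers quoted above; the variable names are ours, and the results are approximate because the reported inputs are themselves rounded.

```python
# Figures as reported in the article (CAD, approximate).
human_hours = 110
human_cost_cad = 3537.00      # includes software
llm_hours = 40
llm_personnel_cad = 1260.00
llm_api_cad = 12.10           # OpenAI API usage

llm_total_cad = llm_personnel_cad + llm_api_cad
hours_saved_pct = 100 * (1 - llm_hours / human_hours)
cost_saved_pct = 100 * (1 - llm_total_cad / human_cost_cad)

print(f"LLM total cost: ${llm_total_cad:.2f} CAD")   # $1272.10 CAD
print(f"Hours saved: {hours_saved_pct:.0f}%")        # 64%
print(f"Cost saved: {cost_saved_pct:.0f}%")          # 64%
```

Note that the API charge is under 1% of the LLM-side total; almost all of the remaining cost is researcher time.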

In terms of code development, the GPT-4o approaches, especially when guided by the RISEN framework, developed deductive parent codes that were comparable to those created by humans. However, human researchers provided much more detail in developing inductive child codes and in synthesizing themes. The knowledge-based LLM reached coding saturation (the point at which no new codes emerged) after fewer transcripts (10-15) than the out-of-the-box model (15-20) or the human analysis (90-99), indicating greater efficiency in this specific aspect.
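One simple way to operationalize a saturation point is to find the last transcript that contributed a new code, once a run of later transcripts has added none. The sketch below assumes that definition and a hypothetical `window` parameter; the article does not state the exact stopping rule the study used, and the sample codes are invented.

```python
def saturation_point(codes_per_transcript, window=3):
    """Return the 1-based index of the last transcript that added a new code,
    once `window` consecutive later transcripts have added none.
    Returns None if that condition is never met."""
    seen = set()
    last_new = 0
    for i, codes in enumerate(codes_per_transcript, start=1):
        new_codes = set(codes) - seen
        if new_codes:
            seen |= new_codes
            last_new = i
        elif i - last_new >= window:
            return last_new
    return None


# Invented example: codes assigned to five transcripts, in coding order.
transcripts = [
    ["workload", "burnout"],
    ["burnout", "stigma"],
    ["workload"],
    ["stigma"],
    ["burnout"],
]

print(saturation_point(transcripts, window=3))  # 2
```

A larger `window` makes the rule more conservative, which is one reason saturation counts can differ so widely between analysts using different criteria.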

When it came to identifying and applying codes to specific text excerpts, the out-of-the-box LLM identified a similar number of excerpts as human researchers, showing strong agreement (Kappa = 0.84). However, the knowledge-based LLM produced significantly fewer excerpts. Human-selected excerpts tended to be longer and often had multiple codes applied to them, capturing more context. LLM-generated excerpts were typically shorter and usually assigned only one code, suggesting a less nuanced interpretation.
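For readers unfamiliar with the statistic, kappa measures agreement between two raters after correcting for the agreement expected by chance. The sketch below implements Cohen's kappa for two label sequences; treating the reported value as Cohen's kappa over per-segment decisions is our assumption, since the article does not specify the variant or the unit of comparison, and the sample labels are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of labels.")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example: 1 = segment selected as an excerpt, 0 = not selected.
human = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
llm = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

kappa = cohens_kappa(human, llm)
print(f"kappa = {kappa:.2f}")  # kappa = 0.80
```

Here the raters agree on 9 of 10 segments (0.90 observed), chance agreement is 0.50, and kappa is (0.90 - 0.50) / (1 - 0.50) = 0.80.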

For theme synthesis, human researchers identified nine overarching themes, while both LLM approaches each produced six themes. There was substantial overlap between the themes generated by the two LLM models. While LLMs consistently identified key themes like the impact of systemic factors, emotional burden, and ethical reasoning, human analysis provided more specific details, such as emotional triggers linked to avatars or events in the VR simulation. Human coders also identified additional themes not captured by the LLMs and offered more precise distinctions between overlapping concepts.

Overall, the study concluded that while LLM-based thematic analysis is more cost-effective and efficient in certain aspects, it currently lacks the specificity, depth, and nuanced understanding that human analysis provides, especially in capturing individual differences and contextual factors crucial for mental health insights. The findings suggest that a hybrid model, combining the efficiency and scalability of LLMs for initial code development and preliminary theme construction with human oversight for detailed excerpt identification, coding application, and theme refinement, could be the most balanced and effective approach. This human-AI collaboration holds significant promise for accelerating qualitative research in digital mental healthcare, enabling faster analysis of patient experiences and informing intervention refinements, while maintaining methodological rigor.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
