TLDR: A study comparing LLM-generated mental health responses with human therapist responses found that LLMs (ChatGPT, Gemini, Llama) produced longer, more positively toned answers with richer vocabulary, which users and therapists rated as clearer, more respectful, and more supportive. Despite these higher ratings, both groups strongly preferred human therapists for actual support, citing concerns about trust, privacy, and accountability. The research highlights the potential of LLMs for informational support but emphasizes they are not substitutes for professional mental health care, especially given legal and ethical challenges.
A recent study delves into the capabilities of large language models (LLMs) in addressing mental health questions, comparing their responses to those provided by human therapists. The research, titled "Can LLMs Address Mental Health Questions? A Comparison with Human Therapists," highlights both the promising aspects and significant limitations of integrating AI into mental health support.
The global demand for mental health care continues to rise, with many individuals facing barriers to timely and affordable access. This has spurred interest in digital tools and conversational agents powered by LLMs. However, questions about their quality and public reception have remained largely unanswered until now.
Study Design and Methodology
Researchers from the University of Chicago, University of Virginia, and other institutions conducted a comprehensive study involving 150 users and 23 licensed therapists. They compared therapist-written responses from the Counsel Chat dataset with answers generated by three prominent LLMs: ChatGPT, Gemini, and Llama. The study focused on three key research questions:
- What are the differences in responses between LLMs and licensed therapists?
- How do users perceive LLM-generated responses compared to therapist responses?
- How do licensed therapists perceive LLM-generated responses compared to therapist responses?
Participants rated responses on dimensions such as clarity, empathy, respect, and overall quality. Therapists also assessed professional acceptability. Complementing these perceptual evaluations, a text analysis examined linguistic and stylistic differences, including readability, vocabulary diversity, sentiment, and the use of hedging or first-person language.
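To make these stylistic dimensions concrete, the short Python sketch below shows how such measures might be computed for a single response: average sentence length as a rough readability proxy, type-token ratio for vocabulary diversity, and the frequency of hedging and first-person words. The word lists and metrics here are illustrative assumptions for this sketch, not the lexicons or tooling used in the paper.

```python
import re

# Illustrative word lists -- assumptions for this sketch, not the paper's lexicons.
HEDGES = {"may", "might", "could", "perhaps", "possibly", "sometimes", "often", "likely"}
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}

def stylistic_profile(text: str) -> dict:
    """Compute rough stylistic metrics for one response."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return {}
    return {
        "word_count": len(words),
        # Longer sentences generally imply lower readability (rough proxy only).
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Type-token ratio: unique words / total words, a simple diversity measure.
        "type_token_ratio": len(set(words)) / len(words),
        # Share of hedging words, approximating cautious or qualified language.
        "hedge_rate": sum(w in HEDGES for w in words) / len(words),
        # Share of first-person pronouns, approximating personal framing.
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / len(words),
    }

if __name__ == "__main__":
    therapist = "I often remind my clients that grief takes time. I have felt this myself."
    llm = "Grief may affect sleep, appetite, and mood. It could help to seek extra support."
    print("therapist:", stylistic_profile(therapist))
    print("llm:      ", stylistic_profile(llm))
```

Run over the full sets of therapist-written and LLM-generated answers, metrics like these would let the two groups be compared on length, diversity, hedging, and personal framing, which is the kind of contrast the study's text analysis reports.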
Key Findings: LLMs Rated Higher, Yet Humans Preferred
The study revealed a striking dichotomy. Text analysis showed that LLM-generated responses were generally longer, drew on a richer vocabulary, and carried a more positive or neutral tone; they also used more cautious, qualified language (hedging). Therapist responses, in contrast, were shorter and more readable, and more often used first-person framing, reflecting a more personal style.
Surprisingly, both general users and licensed therapists consistently rated LLM-generated answers higher than therapist-written ones across dimensions like clarity, encouragement, and respectfulness. ChatGPT and Gemini performed comparably, while Llama was often rated highest. This suggests that LLMs possess strong communicative competence, producing responses that are perceived as highly supportive and well-articulated.
However, this perceived quality did not translate into a preference for AI over human support. A significant 76% of participants expressed a strong preference for seeking help from a human therapist when facing mental health questions. While over 40% of users reported being likely to use LLMs for mental health questions, only about 25% of licensed therapists would recommend LLMs for general mental health information, and a mere 4% would recommend them for advice similar to psychotherapy sessions.
Challenges and Future Directions
The research highlights a critical tension: LLMs demonstrate impressive communicative abilities, but concerns about trust, privacy, and accountability remain paramount. Participants struggled to reliably distinguish human-written from AI-generated responses, and their quality judgments were largely independent of what they believed about authorship.
The paper also underscores significant legal and privacy challenges. LLMs often do not meet the confidentiality standards required in therapeutic practice, raising concerns about data security and potential misuse of sensitive information. The study cites a lawsuit against OpenAI and its CEO, alleging ChatGPT’s role in a teenager’s suicide, emphasizing the severe risks of inappropriate AI advice. Regulatory actions, such as the WOPR Act in Illinois, are emerging to prohibit AI systems from independently delivering therapy or making diagnoses without professional supervision.
The authors emphasize that LLMs should be viewed as supplemental tools rather than substitutes for professional care. Design implications include prioritizing efficacy, transparency, accountability, and privacy. LLMs could effectively serve as sources of psychological education, journaling support, or triage, with crucial mechanisms for escalating to human professionals in crisis situations. Future research should focus on longitudinal studies to understand how LLMs integrate into daily life, their impact on trust and coping strategies, and the development of privacy-preserving architectures.
In conclusion, while LLMs show great promise in extending access to supportive communication, their inherent limitations in accountability, contextual judgment, and ethical safeguards mean they cannot replace human therapists. Interdisciplinary collaboration between computer scientists and mental health professionals is essential to guide the responsible development and deployment of AI-assisted mental health support systems.


