TL;DR: This research paper introduces an Embedded AI Literacy Framework designed to address privacy and safety concerns in mental health AI chatbots. It proposes integrating AI literacy interventions directly into conversational systems through a local ‘wrapper layer’ with three modules: a Prompt Coach to improve user input clarity, a Disclosure Monitor to identify and manage sensitive information, and a Transparency Engine to explain data handling. The goal is to empower users to engage safely and effectively with AI, preventing over-disclosure and building trust; a planned study will evaluate the framework’s impact.
Large Language Models (LLMs) are becoming increasingly common in mental health support, from structured therapeutic tools to informal well-being assistants. While these AI systems offer benefits like increased accessibility and personalized care, their integration into mental health services introduces significant privacy and safety concerns that have not been thoroughly addressed.
Unlike traditional therapy, LLM-based interactions often lack clear guidelines on what information is collected, how it is processed, and how it is stored or reused. Without professional clinical guidance, users may inadvertently share too much personal information, whether through a misplaced sense of trust, a lack of awareness about data risks, or simply the conversational pull of these systems. Such oversharing not only raises privacy concerns but also increases the potential for AI bias, misinterpretation of sensitive details, and long-term misuse of data.
Introducing the Embedded AI Literacy Framework
To tackle these critical issues, researchers Soraya S. Anvari and Rina R. Wehbe propose an innovative solution: an Embedded AI Literacy Framework. This framework aims to integrate AI literacy interventions directly into mental health conversational systems. The core idea is to move beyond simply identifying risks and instead empower users with the knowledge and tools to engage safely and effectively with AI support.
The framework acts as an adaptive ‘wrapper layer’ around existing LLM-based systems. This design ensures compatibility with various AI models and APIs while maintaining transparency about the educational interventions. Crucially, this layer operates locally on the user’s device or within a secure client environment, monitoring interactions in real time without transmitting sensitive data to external servers. This local processing minimizes privacy risks while keeping the system responsive.
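To make the wrapper-layer design concrete, here is a minimal Python sketch of how such a layer might intercept each message before it reaches the model. Everything here (the `LiteracyWrapper` class, the client’s `complete()` method, the placeholder helper functions) is an illustrative assumption rather than the authors’ implementation; the paper describes the architecture, not code.

```python
from dataclasses import dataclass

@dataclass
class Intervention:
    kind: str     # "coach", "disclosure", or "transparency"
    message: str  # plain-language text shown to the user

def is_vague(text: str) -> bool:
    # Placeholder; a fuller heuristic is sketched after the module list below.
    return len(text.split()) < 4

def classify_disclosure(text: str) -> str:
    # Placeholder; a fuller heuristic is sketched after the module list below.
    return "safe"

class LiteracyWrapper:
    """Runs locally; inspects each message before it reaches the LLM."""

    def __init__(self, llm_client):
        self.llm = llm_client  # any chat client exposing a complete() method (assumed)

    def handle(self, user_message: str):
        interventions = []

        # Prompt Coach: nudge vague inputs toward a clearer framing.
        if is_vague(user_message):
            interventions.append(Intervention(
                "coach",
                "Would you like to focus on stress, relationships, or study pressure?"))

        # Transparency Engine: contextual, plain-language explanation on request.
        if "privacy" in user_message.lower() or "my data" in user_message.lower():
            interventions.append(Intervention(
                "transparency",
                "Messages are checked on your device first; flagged text is not sent on."))

        # Disclosure Monitor: classify sensitivity locally, before anything is sent.
        level = classify_disclosure(user_message)
        if level == "personal":
            interventions.append(Intervention(
                "disclosure",
                "This message may include personal details. "
                "Would you like to rephrase or continue?"))
            return interventions, None  # hold the message until the user decides
        if level == "high_risk":
            interventions.append(Intervention(
                "disclosure",
                "If you are in crisis, these resources can help: ..."))
            return interventions, None  # surface referral links instead of forwarding

        # Safe input: forward to the underlying model as usual.
        reply = self.llm.complete(user_message)
        return interventions, reply
```

Because the checks run client-side, a flagged message is held back entirely until the user decides what to do; only text the user confirms ever leaves the device.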
Key Components of the Framework
The framework consists of three main modules, each designed to foster a specific AI literacy principle:
- Prompt Coach: This module helps users craft more effective prompts. It detects vague or ambiguous inputs and offers structured, example-based reformulations. For instance, if a user types a general query, the system might suggest, “Would you like to focus on stress, relationships, or study pressure?” It adapts its guidance, offering subtler hints to experienced users and more structured examples to novices.
- Disclosure Monitor: This component classifies user input by sensitivity: safe (general feelings), personal (identifiable but non-critical details), or high-risk (potentially harmful or crisis-related content). If a personal or high-risk disclosure is detected, the system might prompt, “This message may include personal details. Would you like to rephrase or continue?” For high-risk cases, it automatically provides referral links to national help lines or campus resources. All analysis for this module is performed locally on the user’s device to protect sensitive information (a minimal sketch of such local checks follows this list).
- Transparency Engine: This module builds trust by providing clear, plain-language explanations about how the system handles user data. These explanations appear at relevant moments during the conversation, such as when users ask about privacy or when sensitive topics arise. This approach ensures users are informed without being overwhelmed by technical jargon, helping them feel more in control of their data.
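The paper does not specify how the Prompt Coach or Disclosure Monitor triggers are implemented; a lightweight rule-based first pass like the sketch below is one plausible on-device approach (a small local classifier could replace the keyword lists). The patterns shown are illustrative assumptions, not the authors’ lexicon.

```python
import re

# Illustrative pattern lists only; a real system would use a vetted
# lexicon or a small on-device classifier instead of these examples.
HIGH_RISK_PATTERNS = [r"\bhurt myself\b", r"\bsuicid\w*\b", r"\bself[- ]harm\b"]
PERSONAL_PATTERNS = [
    r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b",          # phone-number-like digits
    r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",                 # email address
    r"\bmy (name|address|therapist|diagnosis)\b",   # self-identifying phrases
]

def classify_disclosure(text: str) -> str:
    """Local-only sensitivity triage: 'safe', 'personal', or 'high_risk'."""
    lowered = text.lower()
    if any(re.search(p, lowered) for p in HIGH_RISK_PATTERNS):
        return "high_risk"
    if any(re.search(p, lowered) for p in PERSONAL_PATTERNS):
        return "personal"
    return "safe"

def is_vague(text: str) -> bool:
    """Prompt Coach trigger: very short or generic messages get a nudge."""
    generic = {"help", "i feel bad", "advice", "idk", "hi"}
    return len(text.split()) < 4 or text.strip().lower() in generic
```

For example, `classify_disclosure("My therapist said I should journal")` returns `"personal"`, which would trigger the rephrase-or-continue prompt rather than forwarding the message to the model.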
Evaluating the Framework’s Impact
The researchers plan a longitudinal study involving non-clinical users interacting with mental health chatbots for reflection and educational purposes. The study will compare a baseline chatbot without literacy features against a version incorporating the embedded AI literacy layer. Participants will engage with both systems over several weeks to observe changes in their prompting behavior, disclosure patterns, and the development of trust over time.
Evaluation will focus on three key areas: prompt literacy (measured by prompt clarity and user self-reported learning), safe disclosure (the frequency of personal or high-risk details shared), and trust and transparency (assessed through established scales and comprehension questions about data handling). The aim is to test whether literacy-embedded chatbots lead to clearer prompts, less unsafe disclosure, and a better understanding of data practices, ultimately fostering greater perceived trust and safety in AI-supported mental health education.
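As a toy illustration of how the safe-disclosure measure might be computed, assuming each user turn is logged with the Disclosure Monitor’s label (an assumption; the paper’s analysis pipeline is not published):

```python
from collections import Counter

# Hypothetical per-turn labels for one participant session, as logged
# by the Disclosure Monitor ("safe" / "personal" / "high_risk").
session = ["safe", "safe", "personal", "safe", "high_risk", "safe"]

counts = Counter(session)
unsafe_turns = counts["personal"] + counts["high_risk"]
print(f"Unsafe-disclosure rate: {unsafe_turns / len(session):.0%}")  # -> 33%
```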
This research, accepted to SMASH 2025, highlights a crucial step towards developing responsible, transparent, and user-centered AI for mental health support. You can read the full paper for more technical details here: Therapeutic AI and the Hidden Risks of Over-Disclosure.