
Safeguarding Health AI: A New Approach to Prevent Harmful Chatbot Advice

TLDR: This research paper examines the 2023 suspension of NEDA’s chatbot Tessa, which provided harmful weight-loss advice, as a case study in AI safety failures. It proposes a modular safety middleware that combines deterministic lexical gates with an in-line LLM policy filter returning a single-call JSON verdict. In the authors’ evaluation, the design intercepted all unsafe prompts with minimal overhead, demonstrating that explicit, testable safety controls and strong governance are crucial for preventing harmful outputs in health-adjacent AI assistants.

The year 2023 saw a significant setback in the deployment of AI in sensitive health contexts when the National Eating Disorders Association (NEDA) was forced to suspend its chatbot, Tessa. Originally designed to offer support, Tessa began providing harmful weight-loss advice, including recommendations for calorie deficits and regular weigh-ins, to vulnerable users. This incident highlighted a critical gap in safety engineering for health-adjacent AI assistants, prompting a new research paper that proposes a robust solution.

The research, titled “Preventing Another Tessa: Modular Safety Middleware For Health-Adjacent AI Assistants,” by Pavan Reddy and Nithin Reddy, delves into Tessa’s failure as a case study. It argues that such incidents are avoidable when safety mechanisms are integrated from the outset rather than added as an afterthought. The paper introduces a lightweight, modular safety middleware designed to prevent similar harmful outputs in AI systems operating in high-risk domains like healthcare.

Understanding Tessa’s Failure Points

Tessa’s issues weren’t just about bad content; they exposed fundamental weaknesses common to many AI deployments. The researchers mapped these failures to established security frameworks, namely the OWASP LLM Top 10 and NIST SP 800-53. Key problems included:

  • Policy-implementation drift: The chatbot’s content diverged from its intended safety guidelines without proper review.
  • Missing input triage: Risky queries related to dieting were treated as normal, allowing harmful interactions to proceed.
  • Insecure output handling: There was no final check to block harmful advice like calorie targets before it reached users.
  • No escalation pathway: Crisis signals did not trigger a handoff to human experts.
  • Stateless moderation: The system didn’t remember past risky interactions, failing to accumulate risk signals over a session.
  • Insufficient monitoring and rollback: Harmful content persisted because there were no rapid detection or removal mechanisms.

These issues, while observed in a non-generative AI chatbot, are directly analogous to vulnerabilities found in modern Generative AI systems, such as prompt injection and insecure output handling.

A Modular Approach to AI Safety

To address these vulnerabilities, the paper proposes a hybrid safety middleware. This system combines two main components:

First, a fast lexical gate acts as an initial filter. This deterministic component uses keywords and regular expressions to immediately block obviously risky intents, such as direct mentions of calorie targets, weigh-ins, or dieting frames. This provides a quick and explainable first line of defense.
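
As a rough illustration, a gate of this kind can be a handful of compiled regular expressions checked before any model call. The Python sketch below uses hypothetical patterns standing in for the paper’s actual rule set:

```python
import re

# Illustrative risk patterns; the paper's real rule set is more extensive.
RISKY_PATTERNS = [
    re.compile(r"\b\d{3,4}\s*(?:k?cal|calorie)s?\b", re.IGNORECASE),      # calorie targets
    re.compile(r"\bweigh[-\s]?ins?\b", re.IGNORECASE),                    # weigh-ins
    re.compile(r"\b(?:lose|cut|drop)\s+\d+\s*(?:lbs?|kg|pounds?)\b", re.IGNORECASE),
]

def lexical_gate(text: str) -> bool:
    """Deterministic first-line check: True if the text trips a risky pattern."""
    return any(p.search(text) for p in RISKY_PATTERNS)

# Example: a dieting-framed query is caught before any model call is made.
print(lexical_gate("How do I maintain a 1200 calorie deficit?"))  # True
```

Because the gate is deterministic, every block is trivially explainable: the matched pattern itself is the audit record.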

Second, an in-line large language model (LLM) policy filter works in conjunction with the lexical gate. This LLM-based judge evaluates both the user’s input and the AI’s generated response against a strict safety policy. Crucially, this filter operates in a “fail-closed” manner, meaning if there’s any doubt about safety, the content is blocked or escalated. It also includes a final numeric/lexical scan of the buffered answer before it’s delivered to the user.
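
A minimal sketch of such a fail-closed filter might look like the following, reusing the lexical_gate from the sketch above. Here call_llm is a hypothetical stand-in for whatever completion API the deployment uses, and the prompt and JSON schema are illustrative assumptions rather than the paper’s exact design:

```python
import json

SAFETY_POLICY = "Never provide calorie targets, weigh-in schedules, or dieting advice."

def policy_filter(user_input: str, draft_answer: str, call_llm) -> str | None:
    """Fail-closed judge: return the answer only if it is affirmatively safe."""
    raw_verdict = call_llm(
        f"Policy: {SAFETY_POLICY}\n"
        f"User message: {user_input}\n"
        f"Draft answer: {draft_answer}\n"
        'Respond only with JSON: {"safe": true|false, "violations": [...]}'
    )
    try:
        verdict = json.loads(raw_verdict)
    except (json.JSONDecodeError, TypeError):
        return None  # fail closed: an unparsable verdict blocks the answer
    if not verdict.get("safe", False):
        return None  # fail closed: a flagged or missing verdict blocks the answer
    if lexical_gate(draft_answer):
        return None  # final numeric/lexical scan of the buffered answer
    return draft_answer
```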

Efficiency Through Single-Call JSON Mode

A significant innovation in this design is the “Single-Call JSON Mode.” Traditional safety pipelines often involve multiple steps, each adding latency and cost. This mode instead folds response generation and safety adjudication into a single model call: the AI is instructed to produce its answer followed by a strict JSON verdict indicating whether the content is safe and listing any violations. If the verdict is unsafe or unparsable, the answer is discarded, ensuring that harmful content is never rendered. In the authors’ evaluation, this approach intercepted all unsafe prompts at essentially baseline cost and latency, outperforming multi-stage pipelines.
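
Under stated assumptions (the same hypothetical call_llm function, and an illustrative VERDICT delimiter rather than the paper’s exact output schema), the single-call flow might be sketched as:

```python
import json
import re

def single_call_json_mode(user_prompt: str, call_llm) -> str | None:
    """Generate and adjudicate in one model call; discard the answer on any doubt."""
    raw = call_llm(
        f"{user_prompt}\n\n"
        "After your answer, append one final line of the form:\n"
        'VERDICT: {"safe": true|false, "violations": []}'
    )
    match = re.search(r'VERDICT:\s*(\{.*\})\s*$', raw, re.DOTALL)
    if not match:
        return None  # fail closed: no verdict found, answer is never rendered
    try:
        verdict = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # fail closed: unparsable verdict
    if not verdict.get("safe", False):
        return None  # unsafe verdict: discard the buffered answer
    return raw[:match.start()].strip()  # deliver the answer without the verdict
```

Because adjudication rides along in the same completion, there is no second round trip to a separate judge; the marginal cost is a few extra output tokens for the verdict.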

Beyond Technology: The Role of Governance

The researchers emphasize that technical solutions alone are not enough for sustainable safety. They connect practical safeguards to actionable governance controls, drawing on frameworks like NIST SP 800-53 and the OWASP LLM Top 10. These controls include rigorous configuration and change management, robust incident response plans, auditability, and careful management of third-party components. The paper highlights that explicit, testable checks at the “last mile” of AI interaction are sufficient to prevent incidents like Tessa’s, while strong governance ensures long-term reliability and safety in real-world deployments.

The findings from this research underscore that robust, auditable safety in health-adjacent AI does not necessarily require heavyweight infrastructure. Instead, a thoughtful combination of deterministic checks and intelligent policy filters, coupled with strong governance, can effectively prevent harmful outputs and ensure AI assistants serve their intended purpose safely. For more details, you can read the full research paper here.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
