
Safeguarding Health AI: A New Approach to Prevent Harmful Chatbot Advice

TLDR: This research paper examines the 2023 suspension of NEDA’s chatbot Tessa, which provided harmful weight-loss advice, as a case study in AI safety failures. It proposes a modular safety middleware that combines deterministic lexical gates with an in-line LLM policy filter returning a single-call JSON verdict. In the authors’ evaluation, the design intercepted all unsafe prompts with minimal overhead, demonstrating that explicit, testable safety controls and strong governance are crucial for preventing harmful outputs in health-adjacent AI assistants.

The year 2023 saw a significant setback in the deployment of AI in sensitive health contexts when the National Eating Disorders Association (NEDA) was forced to suspend its chatbot, Tessa. Originally designed to offer support, Tessa began providing harmful weight-loss advice, including recommendations for calorie deficits and regular weigh-ins, to vulnerable users. This incident highlighted a critical gap in safety engineering for health-adjacent AI assistants, prompting a new research paper that proposes a robust solution.

The research, titled “Preventing Another Tessa: Modular Safety Middleware For Health-Adjacent AI Assistants,” by Pavan Reddy and Nithin Reddy, delves into Tessa’s failure as a case study. It argues that such incidents are avoidable when safety mechanisms are integrated from the outset rather than added as an afterthought. The paper introduces a lightweight, modular safety middleware designed to prevent similar harmful outputs in AI systems operating in high-risk domains like healthcare.

Understanding Tessa’s Failure Points

Tessa’s issues weren’t just about bad content; they exposed fundamental weaknesses common to many AI deployments. The researchers mapped these failures to established security frameworks, namely the OWASP LLM Top 10 and NIST SP 800-53. Key problems included:

  • Policy-implementation drift: The chatbot’s content diverged from its intended safety guidelines without proper review.
  • Missing input triage: Risky queries related to dieting were treated as normal, allowing harmful interactions to proceed.
  • Insecure output handling: There was no final check to block harmful advice like calorie targets before it reached users.
  • No escalation pathway: Crisis signals did not trigger a handoff to human experts.
  • Stateless moderation: The system didn’t remember past risky interactions, failing to accumulate risk signals over a session.
  • Insufficient monitoring and rollback: Harmful content persisted because there were no rapid detection or removal mechanisms.

These issues, while observed in a non-generative AI chatbot, are directly analogous to vulnerabilities found in modern Generative AI systems, such as prompt injection and insecure output handling.

A Modular Approach to AI Safety

To address these vulnerabilities, the paper proposes a hybrid safety middleware. This system combines two main components:

First, a fast lexical gate acts as an initial filter. This deterministic component uses keywords and regular expressions to immediately block obviously risky intents, such as direct mentions of calorie targets, weigh-ins, or dieting frames. This provides a quick and explainable first line of defense.
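
As a rough illustration, a gate of this kind can be a handful of compiled regular expressions checked before any model call. The Python sketch below uses hypothetical patterns standing in for the paper’s actual rule set:

```python
import re

# Illustrative risk patterns; the paper's real rule set is more extensive.
RISKY_PATTERNS = [
    re.compile(r"\b\d{3,4}\s*(?:k?cal|calorie)s?\b", re.IGNORECASE),      # calorie targets
    re.compile(r"\bweigh[-\s]?ins?\b", re.IGNORECASE),                    # weigh-ins
    re.compile(r"\b(?:lose|cut|drop)\s+\d+\s*(?:lbs?|kg|pounds?)\b", re.IGNORECASE),
]

def lexical_gate(text: str) -> bool:
    """Deterministic first-line check: True if the text trips a risky pattern."""
    return any(p.search(text) for p in RISKY_PATTERNS)

# Example: a dieting-framed query is caught before any model call is made.
print(lexical_gate("How do I maintain a 1200 calorie deficit?"))  # True
```

Because the gate is deterministic, every block is trivially explainable: the matched pattern itself is the audit record.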

Second, an in-line large language model (LLM) policy filter works in conjunction with the lexical gate. This LLM-based judge evaluates both the user’s input and the AI’s generated response against a strict safety policy. Crucially, this filter operates in a “fail-closed” manner, meaning if there’s any doubt about safety, the content is blocked or escalated. It also includes a final numeric/lexical scan of the buffered answer before it’s delivered to the user.
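
A minimal sketch of such a fail-closed filter might look like the following, reusing the lexical_gate from the sketch above. Here call_llm is a hypothetical stand-in for whatever completion API the deployment uses, and the prompt and JSON schema are illustrative assumptions rather than the paper’s exact design:

```python
import json

SAFETY_POLICY = "Never provide calorie targets, weigh-in schedules, or dieting advice."

def policy_filter(user_input: str, draft_answer: str, call_llm) -> str | None:
    """Fail-closed judge: return the answer only if it is affirmatively safe."""
    raw_verdict = call_llm(
        f"Policy: {SAFETY_POLICY}\n"
        f"User message: {user_input}\n"
        f"Draft answer: {draft_answer}\n"
        'Respond only with JSON: {"safe": true|false, "violations": [...]}'
    )
    try:
        verdict = json.loads(raw_verdict)
    except (json.JSONDecodeError, TypeError):
        return None  # fail closed: an unparsable verdict blocks the answer
    if not verdict.get("safe", False):
        return None  # fail closed: a flagged or missing verdict blocks the answer
    if lexical_gate(draft_answer):
        return None  # final numeric/lexical scan of the buffered answer
    return draft_answer
```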

Efficiency Through Single-Call JSON Mode

A significant innovation in this design is the “Single-Call JSON Mode.” Traditional safety pipelines often involve multiple steps, each adding latency and cost. This mode instead folds response generation and safety adjudication into a single model call: the AI is instructed to produce its answer followed by a strict JSON verdict indicating whether the content is safe and listing any violations. If the verdict is unsafe or unparsable, the answer is discarded, ensuring that harmful content is never rendered. In the authors’ evaluation, this approach intercepted all unsafe prompts at essentially baseline cost and latency, outperforming multi-stage pipelines.
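
Under stated assumptions (the same hypothetical call_llm function, and an illustrative VERDICT delimiter rather than the paper’s exact output schema), the single-call flow might be sketched as:

```python
import json
import re

def single_call_json_mode(user_prompt: str, call_llm) -> str | None:
    """Generate and adjudicate in one model call; discard the answer on any doubt."""
    raw = call_llm(
        f"{user_prompt}\n\n"
        "After your answer, append one final line of the form:\n"
        'VERDICT: {"safe": true|false, "violations": []}'
    )
    match = re.search(r'VERDICT:\s*(\{.*\})\s*$', raw, re.DOTALL)
    if not match:
        return None  # fail closed: no verdict found, answer is never rendered
    try:
        verdict = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # fail closed: unparsable verdict
    if not verdict.get("safe", False):
        return None  # unsafe verdict: discard the buffered answer
    return raw[:match.start()].strip()  # deliver the answer without the verdict
```

Because adjudication rides along in the same completion, there is no second round trip to a separate judge; the marginal cost is a few extra output tokens for the verdict.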

Beyond Technology: The Role of Governance

The researchers emphasize that technical solutions alone are not enough for sustainable safety. They connect practical safeguards to actionable governance controls, drawing on frameworks like NIST SP 800-53 and the OWASP LLM Top 10. These controls include rigorous configuration and change management, robust incident response plans, auditability, and careful management of third-party components. The paper highlights that explicit, testable checks at the “last mile” of AI interaction are sufficient to prevent incidents like Tessa’s, while strong governance ensures long-term reliability and safety in real-world deployments.

The findings from this research underscore that robust, auditable safety in health-adjacent AI does not necessarily require heavyweight infrastructure. Instead, a thoughtful combination of deterministic checks and intelligent policy filters, coupled with strong governance, can effectively prevent harmful outputs and ensure AI assistants serve their intended purpose safely. For more details, you can read the full research paper here.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
