TLDR: ChatCLIDS is a new benchmark that uses AI-driven nurse and patient agents to simulate persuasive dialogues for increasing closed-loop insulin delivery system (CLIDS) adoption in Type 1 Diabetes. It evaluates LLMs across single-visit, multi-visit, and social resistance scenarios, revealing that while larger LLMs can adapt strategies, all models struggle to overcome patient resistance, especially under social pressure, highlighting current limitations in AI for health behavior change.
Closed-loop insulin delivery systems (CLIDS) represent a significant advancement in managing Type 1 Diabetes (T1D), offering automated glucose monitoring and insulin dosing. Despite their clear medical benefits, the real-world adoption of these systems remains surprisingly low, with fewer than 25% of eligible patients initiating use and up to 30% discontinuing within six months. This challenge isn’t primarily due to technical failures but rather a complex interplay of behavioral, psychosocial, and social barriers.
To address this critical gap, researchers have introduced ChatCLIDS, a novel benchmark designed to rigorously evaluate how large language models (LLMs) can drive persuasive dialogues for health behavior change. This innovative framework simulates multi-turn interactions between virtual nurse agents and virtual patients, each meticulously crafted with clinically grounded, diverse profiles and realistic barriers to CLIDS adoption. The nurse agents are equipped with a wide array of evidence-based persuasive strategies, such as empathy, logical reasoning, expert endorsement, and motivational coaching.
ChatCLIDS stands out by supporting not only single-visit and multi-visit counseling scenarios but also adversarial social influence scenarios, where virtual patients encounter peer pressure or misinformation. This allows for a robust, multi-dimensional evaluation of persuasive AI. The framework’s design captures the heterogeneity of attitudes, misconceptions, and resistance observed in real T1D patients, enabling high-fidelity and customizable assessments of AI-driven interventions.
Understanding the ChatCLIDS Framework
At its core, ChatCLIDS features two interacting LLM agents: a Patient Agent and a Nurse Agent. The Patient Agents are initialized with rigorously curated and expert-validated clinical and psychosocial profiles, reflecting real-world diversity. These profiles are generated through a multi-stage process that starts from de-identified real-world data and adds expert curation of features such as demographics, socioeconomic factors, clinical history, personality, and specific barriers to adoption. Patients are categorized into Easy, Medium, and Hard difficulty levels based on the complexity and number of their barriers.
The Nurse Agents operate under two prompting paradigms: Direct Prompting, where they craft persuasive responses using 31 validated strategies, and Chain-of-Strategy (CoS), where they first identify and justify strategies before responding. This transparency allows for a deeper understanding of the LLM’s reasoning process.
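The difference between the two paradigms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the prompt wording, and the `REPLY:` delimiter are all assumptions made for illustration, and only the four strategies named in this article are listed rather than the full set of 31.

```python
# Hypothetical sketch: prompt wording and the "REPLY:" marker are assumptions,
# not taken from the ChatCLIDS paper.

# Only four of the 31 strategies are named in this article.
STRATEGIES = [
    "empathy",
    "logical reasoning",
    "expert endorsement",
    "motivational coaching",
]


def _header(patient_message: str) -> str:
    """Context shared by both prompting paradigms."""
    return (
        "You are a nurse counseling a Type 1 Diabetes patient about CLIDS.\n"
        f"Available strategies: {', '.join(STRATEGIES)}.\n"
        f"Patient said: {patient_message}\n"
    )


def build_direct_prompt(patient_message: str) -> str:
    """Direct Prompting: ask for the persuasive reply in a single step."""
    return _header(patient_message) + "Write your reply to the patient."


def build_cos_prompt(patient_message: str) -> str:
    """Chain-of-Strategy: name and justify strategies before replying."""
    return _header(patient_message) + (
        "First list the strategies you will use and justify each one.\n"
        "Then write your reply to the patient after a line starting with 'REPLY:'."
    )


def split_cos_output(model_output: str) -> tuple[str, str]:
    """Separate the visible strategy rationale from the final reply."""
    rationale, _, reply = model_output.partition("REPLY:")
    return rationale.strip(), reply.strip()
```

Splitting the output this way is what would make the strategy choice inspectable: the rationale can be logged and analyzed separately from the text the patient agent actually sees.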
Simulating Real-World Challenges
Single-Visit: This scenario simulates a typical clinical encounter, testing the model’s ability for short-term, adaptive persuasive reasoning and conversational flow.
Multi-Visit: This models long-term engagement, with 10 consecutive simulated “visits.” Nurse agents produce self-critique summaries and plan adjustments, while both patient and nurse agents retain cumulative memory, reflecting real-world continuity and adaptation.
Social Resistance: In this challenging scenario, after each nurse-patient session, the Patient interacts with a Social Resistance Agent that introduces misinformation, skepticism, or negative social cues, mirroring real-world peer pressure or internet misinformation. The Nurse Agent is blind to these interventions, and both influences shape the Patient Agent’s stance.
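The three scenarios above differ mainly in how many visits are run and in who can see which part of the conversation. The following Python sketch shows one way such a loop could be organized, assuming each agent is simply a function from its visible history to its next utterance. The names `run_visits`, `Agent`, and the message prefixes are hypothetical, and the fixed number of turns per visit is a simplification rather than a detail from the paper.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: an agent maps the history it can see to an utterance.
Agent = Callable[[List[str]], str]


def run_visits(
    nurse: Agent,
    patient: Agent,
    n_visits: int,
    turns_per_visit: int,
    resistor: Optional[Agent] = None,
    reflect: Optional[Agent] = None,
) -> Dict[str, List[str]]:
    """Simulate counseling visits with separate nurse and patient memories."""
    nurse_memory: List[str] = []    # everything the nurse has seen
    patient_memory: List[str] = []  # everything the patient has seen

    for _ in range(n_visits):
        for _ in range(turns_per_visit):
            nurse_msg = "NURSE: " + nurse(nurse_memory)
            nurse_memory.append(nurse_msg)
            patient_memory.append(nurse_msg)

            patient_msg = "PATIENT: " + patient(patient_memory)
            nurse_memory.append(patient_msg)
            patient_memory.append(patient_msg)

        if reflect is not None:
            # Multi-visit: the nurse keeps a self-critique for the next visit.
            nurse_memory.append("REFLECTION: " + reflect(nurse_memory))

        if resistor is not None:
            # Social resistance: only the patient sees the peer's message,
            # so the nurse stays blind to this influence.
            patient_memory.append("PEER: " + resistor(patient_memory))

    return {"nurse_memory": nurse_memory, "patient_memory": patient_memory}
```

Under these assumptions, a single-visit run corresponds to `n_visits=1` with no `resistor`, a multi-visit run to `n_visits=10` with a `reflect` function, and a social resistance run to passing a `resistor` as well. Keeping two separate memories is the design choice that models the nurse's blindness to outside influence.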
Key Findings and Implications
The research yielded several important observations. While larger and more reflective LLMs showed an ability to adapt strategies over time, all models struggled significantly to overcome patient resistance, especially when faced with realistic social pressure. The Chain-of-Strategy (CoS) protocol generally boosted effectiveness, particularly for easy and medium cases, but its impact on hard cases was limited.
In multi-visit settings, models with explicit reflection mechanisms, such as o4-mini and DeepSeek-R1, demonstrated substantial gains over “no thinking” models, learning to select strategies better suited to individual patient barriers. However, the presence of a Social Resistance Agent led to a dramatic degradation in performance across all agents, highlighting a critical limitation of current LLMs in navigating complex social environments and misinformation.
Qualitative analysis revealed that rapport-building, cognitive reframing, and incremental requests (e.g., “Foot-in-the-door”) were often associated with positive changes in persuasion ratings. In social resistance scenarios, strategies leveraging pre-existing relationships or external authority proved more robust than purely informational approaches.
Looking Ahead
ChatCLIDS provides a scalable and clinically grounded testbed for advancing trustworthy persuasive AI in healthcare. The findings underscore the need for future research to develop more robust, context-aware, and socially adaptive LLM-based agents. While the simulation offers valuable insights, the authors acknowledge limitations, including the reliance on synthetic patient profiles and the current focus on conversational outcomes rather than real patient behaviors. They also address ethical considerations, such as privacy and the potential for misinformation, and emphasize that these simulated dialogues should not be used in real-world clinical settings without extensive validation and expert oversight.
For more detailed information, you can read the full research paper here: ChatCLIDS Research Paper.


