Unmasking Privacy Risks: Membership Inference in Clinical AI Models

TLDR: A new study explores membership inference vulnerabilities in clinical large language models (LLMs), specifically focusing on Llemr, a clinical question-answering model. Researchers found limited but measurable privacy leakage, even with a novel paraphrase-based attack that simulates realistic adversarial conditions. The findings highlight that while clinical LLMs show some resilience, they remain susceptible to subtle privacy risks, emphasizing the need for advanced privacy evaluations and defense mechanisms in healthcare AI.

As large language models (LLMs) become increasingly integrated into critical healthcare systems, from clinical decision support to patient information management, ensuring their privacy and trustworthiness is paramount. These powerful AI tools are often fine-tuned on sensitive electronic health record (EHR) data to enhance their performance in medical contexts. While this improves their domain-specific capabilities, it also introduces a significant risk: the potential exposure of patient information through the model’s behavior.

A recent work-in-progress study, titled “Exploring Membership Inference Vulnerabilities in Clinical Large Language Models,” delves into these privacy concerns. The research, conducted by a team including Alexander Nemecek, Zebin Yun, Zahra Rahmani, Yaniv Harel, Vipin Chaudhary, Mahmood Sharif, and Erman Ayday, investigates whether adversaries can infer if specific patient records were used during an LLM’s training process. This type of privacy breach is known as a Membership Inference Attack (MIA).

The study focused on Llemr, a state-of-the-art clinical question-answering model. The researchers evaluated both traditional loss-based attacks and a novel, domain-motivated perturbation strategy based on paraphrasing. This approach aims to reflect more realistic adversarial conditions in a clinical setting, where an attacker might not have access to the exact training data but could generate semantically similar queries.

In a Membership Inference Attack, an attacker tries to determine if a particular data record was part of the model’s training set. This is done by observing subtle differences in how the model behaves when presented with data it has seen versus data it hasn’t. In healthcare, a successful attack could reveal that a patient’s sensitive health record was used for model development, severely undermining trust in clinical AI.

Previous studies on MIAs often used generalized text perturbations that don’t fully capture the unique, structured nature of medical data. Clinical text has specific linguistic consistency, terminology, and semantic dependencies. To address this, the researchers introduced their paraphrase-based attack, which simulates a more realistic scenario where an adversary might use semantically similar, but not identical, patient-related queries.

The experiments utilized de-identified records from the publicly available MIMIC-IV dataset, ensuring no protected health information was accessed. The Llemr model, which is instruction-tuned to reason over EHRs, was chosen as a representative benchmark. The study employed a black-box threat model, meaning the adversary could query the model and observe its confidence scores (like negative log-likelihood or perplexity) but had no access to its internal gradients or parameters.

The attack methods included:

Loss Attack (Baseline)

This canonical attack checks whether the model assigns a higher overall likelihood (that is, a lower negative log-likelihood) to examples it saw during training than to unseen examples. It tests whether aggregate confidence alone is enough to distinguish members from non-members.
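As a rough illustration of the idea, the sketch below turns per-token log-probabilities into a membership score. It is not the study's code: the function names, the per-token averaging, and the example numbers are assumptions made for illustration, and it presumes the adversary can obtain token log-probabilities from the model.

```python
import math
from typing import Sequence


def sequence_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood per token (lower means more confident)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is the exponential of the average negative log-likelihood."""
    return math.exp(sequence_nll(token_logprobs))


def loss_attack_score(token_logprobs: Sequence[float]) -> float:
    """Membership score: higher means the record looks more like training data."""
    return -sequence_nll(token_logprobs)


def predict_member(token_logprobs: Sequence[float], threshold: float) -> bool:
    """Flag a record as a suspected member when its score exceeds a threshold."""
    return loss_attack_score(token_logprobs) > threshold


# Hypothetical log-probabilities for two answers (made-up values).
confident_answer = [-0.1, -0.2, -0.1]
uncertain_answer = [-1.2, -0.9, -1.5]
print(loss_attack_score(confident_answer), loss_attack_score(uncertain_answer))
```

In practice the threshold would be chosen on records whose membership status is already known, which is why attacks like this are usually reported with a threshold-free metric such as AUC.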

Paraphrased Loss Attack

Recognizing that real adversaries are unlikely to have verbatim training prompts, this attack uses semantically equivalent but lexically distinct queries, generated using ChatGPT-3.5-Turbo. It assesses if membership signals persist even when the input is paraphrased, simulating a more practical adversarial scenario.
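One way to picture this attack is as the same loss score computed on reworded queries instead of the original prompt. The sketch below assumes two hypothetical callables: `paraphrase_fn`, standing in for the paraphrasing model, and `nll_fn`, standing in for a query to the target model that returns a negative log-likelihood. Averaging over several paraphrases is an illustrative choice here, not necessarily how the study aggregates scores.

```python
from statistics import mean
from typing import Callable, List


def paraphrased_loss_score(
    prompt: str,
    paraphrase_fn: Callable[[str], List[str]],
    nll_fn: Callable[[str], float],
) -> float:
    """Score a record using paraphrases only; higher means more member-like."""
    variants = paraphrase_fn(prompt)
    if not variants:
        raise ValueError("paraphrase_fn returned no paraphrases")
    # The attacker never submits the verbatim prompt, only the rewordings.
    return -mean([nll_fn(variant) for variant in variants])


# Toy stand-ins so the sketch runs without any model (all values made up).
def toy_paraphrase(prompt: str) -> List[str]:
    return [prompt + " (reworded A)", prompt + " (reworded B)"]


toy_nll_table = {
    "Is the patient anemic? (reworded A)": 1.0,
    "Is the patient anemic? (reworded B)": 2.0,
}

score = paraphrased_loss_score(
    "Is the patient anemic?", toy_paraphrase, toy_nll_table.__getitem__
)
print(score)
```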

Min-K% and Min-K%++ Attacks

These attacks investigate whether privacy leakage is localized to specific token positions within a response, rather than being distributed across the entire sequence. They focus on the tokens where the model is least confident, potentially revealing micro-level memorization.
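The sketch below shows the general shape of these scores as described in the membership inference literature; it is not taken from the study. Min-K% averages the lowest k% of token log-probabilities, and Min-K%++ first standardizes each token's log-probability against the model's full next-token distribution. The function names and the default `k` are assumptions, and the Min-K%++ part presumes access to the full next-token distribution at each position, which a restricted API may not expose.

```python
import math
from typing import Sequence


def min_k_score(token_scores: Sequence[float], k: float = 0.2) -> float:
    """Average of the lowest k fraction of per-token scores.

    A higher result means even the least confident tokens were predicted
    well, which is treated as evidence of membership.
    """
    if not token_scores:
        raise ValueError("need at least one token score")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    n = max(1, int(len(token_scores) * k))
    lowest = sorted(token_scores)[:n]
    return sum(lowest) / n


def normalized_token_score(vocab_logprobs: Sequence[float], target_index: int) -> float:
    """Min-K%++ style per-token score.

    Standardizes the observed token's log-probability using the mean and
    standard deviation of log-probabilities under the next-token distribution.
    """
    probs = [math.exp(lp) for lp in vocab_logprobs]
    mu = sum(p * lp for p, lp in zip(probs, vocab_logprobs))
    var = sum(p * (lp - mu) ** 2 for p, lp in zip(probs, vocab_logprobs))
    sigma = math.sqrt(var)
    if sigma < 1e-12:  # degenerate distribution: no spread to normalize by
        return 0.0
    return (vocab_logprobs[target_index] - mu) / sigma


# Made-up token log-probabilities for a ten-token response.
logprobs = [-0.1, -0.2, -3.0, -0.3, -2.0, -0.1, -0.1, -0.2, -0.1, -0.4]
print(min_k_score(logprobs, k=0.2))
```

For Min-K%++, the per-token values from `normalized_token_score` would be passed to `min_k_score` in place of the raw log-probabilities.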

The preliminary findings revealed limited but measurable membership leakage. The baseline Loss Attack achieved an AUC of 0.5392, and the Paraphrased Loss Attack showed similar performance with an AUC of 0.5397. While these AUC values are close to random guessing (0.5), the consistent separability across attacks indicates a non-zero privacy signal that warrants further investigation. The paraphrases used in the attack demonstrated high semantic fidelity, confirming that linguistic variation was introduced without altering medical meaning.
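To make the AUC figures concrete: AUC can be read as the probability that a randomly chosen training member receives a higher attack score than a randomly chosen non-member, so a value near 0.54 means the attacker ranks the pair correctly only slightly more often than a coin flip. The sketch below computes AUC directly from that definition. The function name and the scores are illustrative, not data from the study.

```python
from typing import Sequence


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Fraction of (member, non-member) pairs ranked correctly; ties count half."""
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Made-up attack scores: members score slightly higher on average.
members = [0.6, 0.4]
nonmembers = [0.5, 0.3]
print(auc(members, nonmembers))
```

This pairwise version is quadratic in the number of records, which is fine for a small evaluation set; a rank-based computation gives the same value more efficiently on large ones.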

Interestingly, the Min-K% attacks, which target localized memorization, provided a more limited signal, suggesting that Llemr’s privacy exposure primarily stems from global confidence patterns rather than isolated token anomalies. This indicates that memorization is diffuse across the sequence rather than concentrated in rare lexical units.

Overall, the study suggests that current medical language models like Llemr exhibit partial resilience to standard membership inference techniques but remain susceptible to subtle privacy leakage. The small yet measurable differences in model confidence highlight the importance of ongoing assessment, especially with domain-specific models. The paraphrased attack is particularly significant because it reflects a more plausible threat model in real healthcare contexts.

These findings underscore the need for continued research into context-aware, domain-specific privacy evaluations and defenses, such as differential privacy fine-tuning and paraphrase-aware training. Such efforts are crucial to strengthen the security and trustworthiness of healthcare AI systems and to ensure public confidence in their adoption. For more details, see the full research paper.
