Unmasking Privacy Risks: Membership Inference in Clinical AI Models

TLDR: A new study explores membership inference vulnerabilities in clinical large language models (LLMs), specifically focusing on Llemr, a clinical question-answering model. Researchers found limited but measurable privacy leakage, even with a novel paraphrase-based attack that simulates realistic adversarial conditions. The findings highlight that while clinical LLMs show some resilience, they remain susceptible to subtle privacy risks, emphasizing the need for advanced privacy evaluations and defense mechanisms in healthcare AI.

As large language models (LLMs) become increasingly integrated into critical healthcare systems, from clinical decision support to patient information management, ensuring their privacy and trustworthiness is paramount. These powerful AI tools are often fine-tuned on sensitive electronic health record (EHR) data to enhance their performance in medical contexts. While this improves their domain-specific capabilities, it also introduces a significant risk: the potential exposure of patient information through the model’s behavior.

A recent work-in-progress study, titled “Exploring Membership Inference Vulnerabilities in Clinical Large Language Models,” delves into these privacy concerns. The research, conducted by a team including Alexander Nemecek, Zebin Yun, Zahra Rahmani, Yaniv Harel, Vipin Chaudhary, Mahmood Sharif, and Erman Ayday, investigates whether adversaries can infer if specific patient records were used during an LLM’s training process. This type of privacy breach is known as a Membership Inference Attack (MIA).

The study focused on Llemr, a state-of-the-art clinical question-answering model. The researchers evaluated both traditional loss-based attacks and a more innovative, domain-motivated paraphrasing-based perturbation strategy. This new approach aims to reflect more realistic adversarial conditions in a clinical setting, where an attacker might not have access to exact training data but could generate semantically similar queries.

In a Membership Inference Attack, an attacker tries to determine if a particular data record was part of the model’s training set. This is done by observing subtle differences in how the model behaves when presented with data it has seen versus data it hasn’t. In healthcare, a successful attack could reveal that a patient’s sensitive health record was used for model development, severely undermining trust in clinical AI.

Previous studies on MIAs often used generalized text perturbations that don’t fully capture the unique, structured nature of medical data. Clinical text has specific linguistic consistency, terminology, and semantic dependencies. To address this, the researchers introduced their paraphrase-based attack, which simulates a more realistic scenario where an adversary might use semantically similar, but not identical, patient-related queries.

The experiments utilized de-identified records from the publicly available MIMIC-IV dataset, ensuring no protected health information was accessed. The Llemr model, which is instruction-tuned to reason over EHRs, was chosen as a representative benchmark. The study employed a black-box threat model, meaning the adversary could query the model and observe its confidence scores (like negative log-likelihood or perplexity) but had no access to its internal gradients or parameters.
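To make the two scores concrete, here is a minimal sketch of how negative log-likelihood and perplexity follow from per-token log-probabilities, which are the only signal a black-box adversary is assumed to see. The function names and the example values are illustrative assumptions, not taken from the study.

```python
import math

def sequence_nll(token_logprobs):
    """Mean negative log-likelihood per token (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)

def perplexity(token_logprobs):
    """Perplexity is the exponential of the mean per-token NLL."""
    return math.exp(sequence_nll(token_logprobs))

# Hypothetical log-probabilities for a four-token response (not real model output).
example = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
nll = sequence_nll(example)
ppl = perplexity(example)
print(f"NLL={nll:.4f} perplexity={ppl:.4f}")
```

Lower values of either score mean the model found the text more predictable, which is the signal the attacks below try to exploit.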

The attack methods included:

Loss Attack (Baseline)

This canonical attack tests whether the model assigns a higher overall likelihood (a lower negative log-likelihood) to examples it saw during training than to unseen ones. It checks whether aggregate confidence alone is enough to distinguish members from non-members.
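A loss attack of this kind can be sketched in a few lines. The threshold, the record names, and the loss values below are hypothetical; the article does not say how the researchers calibrated a decision rule, so this only illustrates the general idea.

```python
def loss_attack_score(nll):
    """Higher score means 'more likely a training member' (lower loss)."""
    return -nll

def predict_member(nll, threshold):
    """Flag a record as a member if its loss falls below the threshold."""
    return nll < threshold

# Hypothetical per-record NLL values, for illustration only.
candidate_nlls = {"record_a": 1.10, "record_b": 1.85, "record_c": 1.42}

# Assumed threshold; in practice it would be calibrated on data known
# to be outside the training set.
threshold = 1.5

flags = {name: predict_member(nll, threshold) for name, nll in candidate_nlls.items()}
print(flags)
```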

Paraphrased Loss Attack

Recognizing that real adversaries are unlikely to have verbatim training prompts, this attack uses semantically equivalent but lexically distinct queries, generated using ChatGPT-3.5-Turbo. It assesses if membership signals persist even when the input is paraphrased, simulating a more practical adversarial scenario.
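The sketch below shows one way such an attack could be scored: query the model on several paraphrases of a record and average the resulting losses. The article does not describe how paraphrase losses are combined, so the averaging step, the `nll_fn` callback, and the example queries and values are all assumptions made for illustration.

```python
from statistics import mean

def paraphrased_loss_score(paraphrases, nll_fn):
    """Average negated loss over paraphrases of one record.

    nll_fn is a hypothetical callback standing in for a black-box
    model query that returns the NLL of a single prompt.
    """
    if not paraphrases:
        raise ValueError("need at least one paraphrase")
    return -mean(nll_fn(p) for p in paraphrases)

# Made-up NLL values keyed by paraphrase; a real attack would query the model.
hypothetical_nll = {
    "What medication was the patient given for hypertension?": 1.30,
    "Which drug did the patient receive to treat high blood pressure?": 1.50,
}

score = paraphrased_loss_score(list(hypothetical_nll), hypothetical_nll.__getitem__)
print(f"paraphrased loss score: {score:.2f}")
```

As with the baseline, a higher score would be read as weak evidence of membership.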

Min-K% and Min-K%++ Attacks

These attacks investigate whether privacy leakage is localized to specific token positions within a response, rather than being distributed across the entire sequence. They focus on the tokens where the model is least confident, potentially revealing micro-level memorization.
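A rough sketch of the Min-K% idea is to average only the lowest-scoring fraction of token log-probabilities instead of the whole sequence. The value of `k` and the example log-probabilities are assumptions, and the Min-K%++ variant is not shown here.

```python
import math

def min_k_percent_score(token_logprobs, k=0.2):
    """Mean log-probability of the k fraction of least likely tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    n = max(1, math.ceil(k * len(token_logprobs)))
    lowest = sorted(token_logprobs)[:n]  # the n smallest log-probabilities
    return sum(lowest) / n

# Hypothetical token log-probabilities for a ten-token response.
logprobs = [-0.1, -0.2, -3.0, -0.3, -2.0, -0.1, -0.4, -0.2, -0.1, -0.6]
score = min_k_percent_score(logprobs, k=0.2)
print(f"Min-20% score: {score:.2f}")
```

A higher (less negative) score suggests that even the hardest tokens were fairly predictable, which would be treated as a hint of memorization.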

The preliminary findings revealed limited but measurable membership leakage. The baseline Loss Attack achieved an AUC of 0.5392, and the Paraphrased Loss Attack showed similar performance with an AUC of 0.5397. While these AUC values are close to random guessing (0.5), the consistent separability across attacks indicates a non-zero privacy signal that warrants further investigation. The paraphrases used in the attack demonstrated high semantic fidelity, confirming that linguistic variation was introduced without altering medical meaning.
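For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen member receives a higher attack score than a randomly chosen non-member. The sketch below computes it by direct pairwise comparison; the scores are made up and do not reproduce the study's numbers.

```python
def roc_auc(member_scores, nonmember_scores):
    """Fraction of (member, non-member) pairs ranked correctly; ties count half."""
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical attack scores (negated NLL), for illustration only.
members = [-1.1, -1.4, -1.6]
nonmembers = [-1.3, -1.7, -1.9]
auc = roc_auc(members, nonmembers)
print(f"AUC: {auc:.3f}")
```

An attack that cannot tell the two groups apart lands at 0.5, which is why values such as 0.539 indicate only a slight edge over chance.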

Interestingly, the Min-K% attacks, which target localized memorization, provided a more limited signal, suggesting that Llemr’s privacy exposure primarily stems from global confidence patterns rather than isolated token anomalies. This indicates that memorization is diffuse across the sequence rather than concentrated in rare lexical units.

Overall, the study suggests that current medical language models like Llemr exhibit partial resilience to standard membership inference techniques but remain susceptible to subtle privacy leakage. The small yet measurable differences in model confidence highlight the importance of ongoing assessment, especially with domain-specific models. The paraphrased attack is particularly significant because it reflects a more plausible threat model in real healthcare contexts.

These findings underscore the need for continued research into context-aware, domain-specific privacy evaluations and defenses, such as differential privacy fine-tuning and paraphrase-aware training. Such efforts are crucial to strengthen the security and trustworthiness of healthcare AI systems and to ensure public confidence in their adoption. For more details, see the full research paper.

Dev Sundaram
Dev Sundaram is an investigative tech journalist with a nose for exclusives and leaks. With stints in cybersecurity and enterprise AI reporting, Dev thrives on breaking big stories (product launches, funding rounds, regulatory shifts) and giving them context. He believes journalism should push the AI industry toward transparency and accountability, especially as Generative AI becomes mainstream.
