TLDR: The paper introduces KCMP, the first black-box membership inference attack framework for Large Vision-Language Models (LVLMs). It addresses the challenge of distinguishing memorized training data from generalized knowledge by creating “confounding” visual tasks and calibrating them with prior knowledge. KCMP effectively identifies training data using only textual outputs, outperforming existing black-box methods and achieving performance comparable to gray-box attacks across various LVLMs and datasets, highlighting its practical utility for auditing data privacy in real-world AI systems.
Large Vision-Language Models (LVLMs) are powerful AI systems that combine visual and textual understanding, trained on massive amounts of data. While these models offer incredible capabilities, their extensive training can lead to a significant privacy concern: memorization of their training data. This means that sensitive information, like personal photos or proprietary content, might inadvertently become embedded within the model, raising questions about data rights and security.
Membership Inference Attacks (MIAs) are a well-established technique designed to determine if a specific data sample was part of a model’s training dataset. For LVLMs, extending these attacks is crucial for safeguarding user privacy. However, most existing MIA methods for LVLMs operate under ‘white-box’ or ‘gray-box’ assumptions, meaning they require access to the model’s internal features, such as likelihoods or confidence scores. In real-world scenarios, mainstream LVLMs are often deployed as ‘black-box’ services, only exposing generated outputs and keeping internal computations hidden. This makes traditional MIA methods largely inapplicable.
A new research paper, titled “Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing,” introduces the first black-box MIA framework specifically for LVLMs. Authored by Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, and Tao Qi, this work tackles the challenging problem of identifying training data in LVLMs when only their textual outputs are accessible.
The Challenge of Black-Box Attacks
The core difficulty in black-box MIAs for LVLMs lies in distinguishing between a model’s memorization of specific training data and its general knowledge. For example, an LVLM might confidently describe a common scene like a “sun rising from the sea” even if it wasn’t in its training data, simply because it has vast general world knowledge. The goal is to find instances where the model’s confidence stems from having seen that exact data during training, rather than from its broad understanding.
Introducing KCMP: Knowledge-Calibrated Memory Probing
The proposed framework, Knowledge-Calibrated Memory Probing (KCMP), addresses this by evaluating the model’s confidence at a fine-grained level. Instead of looking at overall image descriptions, KCMP breaks down visual inputs into independent semantic units, like individual objects. The framework has three main components:
1. Semantic Mask Prediction Task Construction: This involves identifying salient objects in an image and masking their shape or color. The model is then asked to predict the masked content from a set of semantically confusing alternatives. These tasks are designed to be difficult to solve using general knowledge alone, forcing the model to rely on memorized details.
2. Prior Knowledge Calibration: To further refine the tasks, KCMP uses a calibration mechanism. It estimates the semantic relevance of objects to an image’s description and assesses the ‘rationality’ of the confusing alternatives using a powerful language model. Tasks that could easily be solved by general knowledge are filtered out, ensuring that the remaining tasks are more indicative of true memorization.
3. Instruction-based Model Confidence Evaluation: The target LVLM is then prompted to complete these calibrated mask prediction tasks. If the LVLM shows abnormally high prediction accuracy or confidence on a specific task, it suggests strong memorization of that visual input, indicating it was likely part of the training data.
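The three components above can be sketched in a few lines of Python. This is only an illustration of the flow, not the paper's implementation: the `MaskTask` fields, the `prior_guess_rate` estimate, the `max_prior` filter, and the `threshold` decision rule are all hypothetical names and values chosen for the example, and `query_model` stands in for whatever text-only API the target LVLM exposes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MaskTask:
    """One mask-prediction probe built from a salient object in an image."""
    image_id: str
    answer: str              # the true masked content (an object label or a color)
    options: List[str]       # the answer plus semantically confusing alternatives
    prior_guess_rate: float  # assumed: how often general knowledge alone picks the answer


def calibrate(tasks: List[MaskTask], max_prior: float = 0.5) -> List[MaskTask]:
    """Step 2: drop tasks that general knowledge alone solves too easily."""
    return [t for t in tasks if t.prior_guess_rate <= max_prior]


def probe_accuracy(
    tasks: List[MaskTask],
    query_model: Callable[[MaskTask], str],
    repeats: int = 3,
) -> float:
    """Step 3: fraction of repeated queries in which the model picks the masked answer."""
    if not tasks:
        return 0.0
    hits = sum(query_model(t) == t.answer for t in tasks for _ in range(repeats))
    return hits / (len(tasks) * repeats)


def is_member(
    tasks: List[MaskTask],
    query_model: Callable[[MaskTask], str],
    threshold: float = 0.6,  # assumed cut-off; a real audit would tune this on held-out data
    repeats: int = 3,
) -> bool:
    """Flag an image as likely training data if accuracy on calibrated tasks is high."""
    kept = calibrate(tasks)
    return probe_accuracy(kept, query_model, repeats) >= threshold
```

In this sketch, a model that always recovers the masked content on the hard (calibrated) tasks is flagged as having memorized the image, while a model that falls for the confusing alternatives is not. Only the model's textual choice is used, which is what keeps the setting black-box.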
Empirical Success and Practicality
The researchers conducted extensive experiments across four LVLMs (MiniGPT-4, LLaVA 1.5, LLaMA Adapter v2, and a DAM-based model) and three datasets. KCMP consistently outperformed existing black-box MIA methods, such as Image Infer. It also matched, and in some cases surpassed, gray-box methods that have access to internal model features. This demonstrates KCMP’s effectiveness in a purely black-box setting.
Further analysis revealed that KCMP is particularly effective against models trained with fine-grained, region-level supervision, as these models tend to disproportionately memorize explicitly annotated regions. The study also showed that combining both object and color probes yields better results, as they capture complementary memorization signals. The framework is also robust to varying sampling temperatures and can be made more efficient by using fewer repetitions of queries or lightweight segmentation models without significant loss in accuracy.
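As a rough illustration of how the two probe types and repeated queries might be combined into a single score, consider the sketch below. The article does not say how the paper fuses the signals, so the weighted average and the `weight` parameter are assumptions made purely for this example.

```python
from statistics import mean
from typing import Sequence


def mean_over_repeats(per_repeat_hits: Sequence[int]) -> float:
    """Average the 0/1 outcomes of repeated queries for one probe type."""
    return mean(per_repeat_hits)


def combined_score(object_acc: float, color_acc: float, weight: float = 0.5) -> float:
    """Blend object-probe and color-probe accuracy (assumed: simple weighted average)."""
    return weight * object_acc + (1 - weight) * color_acc


# Hypothetical outcomes: 1 means the model recovered the masked content.
object_acc = mean_over_repeats([1, 0, 1, 1])
color_acc = mean_over_repeats([1, 1])
score = combined_score(object_acc, color_acc)
```

Using fewer repeats, as in the two-query color probe here, lowers query cost at the price of a noisier estimate.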
KCMP offers a compelling balance between practicality and effectiveness. Its ability to detect training data membership using only textual outputs makes it a strong candidate for auditing data exposure in real-world, closed-source AI systems, helping to address critical privacy concerns in the rapidly evolving field of large vision-language models.


