TLDR: This research reveals that recommender systems powered by large language models (LLMs) are vulnerable to “inversion attacks.” Attackers can reconstruct sensitive user information, like personal preferences, interaction history, age, and gender, by analyzing the model’s output data (logits). The study introduces an optimized attack method that achieves high accuracy in recovering this private data, demonstrating that these systems leak significant user information regardless of their recommendation performance. The findings emphasize an urgent need for stronger privacy protections in LLM-based recommender systems.
Recommender systems have become an indispensable part of our daily online experience, guiding us to products, content, and services tailored to our tastes. With the advent of Large Language Models (LLMs), these systems have evolved, offering more nuanced and contextually relevant recommendations by processing user information and interactions through a linguistic framework. However, a recent study sheds light on a significant, previously underexplored vulnerability: the privacy risks associated with LLM-empowered recommender systems.
The research, titled “Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective,” reveals that these advanced recommendation engines are susceptible to what are known as “inversion attacks.” In simple terms, an inversion attack allows an adversary to reconstruct the original input prompts that contain highly sensitive user data, such as personal preferences, interaction histories, and even demographic attributes like age and gender, by exploiting the output data (logits) generated by the recommendation models.
Understanding the Threat
Traditionally, recommender systems relied on abstract, ID-based data. LLMs, however, integrate all information – system instructions, context, user profiles, and historical interactions – into natural language prompts. While this enhances personalization, it also means that sensitive user data is explicitly incorporated into these prompts. The study highlights that even though these complete prompts reside securely on the server side, the ‘logits’ (next-token probabilities) generated from these prompts are often sent back to users via API responses. An adversary can intercept these logits and, using sophisticated techniques, reverse-engineer them to reconstruct the original prompt, thereby exposing private user information.
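To make the exposure concrete, here is a minimal sketch of the kind of data an adversary would work with. The response format and field names (`api_steps`, `token_id_probs`) are assumptions for illustration, not details from the paper; the point is simply that per-step next-token probabilities returned by a service can be stacked into the numeric vectors an inversion model consumes.

```python
import numpy as np

def collect_logit_features(api_steps, vocab_size):
    """Stack per-step next-token probabilities from (hypothetical) API
    responses into a dense matrix an inversion model can be trained on.

    `api_steps` is assumed to be a list of dicts such as
    {"token_id_probs": {"1042": 0.71, "2301": 0.22, ...}} -- one entry per
    generated token, as exposed by the recommendation service.
    """
    rows = []
    for step in api_steps:
        dense = np.zeros(vocab_size, dtype=np.float32)
        for token_id, prob in step["token_id_probs"].items():
            dense[int(token_id)] = prob  # only the returned top-k entries are non-zero
        rows.append(dense)
    return np.stack(rows)  # shape: (num_generated_tokens, vocab_size)
```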
This threat isn’t limited to external attackers. Malicious users could potentially access logits related to their own queries and recover underlying prompts, which might even reveal proprietary business insights about how the recommender system processes user information.
The Attack Method: Similarity-Guided Refinement
To systematically investigate this vulnerability, the researchers developed an optimized inversion framework. This framework leverages a ‘vec2text’ engine, which maps the model’s output logits back into potential textual prompts. A key innovation in their method is the “Similarity-Guided Refinement” procedure. This process iteratively refines candidate prompts by comparing their generated logits with the target logits using cosine similarity, selecting the candidate that best aligns with the original input until a high-fidelity reconstruction is achieved.
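The loop below is a minimal sketch of that idea, not the authors' implementation: `inversion_engine` stands in for the vec2text component and `victim_next_token_probs` for a query to the target recommender, both of which are assumed interfaces.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def similarity_guided_refinement(target_logits, inversion_engine, victim_next_token_probs,
                                 num_rounds=10):
    """Iteratively pick the candidate prompt whose logits best match the target's."""
    best_prompt = inversion_engine.initial_guess(target_logits)   # first hypothesis
    best_score = cosine_similarity(victim_next_token_probs(best_prompt), target_logits)
    for _ in range(num_rounds):
        # The vec2text-style engine proposes refinements conditioned on the
        # current best prompt and the logits we are trying to match.
        for candidate in inversion_engine.propose(best_prompt, target_logits):
            score = cosine_similarity(victim_next_token_probs(candidate), target_logits)
            if score > best_score:  # keep the candidate closest to the target
                best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

Cosine similarity is a natural fit here because it compares the shape of the two logit vectors rather than their absolute magnitudes.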
Key Findings and Implications
The experiments, conducted across movie and book recommendation domains using two representative LLM-based models (TALLRec and CoLLM), yielded striking results:
- High-Fidelity Prompt Reconstruction: The optimized attack models demonstrated a strong ability to reconstruct prompts. In the best-case scenario, they could recover nearly 65% of user-interacted items.
- Sensitive Profile Recovery: User profile information, such as age and gender, was recovered with remarkable precision, with correct inferences in up to 87% of cases. This is likely due to the brevity and fixed structure of such demographic data within prompts, making their signals easier to learn and reconstruct.
- Domain Consistency Matters: The success of the attack was significantly higher in domains where the training data for the inversion model closely aligned with the target domain. For instance, movie titles were recovered more accurately than book titles, partly because movie titles tend to be shorter and there was greater overlap between training and test set vocabularies.
- Insensitivity to Model Performance: Surprisingly, the privacy leakage was largely insensitive to the victim recommendation model’s overall performance. Even when the recommender’s quality was intentionally degraded, the inversion attack remained effective, suggesting that logits continue to encode specific input details regardless of the system’s accuracy.
- Limitations with Prompt Length: A notable limitation observed was that the attack’s performance deteriorated as the prompt length increased. Longer, more complex prompts introduced greater semantic variability, making reconstruction more challenging.
These findings collectively expose critical privacy vulnerabilities in current LLM-empowered recommender systems. The ability to reconstruct sensitive user preferences and demographic data from model outputs poses a serious threat to user privacy and proprietary business information.
Moving Forward
This study serves as a crucial wake-up call for the research community and industry. It highlights the urgent need for developing robust defensive strategies to mitigate the risks of prompt inversion in LLM-empowered recommender systems. As AI continues to integrate more deeply into personalized services, ensuring the privacy and security of user data must be a paramount concern.


