TL;DR: This research introduces a method to simulate user behavior in recommendation systems using Small Language Models (SLMs) and low-rank adapters (LoRAs). It converts user interactions into textual profiles and explanations, then clusters users into “personas.” SLMs are then fine-tuned with one LoRA per persona, leveraging both short-term (user profile) and long-term (enriched interactions) memories. Experiments show this approach is effective and scalable, outperforming larger, non-fine-tuned LLMs and balancing personalization with computational efficiency.
A long-standing challenge in developing accurate recommendation models is effectively simulating user behavior. This is primarily due to the complex and often unpredictable nature of how users interact with systems. While Large Language Models (LLMs) have shown promise in this area, they often face hurdles in efficiently processing vast amounts of user interaction data, adapting to specific user knowledge, and scaling these capabilities for millions of users.
This new research introduces an innovative approach that shifts the focus from complex LLM prompting or extensive fine-tuning to leveraging Small Language Models (SLMs). The goal is to create cost-effective and resource-efficient user agents capable of mimicking real user behaviors. The core of this method involves extracting robust textual representations of user preferences using a frozen LLM, and then fine-tuning SLMs with low-rank adapters (LoRAs) to simulate these behaviors.
A Three-Stage Methodology for User Agents
The proposed methodology unfolds in three distinct stages. First, the system transforms large volumes of user interactions into meaningful textual representations. This includes generating a ‘User Profile’ (acting as short-term memory, Ms) that describes general user traits, and an ‘Enriched User Interaction’ (long-term memory, Ml), which explains the rationale behind a user’s likes or dislikes for specific items. This distillation process is powered by an LLM, such as GPT-4o, incorporating self-reflection to refine these representations.
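To make the distillation stage concrete, here is a minimal sketch of how a frozen LLM could produce the two memory types. All function names, prompt wordings, and the stub LLM are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of stage 1 (illustrative, not the paper's exact prompts):
# a frozen LLM turns raw interactions into a short-term "User Profile" (Ms)
# and per-item "Enriched Interactions" (Ml), with one self-reflection pass.

def build_profile_prompt(interactions):
    """Ask the LLM for a general description of the user's tastes (Ms)."""
    lines = "\n".join(f"- {i['title']}: rated {i['rating']}/5" for i in interactions)
    return "Summarize this user's general movie preferences in a short paragraph:\n" + lines

def build_enrichment_prompt(interaction):
    """Ask the LLM why the user liked or disliked one item (Ml)."""
    verdict = "liked" if interaction["rating"] >= 4 else "disliked"
    return (f"Explain briefly why a user who {verdict} "
            f"'{interaction['title']}' (rating {interaction['rating']}/5) "
            f"might feel that way.")

def distill_user(interactions, llm, reflect=True):
    """One distillation pass; optionally refine Ms with a self-reflection turn."""
    ms = llm(build_profile_prompt(interactions))
    if reflect:  # self-reflection: the LLM critiques and rewrites its own draft
        ms = llm("Critique and rewrite this profile for accuracy:\n" + ms)
    ml = [llm(build_enrichment_prompt(i)) for i in interactions]
    return ms, ml

# Usage with a stub standing in for a real LLM client (e.g. GPT-4o):
fake_llm = lambda prompt: f"[LLM output for: {prompt[:30]}...]"
ms, ml = distill_user(
    [{"title": "Alien", "rating": 5}, {"title": "Cats", "rating": 2}],
    fake_llm,
)
```

In a real pipeline, `fake_llm` would be replaced by an API call, and Ml would be stored per interaction for later retrieval.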
Second, users are grouped into ‘personas’ based on their profile embeddings. Instead of the computationally intensive task of training a separate LoRA for every individual user, the researchers train a single low-rank adapter for each persona. This strategic grouping helps achieve an optimal balance between personalized user simulation and the overall scalability and performance of the user behavior agents. The base SLM weights remain frozen during this fine-tuning process.
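The persona step can be sketched with a plain k-means over profile embeddings, after which each user maps to a persona-level adapter rather than a personal one. The toy embeddings, the simplistic first-k centroid initialization, and the adapter-id naming are all assumptions for illustration:

```python
# Minimal sketch of stage 2 (illustrative, not the paper's exact pipeline):
# cluster user-profile embeddings into k personas, then assign one LoRA
# adapter id per persona instead of one per user.
import math

def kmeans(vectors, k, iters=20):
    """Plain k-means over small embedding lists; returns a label per vector.
    Centroids are naively initialized from the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assign each vector to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # recompute each centroid as the mean of its cluster members
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy profile embeddings for 6 users -> 2 personas -> 2 LoRAs (not 6).
embeddings = [[0.10, 0.20], [0.15, 0.22], [0.12, 0.18],
              [0.90, 0.80], [0.88, 0.85], [0.92, 0.79]]
personas = kmeans(embeddings, k=2)
lora_for_user = {u: f"lora_persona_{p}" for u, p in enumerate(personas)}
```

The payoff is the parameter count: with 200 users and 4 personas, only 4 adapters are trained while the base SLM stays frozen.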
Finally, these persona-level SLMs, now equipped with their specialized LoRAs, are utilized to build user agents. These agents effectively use both their short-term (user profile) and long-term (enriched interactions) memories to predict user preferences, such as movie ratings. The paper suggests that this SLM fine-tuning approach is more effective and scalable for real-world applications compared to traditional Retrieval Augmented Generation (RAG) systems.
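The agent step above can be sketched as prompt assembly: the short-term profile (Ms) is always included, while relevant long-term entries (Ml) are retrieved per query. The keyword-overlap retriever and all names here are simplifying assumptions; the paper pairs memory retrieval with fine-tuning (RAFT) rather than this exact scheme:

```python
# Illustrative sketch of stage 3: a persona agent combining short-term
# memory (Ms) with retrieved long-term memory (Ml) to predict a rating.

def retrieve(ml_entries, query, k=2):
    """Naive keyword-overlap retrieval over long-term memory (Ml)."""
    q = set(query.lower().split())
    scored = sorted(ml_entries,
                    key=lambda e: -len(q & set(e.lower().split())))
    return scored[:k]

def build_rating_prompt(ms, ml_entries, item):
    """Assemble the prompt the persona-level SLM would answer."""
    relevant = retrieve(ml_entries, item)
    return (f"User profile: {ms}\n"
            "Past reasoning:\n" + "\n".join(f"- {r}" for r in relevant) +
            f"\nPredict this user's 1-5 rating for '{item}'. Answer with a number.")

prompt = build_rating_prompt(
    ms="Enjoys sci-fi horror; dislikes musicals.",
    ml_entries=["Liked Alien for its tense sci-fi atmosphere.",
                "Disliked Cats because musicals feel tedious to them."],
    item="Aliens sci-fi sequel",
)
```

In the paper's setup, a prompt like this would be answered by the Phi-3-class SLM loaded with the adapter for the user's persona.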
Empirical Evidence and Key Findings
Experiments conducted using the MovieLens-1M dataset involved 200 users, who were clustered into 4 distinct personas. The Phi-3-Mini-4k-Instruct SLM, a model with 3.8 billion parameters, was fine-tuned using low-rank adapters. The results provide compelling evidence that SLMs, when fine-tuned with low-rank adapters, can match or even exceed the performance of larger, frozen LLMs in building personalized agents. The inclusion of long-term memories (Ml) alongside short-term memories (Ms) generally led to improved performance, indicated by reduced Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
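For reference, the two reported metrics are straightforward to compute over predicted versus true ratings (the numbers below are toy values, not the paper's results):

```python
# RMSE and MAE over predicted vs. true ratings (toy data).
import math

def rmse(pred, true):
    """Root Mean Squared Error: penalizes large rating errors more heavily."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    """Mean Absolute Error: average size of the rating error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

pred, true = [4, 3, 5, 2], [5, 3, 4, 2]
print(round(rmse(pred, true), 3))  # → 0.707
print(round(mae(pred, true), 2))   # → 0.5
```

Lower is better for both; the paper reports reductions in RMSE and MAE when long-term memory (Ml) is added alongside the profile (Ms).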
The research highlights several key contributions: a hierarchical knowledge distillation process that converts tabular user interactions into rich textual profiles and explanations; the demonstration that low-rank adaptation of SLMs can achieve high performance for personalized agents, especially when combined with Retrieval Augmented Fine-tuning (RAFT) for memory utilization; and the effectiveness of clustering users into personas to balance personalization quality with the number of model parameters required.
Future Directions and Impact
While the findings are promising, the paper also acknowledges certain limitations and outlines future research directions. The LLM-dependent distillation process can be slow for very large datasets. Hyperparameter tuning for fine-tuning remains computationally intensive, suggesting that exploring other parameter-efficient methods could yield further improvements. Additionally, optimizing persona generation by incorporating more easily acquired user features is an area for future work. This research is expected to pave the way for more scalable, personalized user interaction systems, particularly in scenarios where users have extensive interaction histories. Full details are available in the research paper.


