TLDR: This paper investigates how large language models (LLMs) memorize training data during fine-tuning for domain adaptation and instruction tuning. It introduces an n-gram memorization score that reliably predicts verbatim memorization, enabling effective early stopping. Additionally, a novel n-gram-aware loss regularizer is proposed, significantly reducing memorization (up to 40%) with minimal performance loss, offering scalable and practical mitigation strategies.
Large language models (LLMs) perform well across a wide range of tasks, but their tendency to reproduce parts of their training data verbatim raises significant privacy and copyright concerns. While memorization during pre-training has received much attention, the dynamics of memorization during fine-tuning, especially for domain adaptation and instruction tuning, remain less explored.
A recent research paper, "Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning," by Dean L. Slack and Noura Al Moubayed of Durham University, U.K., addresses this gap. The study investigates how models from the Pythia, Llama3, and Mistral families, ranging from 1.4 billion to 70 billion parameters, memorize data when fine-tuned on common evaluation datasets.
The Problem of Early Memorization
The researchers observed a striking trend: memorization increases dramatically in the initial epochs of fine-tuning. This often happens well before the model reaches its best performance, whether measured by validation perplexity or task-specific evaluation. In other words, LLMs quickly absorb new information, including potentially sensitive data, before traditional early-stopping criteria would halt training.
N-gram Memorization: An Early Warning System
To address this, the paper introduces a simple yet highly effective n-gram memorization score. This score acts as a reliable precursor to verbatim memorization, meaning it can signal that a sample is at high risk of being fully memorized before it actually happens. By tracking this n-gram score throughout training, it becomes possible to identify and intervene earlier.
The study found a clear gap in n-gram (partial) memorization scores between samples that eventually become memorized and those that do not. This gap was particularly pronounced in instruction-following and summarization datasets, which often contain repetitive or templated phrases. Larger models also showed a greater increase in this partial memorization score, which suggests the metric remains predictive as model size grows.
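To make the idea of a partial memorization score concrete, here is a minimal Python sketch. The article does not give the paper's exact formula, so everything here is an assumption for illustration: the function names, the choice of n = 4, and the definition itself (the percentage of n-grams in a sample's true continuation that also appear in the continuation the model generates from the same prefix).

```python
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[int], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_memorization_score(
    generated: Sequence[int],
    reference: Sequence[int],
    n: int = 4,  # assumed n-gram length, not taken from the paper
) -> float:
    """Percentage (0-100) of reference n-grams reproduced in the generation.

    `reference` is the true continuation of a training sample and
    `generated` is the model's continuation of the same prefix.
    This overlap definition is an illustrative assumption.
    """
    ref_counts = ngrams(reference, n)
    if not ref_counts:
        return 0.0  # reference shorter than n tokens
    gen_counts = ngrams(generated, n)
    # Clip each match by how often the n-gram occurs in the generation.
    overlap = sum(min(count, gen_counts[gram]) for gram, count in ref_counts.items())
    return 100.0 * overlap / sum(ref_counts.values())
```

A verbatim reproduction scores 100 under this definition, while a generation that shares only some phrases with the training sample scores somewhere in between, which is what lets the score rise before full memorization occurs.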
Optimal Stopping Criteria and Mitigation Strategies
Leveraging the n-gram memorization score, the researchers explored its use as an early-stopping criterion. They found that stopping fine-tuning when the average partial memorization score on the training set reached a certain threshold (e.g., 20) significantly reduced memorization rates with minimal impact on the model’s overall performance. This approach offered a better balance between reducing memorization and maintaining performance compared to traditional early stopping based on validation perplexity or task accuracy.
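The stopping rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, the scores are assumed to be per-sample values on a 0 to 100 scale, and the default threshold of 20 simply mirrors the example value mentioned above.

```python
from statistics import mean
from typing import Optional, Sequence


def first_stopping_epoch(
    per_epoch_scores: Sequence[Sequence[float]],
    threshold: float = 20.0,  # example threshold; the scale is an assumption
) -> Optional[int]:
    """Return the first 1-based epoch whose mean score reaches the threshold.

    `per_epoch_scores[k]` holds the per-sample n-gram memorization scores
    measured on the training set after epoch k + 1. Returns None if the
    threshold is never reached.
    """
    for epoch, scores in enumerate(per_epoch_scores, start=1):
        if scores and mean(scores) >= threshold:
            return epoch
    return None


# Hypothetical scores for four epochs of fine-tuning on two samples.
history = [[2.0, 4.0], [10.0, 20.0], [25.0, 35.0], [60.0, 80.0]]
stop_at = first_stopping_epoch(history)
```

In a real training loop the same check would run after each epoch and break out of the loop as soon as it fires, rather than being applied to a recorded history.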
Beyond early stopping, the paper introduces a novel n-gram-aware loss regularizer. This technique modifies the standard causal language modeling loss to penalize n-grams to which the fine-tuned model assigns excessively high confidence, especially when that confidence significantly exceeds the pre-trained model’s confidence. The regularizer proved even more effective than early stopping, reducing memorization across all tested model families by up to 40% while keeping performance trade-offs small. It also outperformed an existing memorization mitigation strategy, Goldfish loss regularization, on most models.
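One way such a penalty could look is sketched below in plain Python. The article does not state the paper's formula, so this is only a plausible form under stated assumptions: confidence is measured as the summed token log-probability over each n-gram window, the penalty is a hinge on how far the fine-tuned model exceeds the pre-trained one, and the names `margin` and `weight` are hypothetical hyperparameters. A practical version would operate on tensors inside the training step so that gradients flow through the penalty.

```python
from typing import Sequence


def ngram_confidence_penalty(
    ft_logprobs: Sequence[float],
    pt_logprobs: Sequence[float],
    n: int = 4,           # assumed n-gram length
    margin: float = 0.0,  # hypothetical slack before the penalty applies
) -> float:
    """Mean hinge penalty over all n-gram windows of one sequence.

    Inputs are per-token log-probabilities of the target tokens under the
    fine-tuned (ft) and pre-trained (pt) models. A window is penalized only
    when the fine-tuned model is more confident than the pre-trained model
    by more than `margin`.
    """
    if len(ft_logprobs) != len(pt_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    windows = len(ft_logprobs) - n + 1
    if windows <= 0:
        return 0.0
    total = 0.0
    for i in range(windows):
        ft = sum(ft_logprobs[i:i + n])
        pt = sum(pt_logprobs[i:i + n])
        total += max(0.0, ft - pt - margin)
    return total / windows


def regularized_loss(
    ft_logprobs: Sequence[float],
    pt_logprobs: Sequence[float],
    n: int = 4,
    margin: float = 0.0,
    weight: float = 0.1,  # hypothetical regularization strength
) -> float:
    """Causal LM loss (mean negative log-likelihood) plus the weighted penalty."""
    nll = -sum(ft_logprobs) / len(ft_logprobs)
    return nll + weight * ngram_confidence_penalty(ft_logprobs, pt_logprobs, n, margin)
```

Under this form, n-grams the pre-trained model already predicted confidently are left alone, so the penalty targets confidence gained during fine-tuning rather than general language ability.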
Impact of Model Size and Data Categories
The research confirmed that memorization generally increases with model size, posing greater challenges for larger models like Llama3 70B. However, both the n-gram early stopping and the n-gram regularizer consistently reduced memorization across all model scales.
A categorical analysis of memorized n-grams revealed that certain types of content are more prone to memorization. Medical, question-answering, and entity-related phrases showed the highest risk, likely due to their highly templated and repetitive nature. In contrast, free-form prose found in financial news or reviews exhibited lower memorization rates.
Practical Implications
This research offers practical, scalable guidance for managing memorization during the fine-tuning of large language models. By tracking how memorization develops and applying n-gram-based early stopping or loss regularization, developers can reduce the privacy and copyright risks of fine-tuned LLMs with little loss in task performance.


