TLDR: SIMU (Selective Influence Machine Unlearning) is a two-step framework for unlearning sensitive information in Large Language Models (LLMs). It first identifies the “critical neurons” in the MLP layers that encode the forget-set data, then updates only those neurons and the attention layers using a second-order optimizer. The result is effective unlearning that preserves more of the model’s original knowledge and utility than previous approaches.
The rapid advancement of Large Language Models (LLMs) has brought incredible capabilities, but also significant concerns regarding the memorization of sensitive or unwanted information. To address this, a new approach called Selective Influence Machine Unlearning (SIMU) has been introduced. This framework aims to make LLMs forget specific information without compromising their overall knowledge and utility.
Traditional machine unlearning methods, especially those based on first-order and second-order optimizers, often face a challenge: while they successfully erase targeted data, they can also degrade the model’s ability to perform other tasks it was originally trained for. This “over-forgetting” leads to a loss of valuable knowledge and reduced model utility. SIMU tackles this by being more precise in its unlearning process.
How SIMU Works: A Two-Step Approach
SIMU operates in two main steps. First, it identifies “critical neurons” within the LLM’s Multi-Layer Perceptron (MLP) layers. These are the specific neurons most responsible for encoding the information that needs to be forgotten. The paper explains that MLP layers act like a key-value memory, storing factual knowledge, while attention layers handle contextual relationships. By focusing on MLP neurons, SIMU aims for fine-grained control over information editing. This identification process uses a gradient-aggregation approach to calculate an “attribution score” for each neuron, indicating its contribution to the forget-set. Neurons with scores above a certain threshold are then marked as critical.
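The identification step can be illustrated with a small sketch. The article does not give the paper's exact attribution formula, so the scoring rule here (mean absolute gradient per neuron, aggregated over a few forget-set steps) is an assumption, and the names `attribution_scores` and `critical_mask` are hypothetical:

```python
from typing import List

def attribution_scores(grad_steps: List[List[List[float]]]) -> List[float]:
    """Aggregate gradients into one score per neuron.

    grad_steps[s][n] holds the gradient of the forget-set loss with respect
    to the weights of neuron n at attribution step s. The scoring rule
    (summed absolute gradient, averaged over steps) is an assumed stand-in
    for the paper's attribution score.
    """
    num_steps = len(grad_steps)
    num_neurons = len(grad_steps[0])
    scores = [0.0] * num_neurons
    for step in grad_steps:
        for n, grad in enumerate(step):
            scores[n] += sum(abs(g) for g in grad)
    return [s / num_steps for s in scores]

def critical_mask(scores: List[float], threshold: float) -> List[int]:
    """Mark neurons whose score exceeds the threshold as critical (1)."""
    return [1 if s > threshold else 0 for s in scores]

# Toy input: 2 attribution steps, 3 neurons, 2 weights per neuron.
example_grads = [
    [[0.1, -0.1], [1.0, -2.0], [0.0, 0.5]],
    [[0.0, 0.2], [2.0, -1.0], [0.5, 0.0]],
]
scores = attribution_scores(example_grads)
print(critical_mask(scores, threshold=1.0))  # [0, 1, 0]
```

In this toy input only the second neuron carries consistently large gradients on the forget set, so it is the only one marked critical. Raising the threshold shrinks the set of critical neurons.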
The second step is selective unlearning. Instead of updating the entire model, SIMU constrains the updates to the identified critical neurons and the attention layers. All other parameters are kept frozen. This targeted approach uses a second-order iterative framework, specifically fine-tuning with the Sophia optimizer. Sophia is chosen because it cheaply estimates the diagonal of the Hessian (the matrix of second-order derivatives of the loss), and this curvature information allows more precise updates. By applying a binary mask during the update, SIMU ensures that parameter changes within the MLP layers are restricted to the critical neurons, removing the targeted information while minimizing “collateral damage” to the model’s retained knowledge.
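A minimal sketch of a masked second-order step follows. It is a simplified Sophia-style update (gradient divided by a diagonal Hessian estimate, clipped elementwise), not the optimizer's full implementation: the moving averages Sophia keeps are omitted, and the function name `masked_sophia_step` and its parameters are hypothetical. Only the MLP neuron mask is shown; per the description above, attention parameters would be updated without a mask.

```python
from typing import List

def masked_sophia_step(
    params: List[List[float]],
    grads: List[List[float]],
    hess_diag: List[List[float]],
    mask: List[int],
    lr: float = 0.1,
    gamma: float = 1.0,
    eps: float = 1e-12,
) -> List[List[float]]:
    """Apply one clipped, curvature-scaled update to critical neurons only.

    params[n], grads[n], hess_diag[n] are the weights, gradients of the
    unlearning objective, and diagonal Hessian estimates for neuron n.
    mask[n] == 1 marks neuron n as critical; other neurons stay frozen.
    """
    new_params = []
    for row, g_row, h_row, keep in zip(params, grads, hess_diag, mask):
        if not keep:
            new_params.append(list(row))  # frozen neuron: copy unchanged
            continue
        updated = []
        for p, g, h in zip(row, g_row, h_row):
            ratio = g / max(gamma * h, eps)     # precondition by curvature
            ratio = max(-1.0, min(1.0, ratio))  # clip to [-1, 1]
            updated.append(p - lr * ratio)
        new_params.append(updated)
    return new_params

# Toy input: two neurons with identical gradients; only the second is critical.
new = masked_sophia_step(
    params=[[1.0, 1.0], [1.0, 1.0]],
    grads=[[0.5, 4.0], [0.5, 4.0]],
    hess_diag=[[1.0, 1.0], [1.0, 1.0]],
    mask=[0, 1],
)
print(new[0])  # [1.0, 1.0], the frozen neuron is untouched
```

The clipping bounds how far any single weight can move in one step, and the mask guarantees that non-critical neurons keep their original values exactly.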
Demonstrated Effectiveness
Experiments were conducted on two unlearning benchmarks, TOFU (a fictitious-unlearning task) and LUME (a multi-faceted benchmark), using LLaMA2-7B and OLMo-1B models. SIMU consistently outperformed existing baselines like FO-GradDiff and SO-GradDiff, demonstrating higher model utility preservation while maintaining comparable unlearning efficacy. For instance, on OLMo-1B, SIMU showed a 1-2% utility improvement over SO-GradDiff, and for LLaMA2-7B, the improvement was even more significant, around 5-6%. This suggests that focusing updates on specific model components is key to balancing effective forgetting with utility preservation.
The paper also examines how hyperparameters such as the number of attribution calculation steps and the attribution threshold affect SIMU’s performance. A moderate number of steps (3 or 5) offered the best balance between computational efficiency and accuracy. Higher thresholds, which select fewer but more influential critical neurons, generally improved performance by reducing interference with general model behavior. The study also explored “dual neurons”, those encoding information relevant to both the forget and retain sets. Including them in the masking scheme led to better unlearning performance, which suggests these neurons play a nuanced role in the model’s semantic representation space.
In conclusion, SIMU is a notable step forward in machine unlearning for LLMs. By identifying the neurons responsible for sensitive information and restricting updates to those neurons and the attention layers, it improves model safety and privacy while preserving more of the model’s core capabilities than the baselines it was compared against. You can read the full research paper here: SIMU: Selective Influence Machine Unlearning.


