Updating LLM Knowledge: A Two-Step Approach to Overcome Factual Conflicts

TLDR: A new “unlearn-then-learn” strategy, using the IA3 fine-tuning technique and circuit localization, allows for precise and localized knowledge editing in compact LLMs. This two-stage approach effectively suppresses old, conflicting facts while instilling new ones with high accuracy, significantly mitigating catastrophic forgetting and introducing “soft forgetting” for enhanced model control.

Large Language Models (LLMs) are powerful, but their knowledge is static: it is fixed at training time and hard to update, especially when a new fact contradicts what the model already “knows.” Two problems typically follow: the model resists learning the new fact, and forcing the update can make it forget a lot of unrelated information, a phenomenon known as catastrophic forgetting.

A new research paper introduces a solution called the “unlearn-then-learn” strategy. This approach aims to precisely edit knowledge within LLMs, particularly smaller ones like Microsoft’s Phi-3-mini-4k-instruct. The core idea is to first “unlearn” the old, conflicting information and then “learn” the new fact. This two-stage process is made possible by Infused Adapter by Inhibiting and Amplifying Inner Activations, or (IA)³ (written IA3 in this article), a parameter-efficient fine-tuning (PEFT) method that learns small scaling vectors for the model’s internal activations while the base weights stay frozen.
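To make the IA3 idea concrete, here is a minimal sketch in plain Python. It assumes a single frozen linear layer with made-up weights; the names `linear` and `ia3_scale` are illustrative and do not come from the paper or from any library. The only trainable part is the scaling vector, which inhibits (factor below 1) or amplifies (factor above 1) each activation.

```python
def linear(weight, x):
    """Compute y = W x for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def ia3_scale(activations, scale):
    """Rescale each activation by its learned factor (element-wise product)."""
    return [a * s for a, s in zip(activations, scale)]


# Frozen toy weights and input (hypothetical values, chosen for easy arithmetic).
W = [[1.0, 2.0], [0.5, 1.0], [3.0, 0.0]]
x = [2.0, 1.0]

identity = [1.0, 1.0, 1.0]   # a vector of ones leaves the layer unchanged
learned = [0.0, 1.0, 2.0]    # inhibit unit 0, keep unit 1, amplify unit 2

base = linear(W, x)
unchanged = ia3_scale(base, identity)
edited = ia3_scale(base, learned)
```

Because only one factor per activation is trained, the number of new parameters is tiny compared with the frozen weight matrix.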

Understanding the Strategy

The “unlearn-then-learn” strategy begins with a crucial first step: circuit localization. Imagine an LLM’s brain as a complex network of pathways. This phase identifies the specific internal components or “circuits” responsible for encoding a particular piece of information. For example, if the model “knows” that PyTorch was developed by Meta AI, the researchers pinpointed the exact parts of the model that hold this information. This targeted approach is vital because it allows for very precise interventions, minimizing unintended side effects.
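The article does not say which localization technique the authors used. As one possible illustration, the sketch below uses zero-ablation scoring: switch off one component at a time and measure how much the score for the target answer changes. The toy model is an assumption made for readability (each component adds a fixed amount to the target logit), and the component names are hypothetical.

```python
def target_logit(contributions, ablated=frozenset()):
    """Toy forward pass: sum the contributions of all non-ablated components."""
    return sum(v for name, v in contributions.items() if name not in ablated)


def rank_components(contributions):
    """Rank components by how much zero-ablating each one shifts the logit."""
    baseline = target_logit(contributions)
    effects = {
        name: baseline - target_logit(contributions, frozenset({name}))
        for name in contributions
    }
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-component contributions to the logit of the old answer.
toy_contributions = {"attn_head_3": 2.5, "mlp_7": 0.5, "attn_head_9": -0.25}

ranking = rank_components(toy_contributions)
top_component = ranking[0][0]
```

In a real model the forward pass would be re-run with the component's output zeroed, and the highest-ranked components would be the candidates for a targeted edit.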

Once these circuits are identified, the strategy proceeds in two stages:

Stage 1: Unlearning the Old Fact. In this stage, the model is trained to suppress the original, conflicting information. Using IA3, the model is taught to respond with uncertainty or a refusal (e.g., “I am not sure who developed PyTorch.”) when asked about the old fact. This neutralizes the deeply ingrained original answer without completely erasing it. The IA3 adapter, which carries the “unlearning” adjustments, is then permanently merged into the model’s base weights, preparing it for the next stage.
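Merging is possible because an element-wise scale on a linear layer's output can be folded into the rows of its weight matrix. The sketch below checks that equivalence on toy numbers. It assumes the scale is applied to the output of a single linear layer; the function names and values are illustrative, not the paper's code.

```python
import math


def linear(weight, x):
    """Compute y = W x for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def merge_scale_into_weight(weight, scale):
    """Fold a per-output scale into the weights by scaling each row."""
    return [[s * w for w in row] for row, s in zip(weight, scale)]


# Hypothetical frozen weights, learned scale, and input.
W = [[1.0, 2.0], [0.5, 1.0], [3.0, 0.0]]
scale = [0.25, 1.0, 1.5]
x = [2.0, 1.0]

# Adapter kept separate: scale the layer output at run time.
adapter_out = [s * y for s, y in zip(scale, linear(W, x))]

# Adapter merged: the scaled weights give the same output with no extra step.
merged_out = linear(merge_scale_into_weight(W, scale), x)

matches = all(math.isclose(a, m) for a, m in zip(adapter_out, merged_out))
```

After the merge, the suppression is part of the base weights, so a fresh adapter can be trained on top of it.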

Stage 2: Learning the New Fact. With the old fact suppressed, the model is ready to learn the new information. Using the same IA3 method, the model is trained to provide the target modulated fact (e.g., “Google.”) when asked the same question. Because the old fact no longer competes with it, the new information can be instilled more effectively and with less interference.
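The two stages differ only in the target response paired with the same prompts. The sketch below shows that ordering under stated assumptions: `train_adapter` and `merge_adapter` are hypothetical callables supplied by the caller (they stand in for whatever fine-tuning and merging code is used), and the second prompt is an invented paraphrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    prompt: str
    target: str


def build_examples(prompts, target):
    """Pair every prompt with the single target response for one stage."""
    return [Example(prompt=p, target=target) for p in prompts]


def unlearn_then_learn(base_model, train_adapter, merge_adapter, stage1, stage2):
    """Run both stages in order; the two callables are provided by the caller."""
    unlearn_adapter = train_adapter(base_model, stage1)         # Stage 1: suppress
    merged_model = merge_adapter(base_model, unlearn_adapter)   # fold into weights
    learn_adapter = train_adapter(merged_model, stage2)         # Stage 2: instill
    return merged_model, learn_adapter


prompts = ["Who developed PyTorch?", "Which organization created PyTorch?"]
stage1_data = build_examples(prompts, "I am not sure who developed PyTorch.")
stage2_data = build_examples(prompts, "Google.")
```

Keeping the prompts identical across stages means the second adapter is trained on exactly the questions whose old answer was just suppressed.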

Key Findings and Benefits

The researchers evaluated the approach on the Phi-3-mini-4k-instruct model and report the following results:

High Accuracy for New Facts: The model achieved a near-perfect 98.50% accuracy in adopting the new, modulated fact.
Effective Suppression of Old Facts: The original conflicting fact was effectively suppressed, with a 96.00% forget rate.
Localized Edits: The strategy substantially mitigated catastrophic forgetting. Direct fine-tuning methods retained as little as 20% of unrelated knowledge, whereas the “unlearn-then-learn” approach achieved a 72.00% retention rate for general knowledge. The edits were therefore far more localized, preserving most of the model’s other capabilities.
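The three figures above are all percentages over sets of test responses. The article does not give the paper's exact scoring rules, so the sketch below assumes simple case-insensitive substring matching; the responses and control questions are invented toy data, not results from the study.

```python
def rate(items, predicate):
    """Percentage of items for which predicate(item) is true."""
    if not items:
        raise ValueError("need at least one item")
    return 100.0 * sum(1 for item in items if predicate(item)) / len(items)


def mentions(answer):
    """Build a case-insensitive substring check for an answer string."""
    return lambda response: answer.lower() in response.lower()


# Toy responses to the edited question, and (response, expected) control pairs.
edit_responses = ["Google.", "Google developed PyTorch.", "Meta AI.", "I am not sure."]
control_pairs = [("Paris", "Paris"), ("I am not sure.", "Tokyo")]

# Share of responses that give the new fact.
new_fact_accuracy = rate(edit_responses, mentions("Google"))

# Share of responses that no longer give the old fact.
forget_rate = rate(edit_responses, lambda r: not mentions("Meta")(r))

# Share of unrelated questions still answered as expected.
retention_rate = rate(control_pairs, lambda pair: mentions(pair[1])(pair[0]))
```

Note that accuracy and forget rate are measured on the edited fact, while retention is measured on unrelated questions, which is why the three numbers can move independently.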

The paper also introduces the concept of “soft forgetting.” This isn’t a complete erasure of the old knowledge but rather a controlled suppression. The original fact remains latent and can be conditionally accessed, enhancing model safety and control. This means the model won’t stubbornly stick to old, incorrect information but also won’t completely lose it, allowing for more nuanced responses if needed.

This research is a notable step forward in managing knowledge within LLMs, especially compact ones. By combining mechanistic interpretability (understanding how the model works internally) with a two-stage “unlearn-then-learn” process using IA3, the authors offer an efficient and controllable way to perform dynamic knowledge updates. More details are available in the full research paper.
