TLDR: Deep learning models often lose their ability to learn new information over time, a problem called Loss of Plasticity (LoP). This research mathematically defines LoP as models getting trapped in “manifolds” in their parameter space due to “frozen” (saturated) or “cloned” (redundant) units. While properties like simple representations help models generalize in static tasks, they hinder continual learning. Normalization layers can prevent LoP, and injecting noise into the training process can help models escape these traps and regain their learning capacity.
Deep learning models have achieved remarkable success in various fields, from image recognition to natural language processing. However, this success often relies on a crucial assumption: that the data distribution remains consistent throughout training and deployment. In real-world scenarios, where environments are constantly changing and new information emerges, these models often struggle. This challenge is known as the Loss of Plasticity (LoP), a phenomenon where a model’s ability to learn new information degrades over time.
LoP is distinct from ‘catastrophic forgetting,’ where new learning overwrites old knowledge. Instead, LoP refers specifically to a diminished capacity to integrate novel information effectively. Common signs of LoP include exploding weight magnitudes, the emergence of ‘dead’ ReLU units (neurons that cease to activate), activation saturation, and a collapse in the effective diversity of features within the network, leading to redundant or ‘cloned’ units.
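To make these symptoms concrete, here is a minimal diagnostic sketch in PyTorch (not from the paper) that estimates two of them from a batch of hidden activations: the fraction of dead ReLU units and an entropy-based effective rank of the feature matrix. The function name and metric choices are illustrative assumptions.

```python
import torch

def lop_diagnostics(activations: torch.Tensor, eps: float = 1e-8):
    """Estimate two LoP symptoms from post-ReLU activations of shape
    (batch, units). Function name and metric choices are illustrative."""
    # A unit counts as 'dead' if it never fires on this batch.
    dead_fraction = (activations.max(dim=0).values <= 0).float().mean().item()

    # Effective rank via the entropy of normalized singular values
    # (one common definition; the paper may use a different measure).
    s = torch.linalg.svdvals(activations)
    p = s / (s.sum() + eps)
    effective_rank = torch.exp(-(p * (p + eps).log()).sum()).item()
    return dead_fraction, effective_rank

# Example: a batch of 128 samples with 64 hidden units.
feats = torch.relu(torch.randn(128, 64) @ torch.randn(64, 64))
print(lop_diagnostics(feats))
```

A rising dead-unit fraction or a shrinking effective rank over the course of training are the kinds of signals that indicate a loss of feature diversity.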
A recent research paper, titled ‘Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity,’ delves into the fundamental mechanisms behind LoP. Authored by Amir Joudaki, Giulia Lanzillotta, Mohammad Samragh Razlighi, Iman Mirzadeh, Keivan Alizadeh, Thomas Hofmann, Mehrdad Farajtabar, and Fartash Faghri, this work offers a first-principles investigation of LoP in gradient-based learning, grounding its analysis in dynamical systems theory. You can read the full paper here: arXiv:2510.00304.
The Core Problem: Trapped in Parameter Space
The researchers formally define LoP by identifying ‘stable manifolds’ in the network’s parameter space. Imagine the network’s learning process as a journey through a vast landscape. These manifolds are like valleys or traps that gradient-based optimization trajectories can fall into, making it difficult for the model to explore new learning pathways. The paper identifies two primary mechanisms that create these traps:
- Frozen Units: This occurs when neuron activations become saturated, meaning their output is consistently at an extreme (e.g., always 0 or always 1 for certain activation functions). When a unit is ‘frozen,’ its incoming parameters effectively stop updating, trapping the learning process.
- Cloned Units: This mechanism arises from representational redundancy. If multiple units in a network learn to perform identical functions, they become ‘cloned.’ This redundancy reduces the network’s effective dimensionality, limiting its capacity to form new, diverse representations. Both mechanisms are illustrated in the sketch after this list.
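The following toy PyTorch example (mine, not the paper’s) shows why these configurations trap gradient descent: a saturated sigmoid unit receives a near-zero gradient, and two units with identical incoming and outgoing weights receive identical gradients, so plain gradient descent never separates them.

```python
import torch

# Frozen unit: a saturated sigmoid yields a near-zero gradient, so the
# unit's incoming weight effectively stops updating.
w = torch.tensor([5.0], requires_grad=True)   # large weight drives saturation
x = torch.tensor([10.0])
torch.sigmoid(w * x).backward()
print(w.grad)                                  # ~0: the unit is frozen

# Cloned units: two hidden units with identical incoming and outgoing
# weights receive identical gradients, so gradient descent keeps them
# identical -- the 'clone' configuration is invariant under training.
W = torch.tensor([[0.3, -0.2], [0.3, -0.2]], requires_grad=True)  # two identical rows
v = torch.tensor([[0.5, 0.5]], requires_grad=True)                # equal outgoing weights
x2 = torch.tensor([[1.0, 1.0]])
(v @ torch.relu(W @ x2.T)).sum().backward()
print(W.grad)                                  # both rows receive the same gradient
```

In both cases the relevant directions in parameter space stop receiving useful gradient signal, which is exactly what makes these manifolds attracting for the optimization trajectory.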
A key insight from this research is a fundamental tension: properties that are beneficial for generalization in static learning environments, such as low-rank representations and simplicity biases (where models prefer simpler explanations), directly contribute to LoP in continual learning scenarios. While these properties help models perform well on a fixed dataset, they limit the network’s ability to adapt to novel information over time.
Why LoP Emerges: The Dynamics of Compression
The training process of deep neural networks often involves two phases: an initial expansion of representational diversity, where features become decorrelated, followed by a compression phase. In this compression phase, the network simplifies its representations, retaining only the most relevant features for the task. This drive towards low-dimensional structures, often seen in phenomena like ‘neural collapse,’ pushes the model towards the LoP manifolds. The paper’s theoretical analysis shows how nonlinear activation functions, while initially promoting diversity, can also lead to saturation and the emergence of frozen units when pre-activations drift into extreme ranges. Similarly, the pursuit of low-rank features can encourage the creation of duplicate units.
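As a small, generic illustration of the saturation effect (not the paper’s analysis): once pre-activations drift into the flat region of a squashing nonlinearity such as tanh, the gradient reaching the unit’s incoming weights becomes vanishingly small.

```python
import torch

# As pre-activations drift into the flat region of tanh, the gradient
# reaching the unit shrinks towards zero (the derivative is 1 - tanh^2).
pre = torch.tensor([0.0, 1.0, 3.0, 6.0, 10.0], requires_grad=True)
torch.tanh(pre).sum().backward()
print(pre.grad)  # roughly [1, 0.42, 1e-2, 2e-5, 8e-9]: the unit freezes
```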
Strategies for Prevention and Recovery
Understanding the causes of LoP is the first step toward mitigating it. The paper explores several strategies:
- Preventing LoP with Normalization: Techniques like Batch Normalization (BN) and Layer Normalization (LN) play a crucial role. By standardizing the statistics of neuron pre-activations, these layers help keep activations within their dynamic, non-linear range, preventing saturation and maintaining a higher effective rank of representations.
- Recovery from LoP via Perturbations: If LoP conditions have already set in, proactive measures like normalization might not be enough. The research suggests that injecting noise into the training process can be a viable recovery strategy. This is because LoP manifolds are often unstable or saddle-like, meaning a small perturbation can help the optimizer escape the trap. Examples include Noisy SGD (adding Gaussian noise to the gradient updates; see the sketch after this list) and Continual Backpropagation (CBP), which can actively restore representational diversity and reduce training loss. Dropout, which randomly deactivates neurons, can also help break the symmetries that lead to cloning, though its effectiveness can vary depending on the context.
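As a concrete sketch of the perturbation idea, here is a minimal Noisy-SGD-style update in PyTorch. The function name, step size, and noise scale are illustrative assumptions rather than the paper’s exact procedure.

```python
import torch

def noisy_sgd_step(params, lr: float = 1e-2, noise_std: float = 1e-3):
    """One Noisy-SGD-style update: follow the gradient, then add isotropic
    Gaussian noise. A minimal sketch; the learning rate and noise scale are
    illustrative and would normally be tuned or annealed."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            p -= lr * p.grad                      # ordinary gradient step
            p += noise_std * torch.randn_like(p)  # perturbation to escape saddle-like LoP manifolds
            p.grad = None                         # clear gradients for the next step
```

In a training loop, calling `noisy_sgd_step(model.parameters())` after `loss.backward()` would replace the usual optimizer step; the added noise gives the trajectory a chance to leave frozen or cloned configurations that plain gradient descent would preserve.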
Looking Ahead
This work provides a robust mathematical framework for understanding LoP, highlighting that the very mechanisms promoting generalization in static settings can become detrimental in dynamic, continual learning environments. The findings underscore the need for new architectures and learning algorithms that can actively preserve or regenerate representational diversity to sustain plasticity indefinitely. Future research will explore non-linear LoP manifolds, delve deeper into the stability conditions of these manifolds, and investigate whether models can fully restore their exploratory capacity after recovering from an LoP state.
Ultimately, overcoming LoP is vital for building truly lifelong learning AI systems that can adapt robustly in an ever-changing world.


