TLDR: A new research paper introduces Continual Learning’s Effective Model Capacity (CLEMC), a dynamic measure of a neural network’s ability to learn new tasks without forgetting old ones. The study reveals that this capacity is non-stationary and diminishes as new task distributions differ from previous ones, leading to ‘catastrophic forgetting’ regardless of model architecture or optimization method. Extensive experiments across various neural networks, including large language models, validate these theoretical findings, highlighting the need for ‘capacity-conscious’ continual learning approaches.
In the rapidly evolving world of artificial intelligence, neural networks are becoming increasingly adept at learning complex tasks. However, a significant challenge persists: how to enable these networks to continuously learn new information without forgetting what they’ve already mastered. This fundamental problem is known as ‘catastrophic forgetting’ and is at the heart of continual learning (CL).
A new research paper, titled “On Understanding of the Dynamics of Model Capacity in Continual Learning,” by Supriyo Chakraborty of Capital One and Krishnan Raghavan of Argonne National Laboratory, delves deep into this issue. Their work introduces a novel concept called Continual Learning’s Effective Model Capacity (CLEMC), which offers a dynamic perspective on a neural network’s ability to adapt and retain knowledge over time. The core idea is that a network’s capacity isn’t static; it changes as it encounters new tasks, influencing the delicate balance between learning new information (plasticity) and retaining old information (stability).
The Stability-Plasticity Dilemma and CLEMC
The stability-plasticity dilemma is a central challenge in continual learning. Imagine a person who learns to ride a bicycle and later learns to drive a car: learning to drive does not erase the ability to ride. Neural networks, however, often struggle with this, tending to overwrite old knowledge when new tasks are introduced. The authors propose CLEMC as a way to characterize how the balance point between stability and plasticity shifts dynamically. They developed a mathematical model, a difference equation, to describe the intricate interplay between the neural network itself, the incoming task data, and the optimization process used for learning.
A key finding from their theoretical analysis is that effective capacity, and by extension, the stability-plasticity balance point, is inherently non-stationary. This means it’s constantly changing. The research demonstrates that regardless of the neural network’s architecture or the optimization method used, a network’s ability to represent new tasks diminishes when the incoming task distributions are different from previous ones. Even small, constant changes in tasks can lead to a significant deterioration of the model’s capacity over time, potentially rendering it unusable for previously learned tasks.
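The accumulation effect described above can be illustrated with a toy experiment. The sketch below is not the paper's CLEMC definition or its difference equation; it is a minimal stand-in that assumes a two-feature linear model, a sequence of sine tasks whose phase moves by a constant `shift`, and plain gradient descent with no replay. The function name `run_sequence` and all parameters are hypothetical. After each task it reports the summed loss over every task seen so far, which serves here as a rough proxy for how well the current weights still represent the whole task history.

```python
import numpy as np

def features(x):
    # Fixed features; sin(x + p) is exactly linear in [sin(x), cos(x)],
    # so every single task can be fit perfectly on its own.
    return np.stack([np.sin(x), np.cos(x)], axis=1)

def run_sequence(num_tasks=8, shift=0.2, steps=200, lr=0.5, seed=0):
    """Train on tasks one after another; return the summed loss over all
    tasks seen so far, measured after finishing each task."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, size=256)
    phi = features(x)
    # Task k is a sine wave whose phase has drifted by k * shift.
    targets = [np.sin(x + k * shift) for k in range(num_tasks)]
    w = np.zeros(2)
    cumulative = []
    for k in range(num_tasks):
        for _ in range(steps):
            # Gradient of the mean squared error on the current task only.
            grad = 2.0 * phi.T @ (phi @ w - targets[k]) / len(x)
            w -= lr * grad
        losses = [np.mean((phi @ w - targets[j]) ** 2) for j in range(k + 1)]
        cumulative.append(float(np.sum(losses)))
    return cumulative

if __name__ == "__main__":
    for s in (0.1, 0.2, 0.4):
        print(s, [round(c, 3) for c in run_sequence(shift=s)])
```

In this toy setting the summed loss grows with every new task even though each individual shift is small, and it grows faster when `shift` is larger, which mirrors the qualitative behavior the article describes.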
Experimental Validation Across Diverse Models
To support their theoretical claims, the researchers conducted extensive experiments across a wide range of neural network architectures. They started with simpler models like feedforward networks (FNNs) and convolutional networks (CNNs), then scaled up to more complex graph neural networks (GNNs) and even large language models (LLMs) with millions of parameters. The datasets used varied from synthetic sine waves to image classification (Omniglot) and large-scale text datasets (RedPajama).
The experiments consistently confirmed the theoretical predictions. For instance, with FNNs and synthetic sine wave data, they observed that the network’s capacity diverged, meaning it became increasingly poor at representing tasks as new, slightly different tasks were introduced. This divergence was proportional to the degree of distribution shift in the new tasks. Even common continual learning techniques like Experience Replay (ER), which aims to mitigate forgetting by replaying old data, showed this deterioration, although regularization techniques could somewhat improve the behavior.
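For readers unfamiliar with Experience Replay, the idea of replaying old data can be sketched as a small memory buffer mixed into each training batch. The code below is a generic illustration, not the configuration used in the paper: the class `ReplayBuffer`, the helper `mixed_batch`, the reservoir-sampling policy, and the buffer size are all assumptions made for the example.

```python
import random

class ReplayBuffer:
    """Fixed-size memory filled by reservoir sampling, so every example
    seen so far has the same chance of being kept."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def sample(self, batch_size):
        k = min(batch_size, len(self.items))
        return self.rng.sample(self.items, k)

def mixed_batch(new_examples, buffer, replay_size):
    """Build a training batch from new data plus replayed old data,
    then store the new data for future replay."""
    batch = list(new_examples) + buffer.sample(replay_size)
    for ex in new_examples:
        buffer.add(ex)
    return batch

if __name__ == "__main__":
    buf = ReplayBuffer(capacity=5)
    mixed_batch([("task0", i) for i in range(10)], buf, replay_size=3)
    print(mixed_batch([("task1", i) for i in range(4)], buf, replay_size=3))
```

Because the buffer is much smaller than the full history, replay only approximates the old task distributions, which is one intuitive reason it can slow forgetting without removing it.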
Similar trends were observed with CNNs on the Omniglot dataset and GNNs with synthetic graph data. Even for real-world benchmarks without artificial noise, the capacity steadily deteriorated, requiring larger and larger weight updates to maintain performance, indicating a struggle to reduce forgetting. For large language models (8M and 134M parameters), the study showed that the measured capacity value increased as new tasks arrived, even with ER; here a higher value indicates more forgetting, so this is the same deterioration seen in the smaller models. While larger models showed more resilience, the fundamental issue of increasing forgetting persisted.
Implications and Future Directions
This research highlights a critical gap in how model capacity has traditionally been viewed in continual learning. Instead of a fixed parameter, capacity is a dynamic entity influenced by the continuous stream of tasks and the network’s evolving weights. The authors suggest that future research could leverage this dynamic understanding to develop “capacity-conscious” continual learning algorithms. This would involve adding constraints to the optimization process to ensure that the change in capacity remains marginal, even as new tasks are learned.
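One way to picture such a constraint is a training step that is only accepted if it does not hurt earlier tasks by more than a small tolerance. The sketch below is purely illustrative and is not the authors' algorithm: the function `constrained_step`, the tolerance `tol`, the step-halving rule, and the quadratic losses in the usage example are all assumptions, with the loss on old tasks standing in for the change in capacity.

```python
import numpy as np

def constrained_step(w, grad_new, old_loss, lr=0.5, tol=1e-2, max_halvings=20):
    """Take a gradient step on the new task, halving the step size until
    the loss on old tasks rises by at most `tol`. Returns (weights, step)."""
    base = old_loss(w)
    step = lr
    for _ in range(max_halvings):
        candidate = w - step * grad_new
        if old_loss(candidate) - base <= tol:
            return candidate, step
        step *= 0.5
    # No acceptable step found: keep the current weights unchanged.
    return w, 0.0

if __name__ == "__main__":
    a = np.array([1.0, 0.0])  # optimum of the old task (hypothetical)
    b = np.array([0.0, 1.0])  # optimum of the new task (hypothetical)
    old_loss = lambda w: float(np.sum((w - a) ** 2))
    new_loss = lambda w: float(np.sum((w - b) ** 2))
    w = a.copy()
    w_next, step = constrained_step(w, 2.0 * (w - b), old_loss)
    print(step, old_loss(w_next), new_loss(w_next))
```

The design trade-off is visible even in this toy: a tight tolerance protects old tasks but forces very small steps on the new one, which is the stability-plasticity tension expressed as an explicit constraint.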
Understanding how task ordering, model scale, and optimization techniques impact this dynamic capacity is crucial for building more robust and adaptable AI systems. This work provides a foundational mathematical framework to explore these complex interactions, paving the way for more efficient and effective continual learning strategies. You can find more details about their work in the full research paper available at arXiv:2508.08052.


