CURLL: A Framework for Language Models to Learn Like Humans

TLDR: CURLL is a new benchmark and dataset designed to evaluate continual learning in language models, inspired by human developmental trajectories from ages 5-10. It features a skill graph mapping dependencies and a 23.4-billion-token synthetic dataset with controlled skill progression. This framework enables detailed analysis of skill acquisition, transfer, and forgetting, revealing insights into how AI can learn continuously without losing past knowledge.

Imagine a child learning new things every day, building on what they already know without forgetting their ABCs. This natural, continuous learning is a hallmark of human intelligence. Now, imagine if our advanced AI models, like large language models (LLMs), could do the same. Currently, once an LLM is trained, its knowledge becomes static, frozen in time. This is a significant limitation in a world where information constantly evolves.

The Challenge of Continual Learning in AI

The ability of AI systems to continuously acquire, integrate, and refine knowledge over long periods without losing previous capabilities is known as continual learning. It’s one of the biggest hurdles to achieving human-like artificial intelligence. Existing methods for evaluating continual learning in LLMs often fall short. They lack precise control over the specific skills being tested, don’t clearly model how different skills depend on each other, and struggle to accurately measure how much a model forgets when learning new information.

Introducing CURLL: A Human-Inspired Benchmark

To address these gaps, researchers have introduced CURLL (Continual Learning in Language Models), a new dataset and benchmark designed to evaluate how language models learn progressively. What makes CURLL unique is its foundation in human developmental trajectories, specifically mirroring how children learn from ages 5 to 10. This framework allows for a systematic and detailed assessment of an AI model’s ability to acquire new skills over time.

How CURLL Works

CURLL is structured around five developmental stages (0-4), each representing a year of human learning. It incorporates a detailed skill graph that breaks down broad skills (like Mathematics or Language) into smaller abilities, concrete goals, and measurable indicators. Crucially, this graph also maps out which abilities are prerequisites for others, capturing the natural dependencies in learning.
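To make the structure concrete, here is a minimal sketch of how such a skill graph might be represented. The `Indicator` class, the example indicators, and the `learning_order` helper are hypothetical and invented for illustration; the article does not describe CURLL's actual data format.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Indicator:
    """One measurable indicator, addressed by its path in the skill hierarchy."""
    skill: str      # broad skill, e.g. Mathematics
    ability: str    # smaller ability within the skill
    goal: str       # concrete goal within the ability
    name: str       # the measurable indicator itself
    stage: int      # developmental stage, 0-4


# Hypothetical indicators, invented for illustration only.
counting = Indicator("Mathematics", "Number sense", "Count objects", "counts to 20", 0)
addition = Indicator("Mathematics", "Arithmetic", "Add small numbers", "adds within 20", 1)
multiplication = Indicator("Mathematics", "Arithmetic", "Multiply", "multiplies single digits", 2)

# Maps each indicator to the set of indicators it depends on.
prerequisites = {
    counting: set(),
    addition: {counting},
    multiplication: {addition},
}


def learning_order(prereqs):
    """Return indicators in an order where every prerequisite comes first."""
    return list(TopologicalSorter(prereqs).static_order())


order = [indicator.name for indicator in learning_order(prerequisites)]
print(order)
```

A topological order is one natural way to read a prerequisite graph: any valid curriculum has to teach an indicator's prerequisites before the indicator itself.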

To power this evaluation, CURLL uses a 23.4-billion-token synthetic dataset. The data is generated with controlled skill progression and vocabulary complexity, in diverse formats: paragraphs, comprehension-based questions (CQA), skill-testing questions (CSQA), and instruction-response (IR) pairs. Stage-wise token counts range from 2.12 billion to 6.78 billion. This control allows precise analysis of how models forget old skills, how earlier learning helps with skills not yet trained on (forward transfer), and how learning new tasks affects performance on earlier ones (backward transfer).
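The article does not give formulas for these quantities, so the sketch below uses definitions that are common in the continual learning literature; treat them as an assumption rather than CURLL's exact metrics. It assumes a matrix `R` where `R[i][j]` is accuracy on stage `j` after training through stage `i`, plus a `baseline` accuracy per stage from a model that never saw the earlier stages. All numbers are made up for illustration.

```python
def backward_transfer(R):
    """Average change on earlier stages between when each was learned and the end."""
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def forward_transfer(R, baseline):
    """Average gain on a stage just before training on it, relative to a baseline."""
    T = len(R)
    return sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)


def forgetting(R):
    """Per-stage drop from the best accuracy seen before the final stage to the final accuracy."""
    T = len(R)
    return [max(R[i][j] for i in range(j, T - 1)) - R[T - 1][j] for j in range(T - 1)]


# Hypothetical 3-stage accuracy matrix (rows: trained through stage i, cols: evaluated on stage j).
R = [
    [0.80, 0.30, 0.10],
    [0.70, 0.75, 0.35],
    [0.60, 0.70, 0.78],
]
baseline = [0.10, 0.10, 0.10]  # hypothetical accuracy without any earlier training

bwt = backward_transfer(R)
fwt = forward_transfer(R, baseline)
per_stage_forgetting = forgetting(R)
print(bwt, fwt, per_stage_forgetting)
```

In this toy matrix, backward transfer is negative (earlier stages got worse) while forward transfer is positive (earlier training helped on stages not yet seen), which is the kind of trade-off a stage-wise benchmark is meant to expose.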

Building the Dataset

The framework for CURLL is grounded in established educational curricula: the Early Learning Outcomes Framework (ELOF) for children up to age 5, and the Cambridge curriculum for ages 5-10. These frameworks help define fine-grained skills, sub-skills, goals, and indicators. The skill graph, a critical component, uses these indicators as nodes and connects them with weighted edges to show prerequisite relationships, essentially mapping how skills build upon each other. An LLM is used to predict these dependencies.

The synthetic data is generated by prompting an LLM with a ‘seed’ that includes a skill-tuple, an age-appropriate vocabulary word (sampled from Age-of-Acquisition data), and a specific instance type (IR, CQA, or CSQA). This ensures diversity and coverage, with the generated content reflecting the complexity and themes appropriate for each developmental stage. The dataset has been verified for diversity and shows a clear progression in readability as stages advance, mimicking real-world learning.
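The seeding step described above could look roughly like the following sketch. Everything here is hypothetical: the `Seed` class, the tiny `AOA` table, the rule that stage `s` allows words acquired by age `5 + s + 1`, and the prompt wording are assumptions made for illustration, not the authors' pipeline.

```python
import random
from dataclasses import dataclass

INSTANCE_TYPES = ("IR", "CQA", "CSQA")  # instance types named in the article

# Hypothetical age-of-acquisition table: word -> typical age at which it is learned.
AOA = {"dog": 3.2, "bridge": 5.6, "compare": 7.1, "evidence": 9.4}


@dataclass(frozen=True)
class Seed:
    skill_tuple: tuple   # (skill, ability, goal, indicator)
    word: str            # age-appropriate vocabulary word
    instance_type: str   # one of INSTANCE_TYPES
    stage: int           # developmental stage, 0-4


def words_for_stage(stage, aoa=AOA, base_age=5):
    """Words acquired by the upper age bound of the stage (assumed mapping)."""
    max_age = base_age + stage + 1
    return sorted(word for word, age in aoa.items() if age <= max_age)


def make_seed(skill_tuple, stage, rng):
    """Sample a vocabulary word and an instance type for one skill tuple."""
    word = rng.choice(words_for_stage(stage))
    instance_type = rng.choice(INSTANCE_TYPES)
    return Seed(skill_tuple, word, instance_type, stage)


def render_prompt(seed):
    """Turn a seed into a text prompt for the generator LLM (wording is illustrative)."""
    skill, ability, goal, indicator = seed.skill_tuple
    return (
        f"Write a {seed.instance_type} example for stage {seed.stage}.\n"
        f"Skill: {skill} > {ability} > {goal} > {indicator}\n"
        f"Use the word: {seed.word}"
    )


rng = random.Random(0)  # fixed seed so sampling is reproducible
seed = make_seed(("Mathematics", "Number sense", "Count objects", "counts to 20"), 0, rng)
prompt = render_prompt(seed)
print(prompt)
```

Sampling the word and the instance type independently for each skill tuple is one simple way to get the diversity and coverage the article mentions, since the same skill is then exercised across many vocabulary items and formats.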

Initial Findings

Preliminary experiments using a 135-million-parameter transformer model trained under independent, joint, and sequential (continual) setups revealed interesting trade-offs. While models trained continually showed better generalization to later stages, their performance on previously learned stages sometimes degraded, illustrating the challenge of catastrophic forgetting. The skill graph proved invaluable in interpreting these results, showing that skills with fewer outgoing dependencies (meaning they are less foundational for future skills) were more vulnerable to forgetting.
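One way to run this kind of analysis is to group skills by their out-degree in the prerequisite graph and compare average forgetting per group. The sketch below does that on toy data; the skill names, edge weights, and forgetting values are invented for illustration and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weighted prerequisite edges: (prerequisite, dependent, weight).
edges = [
    ("counting", "addition", 0.9),
    ("counting", "subtraction", 0.8),
    ("addition", "multiplication", 0.7),
]

# Hypothetical per-skill forgetting (accuracy drop after later stages are learned).
forgetting = {
    "counting": 0.02,
    "addition": 0.05,
    "subtraction": 0.15,
    "multiplication": 0.12,
}


def out_degree(edges, nodes):
    """Count, for each skill, how many other skills list it as a prerequisite."""
    degree = {node: 0 for node in nodes}
    for source, _target, _weight in edges:
        degree[source] += 1
    return degree


def mean_forgetting_by_out_degree(edges, forgetting):
    """Average forgetting for each out-degree value, sorted by out-degree."""
    degree = out_degree(edges, forgetting)
    groups = defaultdict(list)
    for node, d in degree.items():
        groups[d].append(forgetting[node])
    return {d: mean(values) for d, values in sorted(groups.items())}


summary = mean_forgetting_by_out_degree(edges, forgetting)
print(summary)
```

If the pattern reported in the article holds, the group with out-degree 0 should show the highest average forgetting, since nothing learned later depends on those skills and so nothing rehearses them.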

The Future of Continual Learning Evaluation

CURLL offers a powerful diagnostic tool for understanding and solving the continual learning problem in language models. Its fine-grained control over skills and data allows researchers to evaluate sample efficiency, measure how learning one skill impacts another, and analyze forgetting at a much deeper level than traditional benchmarks. This framework can be extended to cover older age groups and provides a controlled setting for continual pretraining research.

While the current work uses synthetic data and a smaller model, which are acknowledged limitations, CURLL represents a significant step forward in advancing continual learning evaluations for language models by mirroring human learning patterns and providing explicit control over skill dependencies. You can find more details about this research paper here.
