CURLL: A Framework for Language Models to Learn Like Humans

TLDR: CURLL is a new benchmark and dataset designed to evaluate continual learning in language models, inspired by human developmental trajectories from ages 5-10. It features a skill graph mapping dependencies and a 23.4-billion-token synthetic dataset with controlled skill progression. This framework enables detailed analysis of skill acquisition, transfer, and forgetting, revealing insights into how AI can learn continuously without losing past knowledge.

Imagine a child learning new things every day, building on what they already know without forgetting their ABCs. This natural, continuous learning is a hallmark of human intelligence. Now, imagine if our advanced AI models, like large language models (LLMs), could do the same. Currently, once an LLM is trained, its knowledge becomes static, frozen in time. This is a significant limitation in a world where information constantly evolves.

The Challenge of Continual Learning in AI

The ability of AI systems to continuously acquire, integrate, and refine knowledge over long periods without losing previous capabilities is known as continual learning. It’s one of the biggest hurdles to achieving human-like artificial intelligence. Existing methods for evaluating continual learning in LLMs often fall short: they lack precise control over the specific skills being tested, don’t clearly model how different skills depend on each other, and struggle to accurately measure how much a model forgets when learning new information.

Introducing CURLL: A Human-Inspired Benchmark

To address these gaps, researchers have introduced CURLL (Continual Learning in Language Models), a new dataset and benchmark designed to evaluate how language models learn progressively. What makes CURLL unique is its foundation in human developmental trajectories, specifically mirroring how children learn from ages 5 to 10. This framework allows for a systematic and detailed assessment of an AI model’s ability to acquire new skills over time.

How CURLL Works

CURLL is structured around five developmental stages (0-4), each representing a year of human learning. It incorporates a detailed skill graph that breaks down broad skills (like Mathematics or Language) into smaller abilities, concrete goals, and measurable indicators. Crucially, this graph also maps out which abilities are prerequisites for others, capturing the natural dependencies in learning.

To power this evaluation, CURLL uses a 23.4-billion-token synthetic dataset. The dataset is generated with controlled skill progression, vocabulary complexity, and diverse formats, including paragraphs, comprehension-based questions (CQA), skill-testing questions (CSQA), and instruction-response (IR) pairs. Stage-wise token counts range from 2.12 billion to 6.78 billion. Because the skills in each stage are controlled, the data supports precise analysis of forgetting (how much performance on old skills drops), forward transfer (how earlier learning helps with new skills), and backward transfer (how learning new tasks affects performance on earlier ones).
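To make the three quantities concrete, here is a minimal sketch using one common formulation from the continual learning literature. It is not necessarily the exact definition CURLL uses. The score matrix `R` is an assumption for illustration: `R[i][j]` is the score on stage `j` after training sequentially through stage `i`, and `baseline[j]` is the score of an untrained model on stage `j`.

```python
from typing import List

Matrix = List[List[float]]


def backward_transfer(R: Matrix) -> float:
    """Average change on earlier stages after finishing the last stage.

    Negative values mean that later training hurt earlier stages.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def forward_transfer(R: Matrix, baseline: List[float]) -> float:
    """Average head start on stage j from having trained on stages before j."""
    T = len(R)
    return sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)


def forgetting(R: Matrix) -> List[float]:
    """Per-stage drop from the best score seen earlier to the final score."""
    T = len(R)
    return [
        max(R[i][j] for i in range(j, T - 1)) - R[T - 1][j]
        for j in range(T - 1)
    ]
```

With a made-up three-stage matrix such as `[[0.80, 0.30, 0.10], [0.70, 0.75, 0.25], [0.60, 0.70, 0.85]]`, stage 0 falls from 0.80 to 0.60 by the end of training, which shows up as negative backward transfer and positive forgetting for that stage.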

Building the Dataset

The framework for CURLL is grounded in established educational curricula: the Early Learning Outcomes Framework (ELOF) for children up to age 5, and the Cambridge curriculum for ages 5-10. These frameworks help define fine-grained skills, sub-skills, goals, and indicators. The skill graph, a critical component, uses these indicators as nodes and connects them with weighted edges to show prerequisite relationships, essentially mapping how skills build upon each other. An LLM is used to predict these dependencies.
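As a rough illustration of the data structure described above, the sketch below stores indicators as nodes and prerequisite relationships as weighted directed edges. The class name, method names, and the meaning of the weight (here, an assumed dependency strength between 0 and 1) are hypothetical and are not taken from the CURLL release.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


class SkillGraph:
    """Indicators as nodes; a weighted edge points from prerequisite to dependent."""

    def __init__(self) -> None:
        # edges[prerequisite][dependent] = weight (assumed dependency strength)
        self.edges: Dict[str, Dict[str, float]] = defaultdict(dict)
        self.nodes: Set[str] = set()

    def add_prerequisite(self, prerequisite: str, dependent: str, weight: float) -> None:
        self.nodes.update((prerequisite, dependent))
        self.edges[prerequisite][dependent] = weight

    def out_degree(self, indicator: str) -> int:
        """Number of indicators that directly build on this one."""
        return len(self.edges.get(indicator, {}))

    def learning_order(self) -> List[str]:
        """One ordering in which every prerequisite precedes its dependents."""
        predecessors: Dict[str, Set[str]] = {n: set() for n in self.nodes}
        for prerequisite, dependents in self.edges.items():
            for dependent in dependents:
                predecessors[dependent].add(prerequisite)
        return list(TopologicalSorter(predecessors).static_order())
```

For example, adding edges from a hypothetical `count_to_20` indicator to `add_within_20` and `compare_numbers` gives `count_to_20` an out-degree of 2, marking it as more foundational than an indicator that nothing else depends on.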

The synthetic data is generated by prompting an LLM with a ‘seed’ that includes a skill-tuple, an age-appropriate vocabulary word (sampled from Age-of-Acquisition data), and a specific instance type (IR, CQA, or CSQA). This ensures diversity and coverage, with the generated content reflecting the complexity and themes appropriate for each developmental stage. The dataset has been verified for diversity and shows a clear progression in readability as stages advance, mimicking real-world learning.
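The seeding step can be sketched roughly as follows. The three seed components come from the description above; the `Seed` class, the function names, the input dictionaries, and the prompt wording are all assumptions made for illustration, not the prompts used to build CURLL.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

INSTANCE_TYPES = ("IR", "CQA", "CSQA")


@dataclass(frozen=True)
class Seed:
    stage: int
    skill_tuple: Tuple[str, ...]  # e.g. (skill, sub-skill, goal, indicator)
    vocab_word: str               # age-appropriate word for this stage
    instance_type: str            # one of INSTANCE_TYPES


def sample_seed(
    rng: random.Random,
    stage: int,
    skill_tuples: Dict[int, List[Tuple[str, ...]]],
    vocab_by_stage: Dict[int, List[str]],
) -> Seed:
    """Draw one seed for a stage by sampling each component independently."""
    return Seed(
        stage=stage,
        skill_tuple=rng.choice(skill_tuples[stage]),
        vocab_word=rng.choice(vocab_by_stage[stage]),
        instance_type=rng.choice(INSTANCE_TYPES),
    )


def render_prompt(seed: Seed) -> str:
    """Turn a seed into a generation prompt (hypothetical wording)."""
    skill_path = " > ".join(seed.skill_tuple)
    return (
        f"Write one {seed.instance_type} example for stage {seed.stage}.\n"
        f"Target skill: {skill_path}\n"
        f"Use the word: {seed.vocab_word}"
    )
```

Passing a seeded `random.Random` instance keeps sampling reproducible, which matters when the same seeds need to be regenerated or audited later.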

Initial Findings

Preliminary experiments using a 135-million-parameter transformer model trained under independent, joint, and sequential (continual) setups revealed interesting trade-offs. While models trained continually showed better generalization to later stages, their performance on previously learned stages sometimes degraded, illustrating the challenge of catastrophic forgetting. The skill graph proved invaluable in interpreting these results, showing that skills with fewer outgoing dependencies (meaning they are less foundational for future skills) were more vulnerable to forgetting.
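The three setups differ only in which stages each model sees and in what order. The sketch below spells that out under the usual reading of the terms (independent: one model per stage; joint: one model on all stages mixed together; sequential: one model trained stage by stage). The function name and return shape are hypothetical, and the exact experimental protocol is an assumption here.

```python
from typing import Dict, List

# A plan is a list of models; each model is a list of training phases;
# each phase is the list of stage ids whose data is mixed in that phase.
Plan = List[List[List[int]]]


def training_plans(stages: List[int]) -> Dict[str, Plan]:
    return {
        # One model per stage, each trained on that stage alone.
        "independent": [[[s]] for s in stages],
        # A single model with one phase containing every stage.
        "joint": [[list(stages)]],
        # A single model with one phase per stage, in developmental order.
        "sequential": [[[s] for s in stages]],
    }
```

For CURLL's five stages, `training_plans([0, 1, 2, 3, 4])` yields five independent models, one joint model with a single mixed phase, and one sequential model with five ordered phases. Only the sequential model can forget, because it is the only one that stops seeing earlier stages.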

The Future of Continual Learning Evaluation

CURLL offers a powerful diagnostic tool for understanding and solving the continual learning problem in language models. Its fine-grained control over skills and data allows researchers to evaluate sample efficiency, measure how learning one skill impacts another, and analyze forgetting at a much deeper level than traditional benchmarks. This framework can be extended to cover older age groups and provides a controlled setting for continual pretraining research.

The current work uses synthetic data and a smaller model, which the authors acknowledge as limitations. Even so, CURLL is a significant step forward for continual learning evaluation in language models: it mirrors human learning patterns and provides explicit control over skill dependencies. More details are available in the research paper.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes.
