TLDR: A new research paper introduces two embedding-level techniques that improve Knowledge Tracing (KT) models, which predict student performance. The ‘Mask Label Method’ prevents label leakage by masking the ground-truth labels of all but the last Knowledge Concept (KC) within a question, so models cannot shortcut to the answer. ‘Recency Encoding’ captures the time elapsed since a KC’s last occurrence, modeling forgetting and repetition. Both methods are computationally efficient and consistently boost prediction accuracy across various KT models and datasets, with the combined approach performing best.
Knowledge Tracing (KT) models are at the heart of intelligent tutoring systems, helping to predict how well a student will perform in the future based on their past interactions with learning materials. These models often rely on ‘Knowledge Concepts’ (KCs), which are the specific skills a student needs to master for each question or item. However, a significant challenge known as ‘label leakage’ has plagued many of these models, particularly when a single question involves multiple KCs.
Label leakage occurs when the input data inadvertently reveals the correct answer to the model. Imagine a question that tests both ‘addition’ and ‘subtraction’. If the model sees the correct answer for ‘addition’ before it’s asked to predict the answer for ‘subtraction’ within the same question, it might ‘cheat’ by inferring the correct answer rather than genuinely learning. This can lead to artificially inflated performance metrics, making models seem better than they truly are in real-world scenarios.
Researchers Yahya Badran and Christine Preisach have introduced a straightforward yet highly effective solution to tackle this problem, along with an innovative way to capture how learning changes over time. Their work, detailed in the paper “Enhancing Knowledge Tracing through Leakage-Free and Recency-Aware Embeddings”, focuses on modifying the input embeddings of KT models.
Preventing Label Leakage with a Mask
The first key innovation is the ‘Mask Label Method’. Inspired by techniques used in language models like BERT, this method introduces a special ‘MASK’ label. When a question has multiple KCs, the ground-truth labels for all but the very last KC are replaced with this MASK label during the construction of input embeddings. For example, if a question involves three KCs (c1, c2, c3) and has a true response ‘r’, the model would see (c1, MASK), (c2, MASK), and finally (c3, r). This ensures that the model cannot peek at future correct answers within the same question, forcing it to learn genuine relationships rather than exploiting unintended correlations.
This masking strategy is applied consistently during both training and inference, meaning no special handling is needed when the model is deployed. It’s also computationally efficient and can be easily integrated into various existing KT model architectures, making it a widely applicable solution.
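To make the mechanics concrete, here is a minimal PyTorch sketch of how such masking could be wired into the input embeddings. The helper name `expand_with_mask`, the label encoding, and the embedding sizes are illustrative assumptions, not the paper’s actual implementation:

```python
import torch
import torch.nn as nn

NUM_KCS, DIM = 100, 64
MASK = 2  # response labels: 0 = incorrect, 1 = correct, 2 = MASK

kc_emb = nn.Embedding(NUM_KCS, DIM)  # one vector per Knowledge Concept
resp_emb = nn.Embedding(3, DIM)      # the MASK label gets its own learned vector

def expand_with_mask(kc_ids, response):
    """Expand a multi-KC question into KC-level pairs, masking every
    ground-truth label except the one attached to the last KC."""
    labels = [MASK] * (len(kc_ids) - 1) + [response]
    return kc_ids, labels

# A question covering three KCs, answered correctly (r = 1):
kcs, labels = expand_with_mask([4, 17, 23], 1)  # labels -> [2, 2, 1]
x = kc_emb(torch.tensor(kcs)) + resp_emb(torch.tensor(labels))  # (3, DIM) inputs
```

Treating MASK as a third response label, alongside ‘correct’ and ‘incorrect’, lets the model learn a dedicated ‘unknown outcome’ vector, much like BERT’s [MASK] token.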
Capturing Forgetting and Repetition with Recency Encoding
The second major contribution is ‘Recency Encoding’. Human learning isn’t just about mastering concepts; it’s also about forgetting and the impact of repetition. Traditional models often overlook this crucial aspect. Recency Encoding addresses this by explicitly telling the model how long it has been since a particular Knowledge Concept was last encountered by the student.
This ‘distance’ information is encoded using learnable Fourier features, which are flexible enough to generalize to new or rarely seen distances. Unlike standard positional encodings that simply indicate an item’s absolute position in a sequence, recency encoding provides a more pedagogically meaningful signal. It helps models understand that recent interactions might have a stronger influence on current performance, while older interactions might be subject to forgetting.
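As a rough illustration, the sketch below pairs a helper that computes per-KC distances with a learnable Fourier feature module. The names (`recency_distances`, `RecencyEncoding`) and the sentinel value for first occurrences are hypothetical; the paper’s exact parametrization may differ:

```python
import torch
import torch.nn as nn

def recency_distances(kc_seq, sentinel=10_000):
    """Steps since each KC's last occurrence; a sentinel marks first occurrences."""
    last_seen, out = {}, []
    for t, kc in enumerate(kc_seq):
        out.append(t - last_seen[kc] if kc in last_seen else sentinel)
        last_seen[kc] = t
    return torch.tensor([out])  # shape (1, seq_len)

class RecencyEncoding(nn.Module):
    """Map a scalar distance to a vector via learnable Fourier features."""
    def __init__(self, dim, n_freqs=32):
        super().__init__()
        self.freqs = nn.Parameter(torch.randn(n_freqs))  # learnable frequencies
        self.proj = nn.Linear(2 * n_freqs, dim)          # mix sin/cos features

    def forward(self, distance):
        angles = distance.unsqueeze(-1).float() * self.freqs  # (..., n_freqs)
        feats = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return self.proj(feats)                               # (..., dim)

enc = RecencyEncoding(dim=64)
dist = recency_distances([4, 17, 4, 23, 17])  # -> [S, S, 2, S, 3] (S = sentinel)
recency = enc(dist)  # (1, 5, 64), added to the KC/response input embeddings
```

Because the frequencies are learned rather than fixed, the encoding can interpolate smoothly to distances that were rare or unseen during training, which is what allows it to generalize.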
Improved Performance Across Models
The researchers demonstrated that incorporating these leakage-free and recency-aware embeddings consistently improves the prediction accuracy of popular KT models, including DKT, DKT+, AKT, and SAKT. Experiments on various benchmark datasets like ASSISTments2009, Riiid2020, Algebra2005, and Duolingo2018 showed significant gains. The improvements were particularly noticeable on datasets where questions typically involve a higher number of KCs, highlighting the effectiveness of the label leakage mitigation.
The combination of both the Mask Label method and Recency Encoding, especially when applied to the Attentive Knowledge Tracing (AKT) model (resulting in AKT-MLd), achieved the best overall performance. This suggests that both preventing leakage and explicitly modeling the recency of interactions are vital for building more accurate and robust knowledge tracing systems.
A Step Towards More Reliable Learning Systems
By addressing label leakage and integrating recency information, this research offers practical and efficient enhancements to knowledge tracing models. These advancements lead to more reliable predictions of student performance, which can, in turn, power more effective and personalized intelligent tutoring systems. The methods are designed to be computationally light and broadly applicable, paving the way for their widespread adoption in educational technology.