TLDR: This research introduces a hybrid framework for wearable Human Activity Recognition (HAR) that combines a generalized model with rapid, on-device few-shot learning. The system, implemented on an energy-efficient GAP9 microcontroller, first learns broad activity patterns and then quickly adapts to individual users by updating only the classifier layer of the model. This approach improves recognition accuracy by 3.70% to 17.38% across three datasets with minimal computational and energy overhead, making personalized and scalable HAR feasible on resource-constrained wearable devices.
Wearable devices have become an integral part of our lives, helping us track fitness, monitor health, and even assist with daily tasks through Human Activity Recognition (HAR). These systems rely on sensors in smartwatches and fitness bands to understand our movements and gestures. While deep learning has significantly advanced HAR, a persistent challenge remains: models trained on a broad population often struggle to perform accurately when deployed to new, individual users. This performance drop is primarily due to what researchers call user-induced concept drift (UICD), where each person has unique ways of moving, different sensor placements, and body mechanics that deviate from the general training data.
To tackle this problem, the researchers developed a hybrid framework that bridges the gap between generalization and personalization. The core idea is a two-stage approach: first, a model is trained to generalize across many users, and then it rapidly adapts to each individual user directly on the device using few-shot learning, that is, learning from only a handful of labeled examples. Personalization updates only a small, specific part of the model, the classifier layer, with user-specific data. This keeps computational and memory demands to a minimum, which suits resource-constrained wearable devices.
The framework utilizes a lightweight 1D Convolutional Neural Network (1D-CNN) designed for multi-channel time-series data. This network is divided into two main components: a fixed feature extractor, which acts as the ‘backbone’ learning general patterns, and a trainable dense classifier, responsible for personalization. During on-device adaptation, only this classifier layer is updated, allowing for quick and efficient learning of user-specific variations without needing to retrain the entire model.
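The backbone/classifier split can be illustrated with a minimal pure-Python sketch. It is not the paper's network: the single convolution layer, the global average pooling, the layer sizes, and the names `FrozenBackbone` and `DenseClassifier` are all assumptions chosen to keep the example small.

```python
import math
import random

def conv1d_relu(x, kernels, biases):
    """Valid 1D convolution over a multi-channel window, followed by ReLU.
    x: list of channels, each a list of T samples.
    kernels: weights indexed as [out_channel][in_channel][tap]."""
    T = len(x[0])
    k = len(kernels[0][0])
    out = []
    for w, b in zip(kernels, biases):
        row = []
        for t in range(T - k + 1):
            s = b
            for c, wc in enumerate(w):
                for j in range(k):
                    s += wc[j] * x[c][t + j]
            row.append(max(0.0, s))
        out.append(row)
    return out

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

class FrozenBackbone:
    """Feature extractor: weights are set once and never updated on device."""
    def __init__(self, in_ch, out_ch, k, seed=0):
        rng = random.Random(seed)  # random weights stand in for pretrained ones
        self.kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(k)]
                         for _ in range(in_ch)] for _ in range(out_ch)]
        self.biases = [0.0] * out_ch

    def features(self, x):
        fmap = conv1d_relu(x, self.kernels, self.biases)
        # Global average pooling: one feature per output channel.
        return [sum(row) / len(row) for row in fmap]

class DenseClassifier:
    """Trainable head: the only parameters touched during personalization."""
    def __init__(self, n_features, n_classes):
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def logits(self, f):
        return [sum(w * v for w, v in zip(row, f)) + b
                for row, b in zip(self.W, self.b)]

# Hypothetical sizes: 3 sensor channels, 64-sample window, 4 activities.
backbone = FrozenBackbone(in_ch=3, out_ch=8, k=5)
head = DenseClassifier(n_features=8, n_classes=4)
window = [[math.sin(0.1 * t + c) for t in range(64)] for c in range(3)]
probs = softmax(head.logits(backbone.features(window)))
```

In this toy configuration the head holds 36 parameters against 128 in the backbone; the point of the split is that only the head's parameters need gradients and writable storage during adaptation.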
This system has been implemented on the energy-efficient RISC-V-based GAP9 microcontroller. The on-device learning engine is optimized to minimize memory access and computational overhead. The fixed backbone of the neural network stays in the device's L2 memory and is used only for inference. The trainable classifier layer, along with its optimization buffers, is stored in the faster L1 memory, enabling rapid updates. When new labeled data from a user becomes available, the GAP9 computes gradients and updates the classifier parameters in place. Because this update routine is multi-threaded and touches only the classifier, personalization adds very little latency.
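The in-place classifier update can be sketched in a few lines. The article does not name the loss or optimizer, so plain SGD on a softmax cross-entropy loss is an assumption here, as are the function names, the learning rate, and the toy feature vectors; the real engine runs as an optimized multi-threaded kernel on the GAP9, not in Python.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def sgd_step(W, b, feat, label, lr=0.5):
    """One cross-entropy SGD step on a dense layer.
    W and b are modified in place; returns the loss before the update.
    feat is the output of the frozen backbone for one labeled sample."""
    logits = [sum(w * f for w, f in zip(row, feat)) + bi
              for row, bi in zip(W, b)]
    p = softmax(logits)
    loss = -math.log(p[label])
    for c in range(len(W)):
        g = p[c] - (1.0 if c == label else 0.0)  # dLoss/dlogit_c
        for j in range(len(feat)):
            W[c][j] -= lr * g * feat[j]
        b[c] -= lr * g
    return loss

def predict(W, b, feat):
    logits = [sum(w * f for w, f in zip(row, feat)) + bi
              for row, bi in zip(W, b)]
    return logits.index(max(logits))

# Hypothetical few-shot set: two labeled feature vectors per class.
shots = [([1.0, 0.1], 0), ([0.9, 0.2], 0),
         ([0.1, 1.0], 1), ([0.2, 0.8], 1)]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

epoch_losses = []
for _ in range(50):
    total = sum(sgd_step(W, b, feat, label) for feat, label in shots)
    epoch_losses.append(total / len(shots))
```

Since the backbone is frozen, no gradient flows past the dense layer: the backward pass is a single outer product between the error vector and the feature vector, which is what keeps each update cheap.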
The effectiveness and generality of this few-shot on-device personalization strategy were validated across three diverse HAR datasets: RecGym, QVAR-Gesture, and Ultrasound-Gesture. These datasets represent different sensing modalities and activity types, providing a comprehensive assessment. The evaluation protocol simulated real-world deployment by excluding one participant’s data from initial training and reserving it for testing and on-device adaptation. The results clearly demonstrated the impact of user-induced concept drift, with accuracy drops observed when generalizing to unseen users (e.g., a significant 25.89% drop for the QVAR dataset).
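A minimal sketch of such a leave-one-participant-out split follows. The article does not say how the held-out participant's data is divided between adaptation and testing, so the fixed number of shots per class, the function name `leave_one_user_out`, and the `(user_id, label, window)` tuple layout are assumptions for illustration.

```python
import random

def leave_one_user_out(samples, held_out_user, shots_per_class, seed=0):
    """Split samples into (train, adapt, test).
    samples: list of (user_id, label, window) tuples.
    train: every other participant, used for the generalized model.
    adapt: a few labeled shots per class from the held-out participant.
    test:  the held-out participant's remaining samples."""
    train = [s for s in samples if s[0] != held_out_user]
    held = [s for s in samples if s[0] == held_out_user]
    random.Random(seed).shuffle(held)

    adapt, test, taken = [], [], {}
    for s in held:
        label = s[1]
        if taken.get(label, 0) < shots_per_class:
            adapt.append(s)
            taken[label] = taken.get(label, 0) + 1
        else:
            test.append(s)
    return train, adapt, test

# Synthetic stand-in data: 3 participants, 2 activities, 5 windows each.
samples = [(user, label, [0.0] * 8)
           for user in ("u1", "u2", "u3")
           for label in (0, 1)
           for _ in range(5)]
train, adapt, test = leave_one_user_out(samples, "u3", shots_per_class=2)
```

Keeping the adaptation and test sets disjoint matters: accuracy measured on the same windows used for the few-shot update would overstate the benefit of personalization.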
However, the proposed hybrid approach yielded consistent accuracy improvements post-deployment. RecGym improved by 3.73%, QVAR-Gesture by 17.38%, and Ultrasound-Gesture by 3.70%. The QVAR dataset, known for its highly user-specific signal characteristics, benefited the most. These findings confirm that adapting only a lightweight classifier lets the model recover much of the accuracy lost to a new user, and in some cases surpass its initial performance, although it was first trained on data from unrelated users.
Beyond accuracy, the system also demonstrated impressive efficiency. Inference for a single sample on the GAP9 takes approximately 0.34 milliseconds, while each on-device training update completes within 0.07–0.17 milliseconds. In terms of energy, each inference consumes about 35 microjoules, and a full parameter update requires roughly 4 microjoules per sample. This translates to over 250 times lower energy consumption for training updates compared to typical microcontroller-class devices, highlighting the GAP9’s parallel architecture and memory-optimized training kernel as key to this efficiency.
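To put these figures in perspective, the short calculation below estimates the cost of one personalization session. It assumes that each labeled sample costs one inference plus one update (the article does not say whether the update energy already includes the forward pass), uses the upper end of the reported update latency, and picks a session size of 100 samples purely as an example.

```python
# Figures reported in the article.
INFERENCE_MS = 0.34      # latency per inference
UPDATE_MS_MAX = 0.17     # upper end of the update latency range
INFERENCE_UJ = 35        # energy per inference, microjoules
UPDATE_UJ = 4            # energy per update, microjoules
ENERGY_RATIO = 250       # "over 250 times lower" than typical MCUs

def session_energy_uj(n_samples):
    """Energy for n labeled samples, assuming one inference plus one update each."""
    return n_samples * (INFERENCE_UJ + UPDATE_UJ)

def session_time_ms(n_samples):
    """Worst-case compute time for n labeled samples under the same assumption."""
    return n_samples * (INFERENCE_MS + UPDATE_MS_MAX)

# Hypothetical session of 100 labeled samples.
energy_uj = session_energy_uj(100)   # 3900 microjoules, i.e. 3.9 mJ
time_ms = session_time_ms(100)       # roughly 51 ms of compute

# Implied lower bound on the per-update energy of the comparison devices.
implied_baseline_uj = UPDATE_UJ * ENERGY_RATIO   # 1000 microjoules per update
```

Under these assumptions a 100-sample session costs a few millijoules and about 51 milliseconds of compute, and the 250-fold claim implies that the comparison devices spend at least about 1 millijoule per update.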
In conclusion, this research presents a hybrid framework for human activity recognition that combines strong cross-user generalization with fast, on-device personalization. By leveraging few-shot learning on the energy-efficient GAP9 platform, the system rapidly adapts to new users, delivering accuracy gains of up to 17.38% with sub-millisecond latency and microjoule-level energy consumption. This work paves the way for scalable, user-aware HAR systems that can operate continuously across diverse users and changing environments while staying within tight hardware budgets.


