TLDR: CatEquiv is a novel neural network designed for Human Activity Recognition (HAR) from inertial sensors. It systematically encodes temporal, amplitude, and structural symmetries (like time shifts, sensor gains, and sensor hierarchy) into its architecture. This ‘category-equivariant’ design allows CatEquiv to achieve significantly higher robustness and generalization on out-of-distribution data compared to standard CNNs, without increasing model complexity, demonstrating the power of built-in categorical inductive bias.
Human Activity Recognition (HAR) is a field focused on identifying human movements and actions using data, often from sensors embedded in smartphones. While HAR systems are becoming increasingly common, they face a significant challenge: variability in how data is collected. Imagine a smartphone user performing an activity like walking. The way they hold the phone (orientation), the exact moment they start recording (temporal shift), or even slight drifts in sensor calibration (amplitude scaling) can all introduce variations that make it difficult for standard recognition systems to perform consistently.
Traditional neural networks, like Convolutional Neural Networks (CNNs), often learn specific patterns tied to how data was presented during training. This means they perform well when the test data closely matches the training data, but their performance drops sharply when these factors change – a common problem known as ‘out-of-distribution’ (OOD) performance degradation.
A new research paper, titled “Learning with Category-Equivariant Architectures for Human Activity Recognition,” introduces an innovative solution called CatEquiv. This novel neural network architecture is designed to systematically encode various symmetries inherent in HAR data, leading to much greater robustness against these real-world variations.
The CatEquiv Approach: Embracing Symmetries
The core idea behind CatEquiv is ‘category-equivariant learning.’ Instead of trying to learn every possible variation through brute-force data augmentation, CatEquiv builds these symmetries directly into its architectural design. The researchers formalize these symmetries using a mathematical concept called a ‘categorical symmetry product’ (C3). This product combines three key types of variability:
- Cyclic Time Shifts: Accounting for when an activity window begins.
- Positive Gains: Handling changes in sensor sensitivity or amplitude.
- Sensor-Hierarchy Poset: Recognizing the inherent structure of sensors (e.g., individual axes feeding into a sensor, which then feeds into a total signal).
CatEquiv is engineered to be ‘equivariant’ to this categorical symmetry product. In simpler terms, if the input data undergoes one of these transformations (like a time shift or a gain change), the network’s internal representation transforms in a predictable and consistent way. This built-in understanding of symmetries allows the network to generalize better to unseen variations.
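As a rough formal reading of the paragraph above, equivariance and invariance can be written as follows. The symbols are illustrative assumptions rather than the paper's notation: f is a network layer, S_τ is a cyclic shift by τ samples on a window of length T, and h is a gain-invariant feature map.

```latex
% Equivariance: transforming the input and then applying f gives the same
% result as applying f first and then transforming the output.
f(S_\tau x) = S_\tau f(x), \qquad (S_\tau x)[t] = x[(t - \tau) \bmod T]

% Invariance is the special case where the output does not change at all,
% for example under a positive gain a:
h(a\,x) = h(x) \quad \text{for all } a > 0
```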
How CatEquiv Works
The architecture of CatEquiv incorporates several clever design choices to achieve this equivariance:
- Time-Shift Equivariance: It uses circular 1D convolutions, which are equivariant to cyclic time shifts, followed by global time pooling, which makes the pooled features invariant to them.
- Gain Invariance: Per-sensor RMS normalization and log-RMS side channels are used to make the network robust to changes in signal amplitude.
- Rotation Invariance: Axis-shared temporal filters followed by L2 pooling across axes help the network become invariant to device orientation changes (3D rotations).
- Poset Consistency: Sensor-shared filters and averaging ensure that the hierarchical relationships between sensors are maintained throughout the processing.
These architectural constraints ensure that the network’s linear core commutes with the various transformations, meaning it processes the data consistently regardless of these natural variations.
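The properties listed above can be checked numerically on toy data. The following is a minimal NumPy sketch, not the paper's implementation: the function names, the tensor layout (sensors, axes, time), the window length, and the random filter are all assumptions made for illustration.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """Circular (wrap-around) 1D cross-correlation along the last axis.

    The same kernel is applied to every sensor and axis (shared filter).
    """
    # y[..., t] = sum_j kernel[j] * x[..., (t + j) mod T]
    return sum(w * np.roll(x, -j, axis=-1) for j, w in enumerate(kernel))

def rms_normalize(x, eps=1e-8):
    """Divide each sensor by its RMS; also return log-RMS as a side channel."""
    # x has shape (sensors, axes, time); one RMS value per sensor.
    rms = np.sqrt(np.mean(x ** 2, axis=(1, 2), keepdims=True))
    return x / (rms + eps), np.log(rms + eps)

def l2_pool_axes(x):
    """L2 norm across the spatial axes: (sensors, 3, time) -> (sensors, time)."""
    return np.sqrt(np.sum(x ** 2, axis=1))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 128))  # hypothetical: 2 sensors, 3 axes, 128 steps
kernel = rng.standard_normal(5)       # hypothetical temporal filter

# 1) Cyclic time shift: the convolution output shifts with the input,
#    and a global mean over time removes the shift entirely.
shift = 17
conv_of_shifted = circular_conv1d(np.roll(x, shift, axis=-1), kernel)
shifted_conv = np.roll(circular_conv1d(x, kernel), shift, axis=-1)
shift_equivariant = np.allclose(conv_of_shifted, shifted_conv)
pool_invariant = np.allclose(conv_of_shifted.mean(axis=-1),
                             circular_conv1d(x, kernel).mean(axis=-1))

# 2) Per-sensor gain: the normalized signal is unchanged, while the
#    log-RMS side channel moves by exactly log(gain).
gains = np.array([0.5, 2.0]).reshape(2, 1, 1)
norm_x, log_rms_x = rms_normalize(x)
norm_gx, log_rms_gx = rms_normalize(gains * x)
gain_invariant = np.allclose(norm_x, norm_gx)
log_rms_shifted = np.allclose(log_rms_gx - log_rms_x, np.log(gains))

# 3) 3D rotation: a filter shared across axes commutes with the rotation,
#    and the L2 norm across axes then discards it.
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] = -q[:, 0]                # make it a proper rotation (det = +1)
x_rot = np.einsum("ij,sjt->sit", q, x)
rotation_invariant = np.allclose(l2_pool_axes(circular_conv1d(x_rot, kernel)),
                                 l2_pool_axes(circular_conv1d(x, kernel)))

print(shift_equivariant, pool_invariant, gain_invariant,
      log_rms_shifted, rotation_invariant)
```

Sharing one filter across axes matters in the third check: if each axis had its own filter, the convolution would no longer commute with the rotation and the norm across axes would change.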
Impressive Results on UCI-HAR
The researchers tested CatEquiv on the widely used UCI-HAR dataset, applying composite out-of-distribution (OOD) perturbations that included cyclic time shifts, random 3D rotations, and per-sensor gain changes. CatEquiv was compared against two baselines: PlainCNN (a standard CNN with zero padding) and CircCNN (a CNN with circular padding, offering time-shift equivariance).
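A composite perturbation of this kind could be sketched as below. This is an illustration under stated assumptions, not the paper's evaluation code: the function names, the (sensors, 3, time) layout, the gain range, and the choice to apply one rotation to all sensors are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column to turn a reflection into a rotation
    return q

def perturb_window(x, rng, gain_range=(0.5, 2.0)):
    """Apply a cyclic shift, one shared 3D rotation, and per-sensor gains.

    x has shape (sensors, 3, time). gain_range is an assumed range,
    not a value taken from the paper.
    """
    n_sensors, _, n_time = x.shape
    shift = int(rng.integers(n_time))                      # cyclic time shift
    rot = random_rotation(rng)                             # device orientation
    gains = rng.uniform(*gain_range, size=(n_sensors, 1, 1))
    out = np.roll(x, shift, axis=-1)
    out = np.einsum("ij,sjt->sit", rot, out)
    out = gains * out
    return out, shift, rot, gains

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3, 128))  # toy window: 3 sensors, 3 axes, 128 steps
x_ood, shift, rot, gains = perturb_window(x, rng)
print(x_ood.shape, shift)
```

Returning the sampled shift, rotation, and gains alongside the perturbed window makes it straightforward to verify that each transformation was applied as intended.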
The results were striking. Under these challenging OOD conditions, CatEquiv achieved substantially higher accuracy and macro-F1 scores. For instance, it reached an F1 score of 0.73, compared to 0.42 for CircCNN and a mere 0.12 for PlainCNN. This demonstrates that enforcing categorical symmetries leads to strong invariance and generalization without needing to increase the model’s complexity or capacity.
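For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so every activity class counts equally regardless of how many samples it has. A minimal sketch on made-up labels (the function name and the toy data are illustrative only, not results from the paper):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy labels for three classes, made up for illustration.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
score = macro_f1(y_true, y_pred, n_classes=3)
print(round(score, 4))
```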
Ablation studies further confirmed the importance of each component, showing that time-shift equivariance, rotational handling, and sensor poset consistency contributed the largest gains in robustness.
Broader Impact
The implications of CatEquiv extend beyond just Human Activity Recognition. The framework is general and can be applied to many other domains where data exhibits similar ‘categorical symmetry structures’ – combinations of group actions (like time, scale, rigid motion) and hierarchical or relational structures (like sensor stacks or feature hierarchies).
This could include multichannel biomedical and geophysical time series, multi-sensor robotics stacks, molecular and 3D vision tasks, and multimodal data fusion. By identifying the task’s specific symmetry category and designing the network’s linear core as a ‘natural transformation,’ researchers can build robust models that generalize well under real-world shifts without increasing model size.
The CatEquiv paper can be accessed here: Research Paper.


