TLDR: SAM-BG is a novel two-stage self-supervised learning framework that significantly improves psychiatric diagnosis from fMRI brain graphs, particularly when labeled data is scarce. It achieves this by training an ‘edge masker’ on a small labeled dataset to identify crucial structural brain connectivity patterns. These patterns then guide a ‘structure-aware’ data augmentation process on unlabeled data, preserving important biological semantics and leading to more robust, accurate, and interpretable diagnostic models for disorders like Autism Spectrum Disorder (ASD) and Attention-Deficit/Hyperactivity Disorder (ADHD).
Diagnosing psychiatric disorders accurately and efficiently is a significant challenge, often hampered by the limited availability of labeled brain network data. Traditional methods for analyzing functional magnetic resonance imaging (fMRI) data, which captures brain activity, frequently overlook the brain’s intricate topological structure. While advanced techniques like Graph Neural Networks (GNNs) show promise in identifying complex connectivity patterns, their effectiveness typically relies on vast amounts of labeled data, which is scarce in psychiatric research due to privacy concerns and high annotation costs.
Self-supervised learning (SSL) has emerged as a potential solution, allowing models to learn from unlabeled data. However, existing SSL methods often employ generic data augmentation strategies that can inadvertently distort crucial structural semantics—biologically meaningful connectivity patterns—within brain graphs. This distortion can lead to less realistic and less interpretable representations, undermining the utility of these methods in clinical applications.
Introducing SAM-BG: A Structure-Aware Approach
To overcome these limitations, researchers have developed SAM-BG (Structure Matters: Brain Graph Augmentation via Learnable Edge Masking), a novel two-stage framework designed to learn robust and biologically meaningful brain graph representations, particularly in settings with limited labeled data. SAM-BG focuses on preserving essential structural semantics during the learning process.
How SAM-BG Works
The SAM-BG framework operates in two distinct phases:
The first phase, Structural Semantic Extraction, involves training an ‘edge masker’ using a small subset of labeled data. This masker’s role is to identify and capture the most discriminative substructures—key functional connections—within the brain network. It does this by retaining label-relevant information while discarding redundant topological noise, guided by a principle known as the Information Bottleneck.
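For readers unfamiliar with the Information Bottleneck, its generic form for subgraph selection can be written as below. This is the textbook formulation, not necessarily the exact objective used in SAM-BG. The symbols are assumptions introduced here: G is the input brain graph, G_s the subgraph kept by the edge masker, Y the diagnostic label, I(·;·) mutual information, and β a trade-off weight.

```latex
\max_{G_s \subseteq G} \; I(G_s; Y) \;-\; \beta \, I(G_s; G)
```

The first term rewards substructures that are predictive of the label; the second penalizes retaining more of the original graph than is needed, which is what discards redundant topology.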
In the second phase, SSL with Structure-semantic Preservation, the pre-trained edge masker is applied to large-scale unlabeled brain graphs. Instead of randomly perturbing the entire graph, SAM-BG introduces controlled changes only to the non-essential parts of the graph, while meticulously preserving the previously identified salient substructures. This ‘structure-aware’ augmentation generates diverse yet semantically consistent views of the brain graphs, enabling the model to learn more meaningful and robust representations.
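The idea of perturbing only the non-essential edges can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the pre-trained edge masker has already produced a per-edge importance matrix (`edge_scores`), it uses random edge dropping as the perturbation, and the names `structure_aware_augment`, `keep_ratio`, and `drop_prob` are hypothetical.

```python
import numpy as np

def structure_aware_augment(adj, edge_scores, keep_ratio=0.3, drop_prob=0.5, rng=None):
    """Return (view, salient) for a symmetric adjacency matrix.

    Existing edges whose score ranks in the top `keep_ratio` are left untouched.
    Every other existing edge is dropped independently with probability `drop_prob`.
    `salient` holds the protected edges as positions in the upper-triangle edge list.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)                 # upper triangle, no self-loops
    vals = adj[iu].copy()                        # edge weights as a flat vector
    edge_idx = np.flatnonzero(vals != 0)         # positions of existing edges
    n_keep = int(np.ceil(keep_ratio * edge_idx.size))

    # Rank existing edges by importance score, highest first.
    order = edge_idx[np.argsort(-edge_scores[iu][edge_idx])]
    salient, rest = order[:n_keep], order[n_keep:]

    # Perturb only the non-salient edges.
    dropped = rest[rng.random(rest.size) < drop_prob]
    vals[dropped] = 0

    # Rebuild a symmetric matrix, keeping the original diagonal.
    view = np.zeros_like(adj)
    view[iu] = vals
    view = view + view.T + np.diag(np.diag(adj))
    return view, salient
```

Calling the function twice with different random states on the same graph yields two views that differ in their peripheral edges but share the same protected substructure, which is the property the contrastive stage relies on.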
The learning objective in this phase uses Canonical Correlation Analysis (CCA) to align the representations generated from these augmented views. This alignment ensures that the learned representations are invariant to minor perturbations while preventing the model from collapsing into trivial solutions.
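To make the two roles of the objective concrete (invariance across views and avoiding collapse), here is a sketch of a CCA-style loss of the kind used in graph self-supervised learning. It is an assumption that SAM-BG's loss has this shape; the function name `cca_style_loss` and the weight `lam` are hypothetical, and the inputs `z1` and `z2` stand for embeddings of the two augmented views (one row per graph).

```python
import numpy as np

def cca_style_loss(z1, z2, lam=1e-3):
    """Invariance term plus a decorrelation penalty on each view's embeddings."""
    n, d = z1.shape

    def standardize(z):
        # Zero mean and unit variance per feature, scaled so z.T @ z is a correlation matrix.
        z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)
        return z / np.sqrt(n)

    z1, z2 = standardize(z1), standardize(z2)

    # Invariance: embeddings of the two views should agree.
    invariance = np.sum((z1 - z2) ** 2)

    # Decorrelation: feature correlations should be close to the identity,
    # which penalizes collapsed (redundant) embedding dimensions.
    eye = np.eye(d)
    decorrelation = np.sum((z1.T @ z1 - eye) ** 2) + np.sum((z2.T @ z2 - eye) ** 2)

    return invariance + lam * decorrelation
```

The invariance term alone could be minimized by mapping every graph to the same vector; the decorrelation term is what rules out that trivial solution.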
Experimental Validation and Interpretability
SAM-BG was evaluated on two real-world psychiatric datasets: ABIDE (for Autism Spectrum Disorder, ASD) and ADHD-200 (for Attention-Deficit/Hyperactivity Disorder, ADHD). According to the reported results, it consistently outperformed state-of-the-art supervised methods and other self-supervised methods, especially when labeled data was limited. For instance, with only 20% of the data labeled, SAM-BG achieved significant improvements in accuracy, AUC, and F1-score over the best baselines.
An ablation study further confirmed the critical role of the extracted substructures, showing that removing them severely degraded performance, while using only these substructures yielded results close to the full model. This highlights that the identified substructures indeed encode the most discriminative and label-relevant semantic features of the brain network.
Beyond its superior performance, SAM-BG also offers enhanced interpretability: the edge masker can uncover clinically relevant connectivity patterns associated with psychiatric disorders. For example, in ASD patients, the model identified increased internal connectivity within the visual network and enhanced connections between the limbic network and other networks, suggesting heightened sensory sensitivity and emotional over-processing. In ADHD patients, it highlighted hyperconnectivity in the ventral attention network and elevated internal connectivity within the cerebellum, consistent with the roles these regions play in attention control.
These findings underscore SAM-BG’s potential to provide valuable diagnostic insights and advance our understanding of the neural underpinnings of psychiatric disorders.


