SToFM: A New Model for Understanding Spatial Transcriptomics Data Across Multiple Biological Scales

TLDR: SToFM is a foundation model for Spatial Transcriptomics (ST) data analysis. It integrates multi-scale biological information (macro-scale tissue morphology, micro-scale cellular interactions, and gene-scale expression profiles) by constructing multi-scale sub-slices and processing them with an SE(2) Transformer. Pretrained on SToCorpus-88M, described as the largest high-resolution ST pretraining corpus to date, SToFM outperforms existing methods on downstream tasks including tissue segmentation, cell type annotation, clustering, deconvolution, and imputation, which points to a comprehensive and transferable understanding of ST data.

Spatial Transcriptomics (ST) technologies are revolutionizing how biologists understand single-cell biology by allowing them to study gene expression while keeping the cells in their original spatial context within tissues. This provides a much richer picture than traditional methods that dissociate cells, losing crucial information about their environment and interactions.

However, analyzing ST data presents a significant challenge: it requires extracting information at multiple scales simultaneously. Imagine trying to understand a city by only looking at individual houses, or only at the entire city map, but not both. ST data similarly contains macro-scale tissue morphology (the overall shape and structure of an organ), micro-scale cellular microenvironments (how cells interact with their immediate neighbors), and gene-scale gene expression profiles (the specific genes active within each cell). Integrating these diverse levels of information is complex.

Introducing SToFM: A Multi-scale Foundation Model

To address this challenge, researchers have developed SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM is designed to comprehensively understand ST data by capturing and integrating information from all three crucial scales: macro, micro, and gene.

The model works by first performing a multi-scale information extraction process on each ST tissue slice. This involves creating ‘ST sub-slices’ that cleverly combine information from all three scales. For the gene scale, SToFM uses a pre-trained cell encoder, which is further adapted to ST data to ensure high-quality gene expression representations. For the micro scale, the ST slice is divided into smaller sub-slices, focusing on localized cell-cell interactions. To maintain macro-scale information, SToFM identifies ‘virtual cells’ by clustering all cells in the slice. These virtual cells act as a compressed representation of the tissue’s overall structure and are then incorporated into each sub-slice, allowing the model to perceive the larger organizational patterns while still focusing on local details.
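The sub-slice construction described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not SToFM's actual implementation: the article does not say which clustering algorithm or partitioning scheme is used, so the small k-means routine, the square-tile split, and every name here (`kmeans`, `build_sub_slices`, `n_virtual`, `tile`) are hypothetical stand-ins. `emb` stands for the per-cell output of the cell encoder.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Tiny k-means, used only as a stand-in for 'clustering all cells'."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

def build_sub_slices(coords, emb, n_virtual=4, tile=50.0):
    """Split one slice into tiles and append the same virtual cells to each tile.

    coords: (n_cells, 2) spatial positions; emb: (n_cells, d) cell embeddings.
    Assumption: clustering is done on coordinates only, and a virtual cell is
    the mean position and mean embedding of its cluster.
    """
    labels, _ = kmeans(coords, n_virtual)
    used = np.unique(labels)  # skip clusters that ended up empty
    v_coords = np.stack([coords[labels == j].mean(axis=0) for j in used])
    v_emb = np.stack([emb[labels == j].mean(axis=0) for j in used])

    # Assumption: micro-scale sub-slices are square tiles of side `tile`.
    tile_ids = np.floor(coords / tile).astype(int)
    sub_slices = []
    for tid in np.unique(tile_ids, axis=0):
        mask = np.all(tile_ids == tid, axis=1)
        sub_slices.append({
            "coords": np.concatenate([coords[mask], v_coords]),
            "emb": np.concatenate([emb[mask], v_emb]),
            "is_virtual": np.concatenate([
                np.zeros(mask.sum(), dtype=bool),
                np.ones(len(v_coords), dtype=bool),
            ]),
        })
    return sub_slices
```

Each returned sub-slice holds its local cells plus the slice-wide virtual cells, so a model reading one sub-slice sees local neighbors and a compressed summary of the whole tissue at the same time.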

Once these multi-scale sub-slices are constructed, SToFM employs an SE(2) Transformer. This specialized neural network is designed to process both the gene expression information and the spatial coordinates of the cells, producing high-quality cell representations that are robust to common spatial transformations like rotations and translations. The model is trained using two main objectives: Masked Cell Modeling (MCM), where it predicts masked gene expression embeddings, and Pairwise Distance Recovery (PDR), where it reconstructs original spatial distances after some coordinates are perturbed. These tasks help SToFM learn both the genetic and spatial characteristics of the data.
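To make the invariance and the two pretraining objectives more concrete, here is a small sketch. It rests on assumptions the article does not confirm: that pairwise distances are what carries the spatial signal, that the same randomly chosen cells are both masked and perturbed, that masked embeddings are replaced by zeros, and that Gaussian noise is used. The function names, `mask_ratio`, and `noise_scale` are hypothetical.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of cells, shape (n, n)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def se2_transform(coords, angle, shift):
    """Apply a 2D rotation by `angle` (radians) followed by a translation."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return coords @ rot.T + shift

def make_pretraining_example(coords, emb, mask_ratio=0.15, noise_scale=1.0, seed=0):
    """Build inputs and targets for MCM-style and PDR-style objectives (illustrative)."""
    rng = np.random.default_rng(seed)
    n = len(coords)
    n_sel = max(1, int(round(mask_ratio * n)))
    selected = rng.choice(n, size=n_sel, replace=False)

    masked_emb = emb.copy()
    masked_emb[selected] = 0.0  # assumption: zeros act as the mask value

    noisy_coords = coords.copy()
    noisy_coords[selected] += rng.normal(scale=noise_scale, size=(n_sel, 2))

    return {
        "selected": selected,
        "masked_emb": masked_emb,                  # model input
        "noisy_coords": noisy_coords,              # model input
        "target_emb": emb[selected],               # MCM target: original embeddings
        "target_dist": pairwise_distances(coords), # PDR target: original distances
    }
```

Because a rotation plus a translation leaves every pairwise distance unchanged, `pairwise_distances(se2_transform(coords, angle, shift))` matches `pairwise_distances(coords)` up to floating-point error. A model that only consumes such distances is therefore unaffected by how a slice happens to be oriented on the slide.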

A Massive Training Corpus: SToCorpus-88M

A key component of SToFM’s development is SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus ever constructed for pretraining. The dataset comprises approximately 2,000 high-resolution ST slices, totaling about 88 million cells. It includes data from six different ST technologies and covers both human and mouse samples, significantly surpassing previous datasets in both scale and diversity. This extensive corpus is crucial for training a robust foundation model like SToFM, enabling it to learn generalizable patterns across various biological contexts.

Demonstrated Performance Across Diverse Tasks

SToFM has shown exceptional performance across a variety of important downstream biological tasks, highlighting its comprehensive understanding of ST data:

Tissue Region Semantic Segmentation: SToFM significantly outperforms existing methods in identifying structural and functional regions within tissues, such as human embryonic structures and layers of the dorsolateral prefrontal cortex (DLPFC). Notably, its performance is particularly strong in cross-slice settings, demonstrating excellent robustness and transferability.
Cell Type Annotation: The model achieves superior accuracy in identifying different cell types within spatial transcriptomics data, even with lower-quality gene expression profiles often found in ST data. This suggests that incorporating spatial information helps in inferring cell types.
Zero-shot Clustering and Visualization: SToFM produces high-quality cell embeddings that allow for clear and distinct clustering of cell types, even without prior training on specific labels. Visualizations show that cells of the same type form tight clusters, and the representations also reflect biological relationships between different cell types.
Spatial Deconvolution: The model effectively predicts the proportion of various cell types within a given spot in ST data, showcasing its ability to transfer deconvolution results between labeled and unlabeled slices.
Spatial Transcriptomics Imputation: SToFM demonstrates strong capabilities in inferring uncaptured gene expression levels, a critical task for analyzing ST data where gene coverage can be limited.
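For the spatial deconvolution task listed above, it may help to see what a "proportion of cell types within a spot" looks like as data. The sketch below computes such proportions from cells with known types and spot assignments. It only illustrates the shape of the prediction target and is not the evaluation protocol used for SToFM. The function name and its arguments are hypothetical.

```python
import numpy as np

def spot_proportions(spot_ids, cell_types, n_spots, n_types):
    """Fraction of each cell type within each spot.

    spot_ids:   (n_cells,) integer spot index of every cell
    cell_types: (n_cells,) integer cell type label of every cell
    Returns an (n_spots, n_types) array; rows of empty spots stay all zero.
    """
    counts = np.zeros((n_spots, n_types))
    np.add.at(counts, (spot_ids, cell_types), 1.0)  # count cells per (spot, type)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

Every non-empty row sums to 1, so a deconvolution model can be scored by comparing its predicted row for a spot against the corresponding row of this matrix.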

An ablation study further confirmed that each multi-scale component (gene, micro, and macro) contributes to SToFM’s improved performance and transferability, validating the design choices.

Future Directions

While SToFM represents a significant leap forward, the researchers acknowledge its current limitations. The model currently focuses on three main scales, and future work could explore integrating even more scales, perhaps using techniques like image pyramids. Additionally, incorporating causal machine learning methods to model gene regulatory relationships or integrating other biological knowledge and modalities like pathological images could further enhance the model’s capabilities.

SToFM is a powerful new tool that promises to accelerate discoveries in single-cell biology and tissue research by providing a more complete and integrated understanding of spatial transcriptomics data. You can find more details about this research in the original research paper.
