Geometry-Guided AI Enhances Multi-View Mammography Analysis

TL;DR: The research introduces GLAM, a visual language model for mammography that uses geometry-guided local alignment to better understand multi-view breast images. Unlike previous models, which often ignore the crucial relationship between the different mammogram views, GLAM learns fine-grained cross-view correspondences by aligning patches in one view with slices in the other, mimicking how radiologists interpret images. Pre-trained on EMBED, a large open mammography dataset, GLAM significantly outperforms existing methods in breast cancer detection, density prediction, and BI-RADS classification across several datasets, showing that leveraging the inherent geometry of mammography improves both accuracy and generalization.

Mammography screening is a vital tool for the early detection of breast cancer. Deep learning methods hold significant promise for improving the speed and accuracy of mammography interpretation. However, developing powerful visual language models (VLMs) for this domain faces challenges due to limited medical data and inherent differences between natural and medical images.

Existing mammography VLMs, often adapted from models designed for natural images, frequently overlook crucial domain-specific characteristics. A prime example is the multi-view nature of mammography. Standard protocols produce two 2D images of the same 3D breast from different angles: craniocaudal (CC) and mediolateral oblique (MLO). Radiologists meticulously analyze both views together to understand ipsilateral correspondence, which is essential for accurately locating regions of interest like tumors and mitigating ambiguities caused by projection angles. Current deep learning methods often treat these views as independent images or fail to properly model their multi-view correspondence, leading to a loss of critical geometric context and suboptimal predictions.

Introducing GLAM: Geometry-Guided Local Alignment for Multi-View Mammography

Researchers from Yale University have proposed GLAM (Global and Local Alignment for Multi-view mammography), a visual language pre-training approach that uses geometry guidance to address these shortcomings. By incorporating prior knowledge of how mammograms are acquired from multiple views, GLAM learns local cross-view alignments and fine-grained local features through joint contrastive learning at both the global and local levels, combining visual-visual (view-to-view) and visual-language (image-to-report) objectives.

The core idea behind GLAM is to mimic how radiologists interpret mammograms by considering the geometric relationship between the CC and MLO views. The model is pre-trained on EMBED, one of the largest open mammography datasets, and has demonstrated superior performance compared to existing baselines across multiple datasets and settings.

How GLAM Works

GLAM's training pipeline combines the following components:

Pre-processing: Before images are fed to the model, mammograms are pre-processed. The pectoral muscle region is removed from MLO views, and images are rotated so that the CC and MLO views are better aligned along the anterior-posterior (AP) axis. Random affine transformations are also applied to make the model more robust to minor residual misalignments. In addition, radiology reports are synthesized from tabular data and augmented to create diverse textual supervision signals.
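
The article does not say which tabular fields or sentence templates GLAM uses to synthesize reports. Purely as an illustration, the Python sketch below turns one row of metadata into a short report and augments it by picking a random phrasing for each field and shuffling the sentence order. The field names (`laterality`, `density`, `birads`), the `TEMPLATES` dictionary, and `synthesize_report` are all hypothetical.

```python
import random

# Hypothetical sentence templates; GLAM's actual templates are not described in the article.
TEMPLATES = {
    "density": [
        "The breast tissue density is {value}.",
        "Breast density: {value}.",
    ],
    "birads": [
        "The BI-RADS assessment is {value}.",
        "Assessment category: BI-RADS {value}.",
    ],
    "laterality": [
        "This is the {value} breast.",
        "Laterality: {value}.",
    ],
}


def synthesize_report(record: dict, rng: random.Random) -> str:
    """Turn one row of tabular metadata into a short free-text report.

    Augmentation: each field draws a random template and the sentence order is
    shuffled, so the same record yields differently worded reports on each call.
    """
    sentences = []
    for field, value in record.items():
        if field not in TEMPLATES or value is None:
            continue  # skip fields without a template and missing values
        template = rng.choice(TEMPLATES[field])
        sentences.append(template.format(value=value))
    rng.shuffle(sentences)
    return " ".join(sentences)


# Usage: the same (hypothetical) row produces varied reports across calls.
row = {"laterality": "left", "density": "heterogeneously dense", "birads": 2}
rng = random.Random(0)
for _ in range(2):
    print(synthesize_report(row, rng))
```

Because the template and order are resampled on every call, each training epoch can pair the same image with a differently worded report, which is one simple way to obtain diverse textual supervision from structured data.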

Global Multi-view Visual Language Pre-training: At the global level, GLAM extracts a visual feature from each of the CC and MLO views and a textual feature from the radiology report. It optimizes a multi-view contrastive loss that pulls together the features of the two views of the same breast. In addition, it aligns the image features from each view with the corresponding report features, allowing the model to learn high-level semantic information from the text.
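
A minimal sketch of such a global objective, assuming a CLIP-style symmetric InfoNCE loss over cosine similarities and an unweighted sum of the CC-MLO, CC-text, and MLO-text terms. The temperature of 0.07, the equal weighting, and all function names are assumptions for illustration rather than details from the paper; a real implementation would use a tensor library and learned encoders.

```python
import math


def cosine_matrix(a, b):
    """Pairwise cosine similarity between two lists of equal-length vectors."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        return [x / n for x in v]
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def info_nce(logits):
    """Mean cross-entropy where row i's positive is column i (same breast)."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_z - row[i]
    return loss / len(logits)


def symmetric_contrastive(a, b, temperature=0.07):
    """CLIP-style loss: average of the a->b and b->a InfoNCE terms."""
    sims = cosine_matrix(a, b)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for b->a
    return 0.5 * (info_nce(logits) + info_nce(logits_t))


def global_loss(cc, mlo, text, temperature=0.07):
    """Hypothetical combination: CC-MLO plus each view against the report text."""
    return (symmetric_contrastive(cc, mlo, temperature)
            + symmetric_contrastive(cc, text, temperature)
            + symmetric_contrastive(mlo, text, temperature))


# Usage: a batch of two breasts with toy 2-dim embeddings.
cc = [[1.0, 0.0], [0.0, 1.0]]
mlo_matched = [[1.0, 0.1], [0.1, 1.0]]  # each MLO embedding resembles its own CC view
mlo_swapped = [[0.1, 1.0], [1.0, 0.1]]  # each MLO embedding resembles the other breast
text = [[0.9, 0.2], [0.2, 0.9]]
print(symmetric_contrastive(cc, mlo_matched), symmetric_contrastive(cc, mlo_swapped))
print(global_loss(cc, mlo_matched, text))
```

Within a batch, every other breast acts as a negative, so the loss is low only when the two views (and the report) of the same breast are more similar to each other than to any other breast's features.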

Geometry-Guided Local Alignment: This is GLAM's main innovation. Beyond global alignment, the model performs local alignment on patch features. It first aggregates raw patch features into “super-patches” with larger receptive fields that capture higher-level semantic information. The key step is “patch-to-slice” alignment along the AP axis. Given the imaging geometry of mammography, image slices from both views at the same AP position depict approximately the same 3D breast tissue. A patch in one view is therefore aligned with the entire slice at the matching AP position in the other view using a multi-head cross-attention mechanism. This teaches the model fine-grained positional relationships and semantic correspondences across views that respect the underlying 3D breast structure.
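
The two local steps can be sketched in plain Python under simplifying assumptions: patch features are stored as a `grid[row][col]` of vectors, super-patches are formed by k×k average pooling, the AP axis runs along the grid columns after pre-processing (so an AP "slice" is one column), and the cross-attention uses a single head with no learned query/key/value projections. GLAM itself uses learned multi-head cross-attention, and its exact aggregation and slicing scheme may differ; `super_patches` and `patch_to_slice_attention` are hypothetical names.

```python
import math


def super_patches(grid, k):
    """Average-pool a grid[row][col] of D-dim patch features over k x k blocks.

    Assumes the grid height and width are both divisible by k.
    """
    rows, cols, dim = len(grid), len(grid[0]), len(grid[0][0])
    out = []
    for r0 in range(0, rows, k):
        out_row = []
        for c0 in range(0, cols, k):
            acc = [0.0] * dim
            for r in range(r0, r0 + k):
                for c in range(c0, c0 + k):
                    for d in range(dim):
                        acc[d] += grid[r][c][d]
            out_row.append([x / (k * k) for x in acc])
        out.append(out_row)
    return out


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def patch_to_slice_attention(query, other_grid, ap_index):
    """Single-head scaled dot-product attention from one patch to one AP slice.

    The slice is every super-patch in column `ap_index` of the other view's grid
    (assumes the AP axis runs along the columns after pre-processing).
    Returns the attended feature and the attention weights over the slice.
    """
    slice_feats = [row[ap_index] for row in other_grid]
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * s for q, s in zip(query, feat)) for feat in slice_feats]
    weights = softmax(scores)
    attended = [sum(w * feat[d] for w, feat in zip(weights, slice_feats))
                for d in range(len(query))]
    return attended, weights


# Usage: toy 4x4 grids of 2-dim patch features for the CC and MLO views.
cc_grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
mlo_grid = [[[float(c), float(r)] for c in range(4)] for r in range(4)]
cc_super = super_patches(cc_grid, k=2)    # 2x2 grid of super-patches
mlo_super = super_patches(mlo_grid, k=2)
# Align the CC super-patch at row 1, AP column 0 with the MLO slice at AP column 0.
feature, weights = patch_to_slice_attention(cc_super[1][0], mlo_super, ap_index=0)
print(feature, weights)
```

Attending over the whole slice, rather than to a single "matching" patch, reflects what the geometry actually constrains: a finding's AP depth is shared by both views, but its position along the other image axis is not known in advance, so the attention weights let the model locate it within the slice.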

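To strengthen local positional awareness further, GLAM draws negative samples not only from different positions of the same patient but also from the same position in other patients in the batch. If all negatives came from other positions, the model could separate positives from negatives using positional encoding alone; same-position negatives from other patients remove that shortcut and force the model to rely on the content of the patch features.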
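
The effect of the two kinds of negatives can be illustrated with a small local InfoNCE sketch. It assumes a batch of B patients with P AP positions each, a plain dot-product similarity with a temperature of 0.1, and that `queries[b][i]` and `keys[b][i]` are matching local features from the two views (for example, a patch feature and its cross-attended slice feature). The function name and these choices are hypothetical, not GLAM's exact loss.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def local_contrastive_loss(queries, keys, temperature=0.1):
    """Local InfoNCE over B patients x P AP positions (hypothetical sketch).

    For the query at (patient b, position i) the positive is keys[b][i]; negatives are
      * keys[b][j], j != i : same patient, different position
      * keys[c][i], c != b : different patient, same position
    The second group shares the query's position, so position alone cannot
    separate the positive from these negatives.
    """
    B, P = len(queries), len(queries[0])
    total, count = 0.0, 0
    for b in range(B):
        for i in range(P):
            q = queries[b][i]
            candidates = [keys[b][i]]  # index 0 is the positive
            candidates += [keys[b][j] for j in range(P) if j != i]
            candidates += [keys[c][i] for c in range(B) if c != b]
            logits = [dot(q, k) / temperature for k in candidates]
            m = max(logits)  # stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in logits))
            total += log_z - logits[0]
            count += 1
    return total / count


# Usage: toy batch of 2 patients x 2 AP positions with 4-dim one-hot features.
keys = [[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
        [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]]
matched = local_contrastive_loss(keys, keys)                # queries match their own keys
swapped = local_contrastive_loss([keys[1], keys[0]], keys)  # queries from the wrong patient
print(matched, swapped)

# Position-only features: identical for every patient.
pos = [[1.0, 0.0], [0.0, 1.0]]
print(local_contrastive_loss([pos, pos], [pos, pos]), math.log(2))
```

If the features encoded only position (identical across patients), each query would have B−1 same-position negatives scoring exactly like its positive, so the loss could never fall below log B. Only differences in patch content can break that tie, which is the behavior the same-position negatives are meant to encourage.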

Performance and Impact

GLAM was evaluated on three diverse datasets: EMBED (in-domain) and VinDr and RSNA-Mammo (both out-of-domain). It consistently outperformed all baselines on BI-RADS prediction, density prediction, and cancer prediction across zero-shot, linear-probing, and full fine-tuning settings. Notably, GLAM showed significant improvements on multi-view prediction tasks, demonstrating its ability to model multi-view geometry and extract complementary features from each view.

The research highlights that ignoring either view in mammography can lead to diagnostic errors, especially for deep learning models that lack prior knowledge of the imaging process. GLAM's geometry-guided local alignment module supplies this fine-grained cross-view awareness, and the authors describe the resulting model as one of the largest and most robust CLIP-style foundation models for screening mammography to date. For more technical details, refer to the full research paper.

This work represents a significant step forward in developing more accurate and reliable AI tools for mammography interpretation, potentially leading to earlier and more precise breast cancer detection.
