TL;DR: TextSAM-EUS is a lightweight adaptation of the Segment Anything Model (SAM) for segmenting pancreatic tumors in endoscopic ultrasound (EUS) images. Instead of manual point or box prompts, it uses learned text prompts and parameter-efficient fine-tuning (LoRA) to automate segmentation. On a public EUS dataset it outperforms existing supervised models and other SAM adaptations, which makes it a practical option for noisy, low-contrast ultrasound images.
Pancreatic cancer is a highly aggressive disease with a low survival rate, making early and accurate diagnosis crucial. Endoscopic ultrasound (EUS) is a vital tool for diagnosing and managing pancreatic cancer, allowing for targeted biopsies and therapies. However, EUS images are often hard to interpret because of speckle noise, low contrast, and the subtle appearance of tumors. These properties make it difficult for conventional deep learning models to outline tumors accurately, and such models also typically require large, expert-annotated datasets.
The Segment Anything Model (SAM), a powerful AI foundation model, has shown great promise in image segmentation. However, its original design relies on manual ‘geometric prompts’ like points or bounding boxes, which can be time-consuming and require specialized medical expertise. Furthermore, SAM was initially trained on natural images, leading to a ‘domain shift’ when applied to medical images, especially noisy ultrasound scans.
To overcome these challenges, researchers have introduced TextSAM-EUS, a lightweight adaptation of SAM designed specifically for segmenting pancreatic tumors in EUS images. The approach removes the need for manual geometric prompts during inference, so segmentation can run without a clinician placing points or boxes on each image.
How TextSAM-EUS Works
TextSAM-EUS leverages ‘text prompt learning,’ also known as context optimization. It uses a specialized component called the BiomedCLIP text encoder to understand natural language descriptions, such as “pancreatic tumor.” This text-based guidance is then integrated with SAM’s architecture. To make the adaptation highly efficient, TextSAM-EUS employs a technique called Low-Rank Adaptation (LoRA), which allows the model to be fine-tuned by adjusting only a tiny fraction (0.86%) of its total parameters.
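To make the idea of low-rank adaptation concrete, here is a minimal NumPy sketch of a single linear layer with a LoRA update. It is an illustration under stated assumptions, not the TextSAM-EUS implementation: the class name `LoRALinear`, the layer size, the rank, and the scaling factor are all hypothetical, and the trainable fraction it reports applies to this one toy layer rather than to the 0.86% figure for the whole model.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight; random values stand in for real weights.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors. B starts at zero, so the adapted layer
        # initially behaves exactly like the frozen layer.
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.W.T                 # frozen path
        update = (x @ self.A.T) @ self.B.T  # low-rank path, rank << d_in
        return base + self.scale * update

    def trainable_fraction(self):
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.W.size)


layer = LoRALinear(d_in=768, d_out=768, rank=4)
x = np.ones((2, 768))
y = layer.forward(x)
print(y.shape, f"trainable fraction: {layer.trainable_fraction():.2%}")
```

Only `A` and `B` would receive gradient updates during fine-tuning. Because their combined size grows with the rank rather than with the product of the layer dimensions, the trainable share stays small even for large layers.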
The framework also includes an iterative segmentation refinement step. After an initial prediction based on text prompts, the model automatically extracts geometric cues (like the bounding box and center point of the predicted tumor) and uses them to further refine the segmentation, enhancing accuracy with minimal computational cost.
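The geometric cues used in this refinement step can be read directly off a binary mask. The sketch below shows one plausible way to do that with NumPy; the function name `mask_to_prompts` and the `(x_min, y_min, x_max, y_max)` box convention are assumptions made for illustration, and the paper's exact procedure may differ.

```python
import numpy as np


def mask_to_prompts(mask):
    """Derive a bounding box and a center point from a binary mask.

    Returns ((x_min, y_min, x_max, y_max), (x_center, y_center)),
    or None when the mask contains no foreground pixels.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing was predicted, so there is nothing to refine
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    center = (float(xs.mean()), float(ys.mean()))
    return box, center


# Toy "initial prediction": a 3x4 block of foreground pixels.
mask = np.zeros((8, 10), dtype=bool)
mask[2:5, 3:7] = True

box, center = mask_to_prompts(mask)
print(box, center)
```

In a refinement loop, the returned box and point would be passed back to the prompt encoder as geometric prompts for a second segmentation pass.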
Impressive Performance
TextSAM-EUS was rigorously evaluated on the public Endoscopic Ultrasound Database of the Pancreas, a dataset containing EUS images with expert-labeled pancreatic tumor regions. The model demonstrated superior performance compared to existing state-of-the-art supervised deep learning models and other foundation models, including various SAM adaptations.
In fully automatic, text-driven segmentation, TextSAM-EUS achieved a Dice Similarity Coefficient (DSC) of 82.69% and a Normalized Surface Distance (NSD) of 85.28%. DSC measures how much the predicted region overlaps the expert annotation, while NSD measures how closely the predicted boundary follows the annotated one, so together they indicate accurate tumor outlines. Notably, TextSAM-EUS outperformed other automatic SAM variants while tuning significantly fewer parameters, highlighting its efficiency.
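For readers unfamiliar with the Dice score, it is twice the overlap between prediction and ground truth divided by the sum of their sizes. The short sketch below computes it for two binary masks; the function name `dice_coefficient` and the choice to return 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

# Each mask has 3 foreground pixels and they share 2, so Dice = 4 / 6.
dice = dice_coefficient(pred, target)
print(f"Dice: {dice:.4f}")
```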
Ablation studies, which examine the contribution of individual components, confirmed the effectiveness of TextSAM-EUS’s design choices. These studies showed that a moderate LoRA adaptation, a concise text prompt, deep integration of the prompts, and the combination of automatically derived bounding box and centroid for refinement all contribute to its strong performance.
Looking Ahead
The development of TextSAM-EUS marks a significant step forward in medical image segmentation. It demonstrates that linguistic context can guide segmentation as effectively as manual geometric prompts, reducing the reliance on specialized radiological knowledge. The model’s low trainable parameter count also suggests its potential for use in clinical settings with limited computational resources.
The researchers plan to extend this framework to multi-class segmentation and evaluate its applicability to other medical imaging modalities and conditions. This work opens new avenues for leveraging language-driven prompting in biomedical applications of powerful foundation models. For more details, you can refer to the full research paper.


