TLDR: This paper introduces a new method to improve synthetic CT (sCT) generation from Cone-Beam CT (CBCT) by integrating a learnable registration module (Spatial Transformer Network – STN) directly into the sCT pipeline. This end-to-end approach addresses the common issue of misalignment between intraoperative CBCT and preoperative CT scans, leading to enhanced sCT quality, particularly in cases of low CBCT quality and moderate misalignment.
Cone-Beam Computed Tomography (CBCT) is a valuable tool in medical procedures, especially for real-time imaging during surgery, thanks to its quick acquisition and lower radiation dose. However, CBCT images often contain artifacts and are less clear than conventional Computed Tomography (CT) scans, which can make it harder for clinicians to see the anatomy they need during a procedure.
A promising solution to this problem is the creation of synthetic CT (sCT) images. This involves converting CBCT volumes into the higher-quality CT domain, effectively reducing artifacts and improving image clarity. While sCT generation holds great potential, it faces a significant hurdle: the inherent misalignment between intraoperative CBCT scans (taken during a procedure) and preoperative CT scans (taken beforehand for planning).
Researchers have been exploring ways to combine information from both CBCT and CT scans, a technique known as multimodal learning, to enhance sCT quality. However, simply fusing these images doesn’t fully address the misalignment issue, which can arise from patient movement or changes in anatomical conditions between scans.
A new study introduces an innovative approach to overcome this challenge by integrating an end-to-end learnable registration module directly into the sCT generation process. This module, based on Spatial Transformer Networks (STN), allows the system to automatically align the preoperative CT data with the intraoperative CBCT data during the image synthesis. This means the model can jointly optimize both the alignment and the sCT generation tasks, leading to more accurate and higher-quality synthetic images.
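To make the idea of an STN-style alignment step more concrete, here is a minimal 2D sketch of its two resampling stages: an affine grid generator and a bilinear sampler. It is an illustration under stated assumptions, not the paper's implementation. The function names `affine_grid` and `bilinear_sample` are hypothetical, the transform is restricted to a 2D affine matrix, and the parameters `theta` are supplied by hand, whereas in a learnable module they would be predicted by a network and trained jointly with the synthesis task through automatic differentiation.

```python
import numpy as np

def affine_grid(theta, height, width):
    """Map every output pixel to a source location in normalized [-1, 1] coordinates.

    theta is a 2x3 affine matrix (assumed here to be given, not learned).
    """
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ theta.T                                 # (H, W, 2) as (x, y)

def bilinear_sample(image, grid):
    """Resample a 2D image at the grid locations using bilinear interpolation.

    Locations that fall outside the image contribute zero.
    """
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates.
    x = (grid[..., 0] + 1.0) * 0.5 * (w - 1)
    y = (grid[..., 1] + 1.0) * 0.5 * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def pixel(yi, xi):
        inside = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
        values = image[np.clip(yi, 0, h - 1), np.clip(xi, 0, w - 1)]
        return np.where(inside, values, 0.0)

    return ((1 - wx) * (1 - wy) * pixel(y0, x0) + wx * (1 - wy) * pixel(y0, x1)
            + (1 - wx) * wy * pixel(y1, x0) + wx * wy * pixel(y1, x1))

# Toy "preoperative CT" slice and two hand-picked transforms.
image = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
# A translation of 2 / (width - 1) in normalized units equals one pixel.
shift = np.array([[1.0, 0.0, 2.0 / 3.0],
                  [0.0, 1.0, 0.0]])

unchanged = bilinear_sample(image, affine_grid(identity, 4, 4))
warped = bilinear_sample(image, affine_grid(shift, 4, 4))
```

Because both stages are built from smooth arithmetic on `theta`, an autodiff framework can propagate the synthesis loss back to the alignment parameters, which is what allows registration and sCT generation to be optimized together.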
The method was evaluated on a controlled synthetic dataset and on two real-world clinical datasets. The results showed significant improvements in sCT quality, outperforming existing multimodal methods in a large majority of evaluation settings. The benefits were most noticeable when the initial CBCT image quality was low and the preoperative CT was moderately misaligned. The method generalized well across datasets, but its performance declined slightly at extreme levels of misalignment, suggesting that very large spatial transformations may still require some initial external alignment.
This research highlights the potential of integrating learnable registration into multimodal sCT generation. By enabling the system to adaptively align different imaging modalities, it paves the way for more robust and reliable synthetic CT images, ultimately improving intraoperative imaging workflows and supporting better guidance during medical procedures. For more details, refer to the full research paper.


