TLDR: A new deep learning model called CABNet, incorporating Global Attention Block (GAB) and Category Attention Block (CAB), significantly improves the accuracy of Diabetic Retinopathy (DR) classification, especially on imbalanced datasets. It uses pre-trained networks like DenseNet-169 and MobileNetV3-small as backbones, achieving high accuracy with fewer parameters, making it efficient for clinical use and early detection of DR. The model also offers interpretability through visualizations, highlighting its potential for real-world deployment.
Diabetic Retinopathy (DR) is a serious complication of diabetes that affects the eyes, potentially leading to vision loss. With over 537 million people globally living with diabetes, and this number projected to rise significantly, early and accurate detection of DR is crucial for preventing permanent damage and guiding personalized treatment plans. However, a major challenge in developing automated DR classification systems using deep learning has been the imbalanced distribution of data in available datasets, where some stages of the disease are far less represented than others.
A recent research paper, titled “Enhancing Diabetic Retinopathy Classification Accuracy through Dual Attention Mechanism in Deep Learning,” introduces a novel approach to overcome this challenge. The study proposes a deep learning model that integrates a dual attention mechanism, combining a Global Attention Block (GAB) and a Category Attention Block (CAB), into existing powerful neural networks.
Addressing Data Imbalance with Dual Attention
The core innovation of this research lies in its dual attention mechanism. The Global Attention Block (GAB) is designed to capture broad, contextual features across the entire retinal image, focusing on both channel-wise and spatial information. This helps the model understand the overall structure and potential areas of interest. Following the GAB, the Category Attention Block (CAB) refines these features by emphasizing category-specific information. This is particularly effective in handling imbalanced datasets because the CAB allocates specific feature channels to each DR category, ensuring that even less common disease stages receive adequate attention and are not overlooked due to a lack of samples.
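The idea behind the two blocks can be sketched in outline. The NumPy snippet below is a minimal illustration, not the paper's implementation: a real GAB and CAB use learned convolutional weights, whereas here the channel and spatial attention maps are derived directly from pooled statistics, and the per-category channel count `k` is simply `C // num_classes`, a hypothetical choice for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_attention_block(feat):
    """Gate a (C, H, W) feature volume with channel-wise and then
    spatial attention. A real GAB learns these maps with convolutions;
    this sketch derives them from pooled statistics instead."""
    # Channel attention: squeeze spatial dims, gate each channel.
    channel_scores = sigmoid(feat.mean(axis=(1, 2)))       # (C,)
    feat = feat * channel_scores[:, None, None]
    # Spatial attention: squeeze channels, gate each location.
    spatial_scores = sigmoid(feat.mean(axis=0))            # (H, W)
    return feat * spatial_scores[None, :, :]

def category_attention_block(feat, num_classes=5):
    """Assign k = C // num_classes channels to each DR category and
    score each category from its own channel group (global max pool,
    then average), so rare grades keep dedicated features."""
    c = feat.shape[0]
    k = c // num_classes
    groups = feat[: k * num_classes].reshape(num_classes, k, *feat.shape[1:])
    # Per-category score: average of each group's max-pooled channels.
    return groups.max(axis=(2, 3)).mean(axis=1)            # (num_classes,)

feat = np.random.default_rng(0).standard_normal((20, 8, 8))
scores = category_attention_block(global_attention_block(feat))
print(scores.shape)
```

Because every DR grade owns its own slice of channels, gradients for a rare class such as Severe DR flow through channels that the majority class cannot dominate, which is the intuition behind the CAB's robustness to imbalance.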
The researchers evaluated their proposed method, referred to as CABNet, by integrating these attention blocks into three well-known pre-trained deep learning architectures: MobileNetV3-small, EfficientNet-b0, and DenseNet-169. These networks serve as the backbone for the classification system, which categorizes retinal fundoscopy images into five distinct stages of DR: No DR, Mild DR, Moderate DR, Severe DR, and Proliferative DR.
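The overall pipeline — pre-trained backbone, attention refinement, then a five-way classifier head — can be sketched as below. The `backbone_stub` is a stand-in for a real feature extractor such as DenseNet-169 (a fixed random projection over pooled image patches), so shapes and the five-grade label mapping are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

DR_GRADES = ["No DR", "Mild DR", "Moderate DR", "Severe DR", "Proliferative DR"]

rng = np.random.default_rng(42)

def backbone_stub(image):
    """Stand-in for a pre-trained backbone: maps a (3, 224, 224) image
    to a (C, H, W) feature volume. Real code would call a torchvision
    model; a fixed random 1x1 projection over pooled patches suffices
    for this sketch."""
    c, h, w = 16, 7, 7
    patches = image.reshape(3, image.shape[1] // h, h, image.shape[2] // w, w)
    pooled = patches.mean(axis=(1, 3))                  # (3, h, w)
    proj = rng.standard_normal((c, 3))                  # 1x1-conv-style weights
    return np.einsum("ck,khw->chw", proj, pooled)       # (c, h, w)

def classify(image, weights, bias):
    feat = backbone_stub(image)      # GAB and CAB would refine feat here
    pooled = feat.mean(axis=(1, 2))  # global average pooling -> (16,)
    logits = weights @ pooled + bias # 5-way linear head
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()       # softmax over the five DR grades

w = rng.standard_normal((5, 16)) * 0.1
b = np.zeros(5)
image = rng.random((3, 224, 224))
probs = classify(image, w, b)
print(DR_GRADES[int(np.argmax(probs))])
```

Swapping the backbone stub for MobileNetV3-small versus DenseNet-169 changes only the feature extractor; the attention blocks and the five-way head stay the same, which is what lets the authors compare the three backbones directly.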
Impressive Performance on Public Datasets
The model was rigorously tested on two widely recognized public datasets of retinal fundoscopy images: APTOS and EYEPACS. On the APTOS dataset, the DenseNet-169 backbone achieved a mean accuracy of 83.20%, with MobileNetV3-small and EfficientNet-b0 yielding 82% and 80% accuracies, respectively. For the EYEPACS dataset, EfficientNet-b0 performed best with an 80% mean accuracy, while DenseNet-169 and MobileNetV3-small achieved 75.43% and 76.68%, respectively. Beyond accuracy, the model demonstrated strong performance across other critical metrics, including an F1-score of 82.0%, precision of 82.1%, sensitivity of 83.0%, specificity of 95.5%, and a kappa score of 88.2% on the APTOS dataset.
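All of the metrics quoted above can be derived from a single multi-class confusion matrix. The sketch below shows one way to compute them in NumPy; macro-averaging across classes is assumed here, and the toy matrix is illustrative, not the paper's results.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Compute accuracy, macro precision/sensitivity/specificity/F1,
    and Cohen's kappa from a confusion matrix where cm[i, j] counts
    samples of true class i predicted as class j."""
    cm = cm.astype(float)
    n = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp          # samples of the class that were missed
    tn = n - tp - fp - fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)      # recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / n
    # Cohen's kappa: agreement beyond what class frequencies predict by chance.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision.mean(),
        "sensitivity": sensitivity.mean(),
        "specificity": specificity.mean(),
        "f1": f1.mean(),
        "kappa": kappa,
    }

# Toy 3-class confusion matrix for illustration only.
cm = np.array([[50, 3, 2],
               [4, 40, 6],
               [1, 5, 39]])
m = metrics_from_confusion(cm)
print({k: round(v, 3) for k, v in m.items()})
```

Note that kappa sits below raw accuracy whenever class frequencies are skewed, which is exactly why it is a more honest score than accuracy on imbalanced DR datasets.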
A significant advantage of this proposed approach is its efficiency. The MobileNetV3-small model, for instance, requires a remarkably low number of parameters (1.6 million on APTOS and 0.90 million on EYEPACS), making it computationally lightweight. This is crucial for practical deployment, especially in real-time screening systems or on devices with limited computational resources.
Enhanced Interpretability for Clinical Use
To build trust and facilitate clinical adoption, the researchers also incorporated Grad-CAM visualizations. These visualizations generate heatmaps that highlight the specific regions in the retinal image that the model focuses on when making a prediction. This allows ophthalmologists to visually cross-reference the model’s decisions with known pathological features like microaneurysms and hemorrhages, enhancing the transparency and reliability of the diagnostic process. The visualizations showed that the dual attention mechanism effectively guides the model to focus on relevant lesion regions, even subtle ones, which is vital for early-stage detection.
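The Grad-CAM heatmaps mentioned above follow a simple recipe: weight each feature map of the last convolutional layer by its globally averaged gradient, sum, and keep only the positive evidence. The sketch below implements that formula in NumPy; in a real pipeline the activations and gradients come from the network via autograd, whereas here synthetic arrays stand in for them.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from last-conv-layer activations and the
    gradients of the predicted class score w.r.t. them.

    feature_maps, gradients: (C, H, W) arrays.
    Returns an (H, W) map normalized to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))              # alpha_c: pooled grads
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                           # ReLU: positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                          # scale to [0, 1]
    return cam

rng = np.random.default_rng(1)
A = rng.random((8, 14, 14))             # stand-in activations
dA = rng.standard_normal((8, 14, 14))   # stand-in gradients
heatmap = grad_cam(A, dA)
print(heatmap.shape)
```

The resulting low-resolution map is upsampled to the input image size and overlaid on the fundus photograph, which is what lets an ophthalmologist check whether the hot regions coincide with microaneurysms or hemorrhages.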
The lightweight nature and high accuracy of this model make it highly suitable for real-world clinical applications, particularly in underserved areas where access to specialized diagnostic tools is limited. It can serve as a valuable decision-support tool, reducing diagnostic workload and enabling timely intervention to prevent irreversible vision loss. For more detailed information, you can read the full research paper here.
Future work aims to further enhance the model’s robustness by utilizing neural diffusion models to generate high-quality synthetic retinal fundus images for training, which could lead to even more generalized and reliable DR grading across diverse clinical scenarios.


