Advancing Glaucoma Diagnosis with AI-Powered OCT Reporting

TLDR: A fine-tuned multimodal large language model (Llama 3.2 Vision-Instruct) has been developed to accurately detect glaucoma from OCT scans, assess image quality, and generate structured clinical reports detailing retinal nerve fiber layer (RNFL) thinning. The model achieved high accuracy in quality triage (0.90) and glaucoma detection (0.86), with strong alignment in generated text reports, showing potential to improve diagnostic confidence and reduce clinician documentation burden.

Glaucoma, a progressive eye disease, stands as a leading cause of irreversible blindness globally. Early detection, particularly of retinal nerve fiber layer (RNFL) thinning, is paramount for preserving vision. Optical Coherence Tomography (OCT) is a crucial imaging tool for this, providing detailed measurements of structural damage often before vision loss is noticeable. However, interpreting OCT scans can be complex, especially with subtle thinning patterns or poor image quality, and the process of documenting findings adds a significant burden to clinicians.

Addressing these challenges, researchers have developed an innovative approach using a fine-tuned multimodal large language model (MM-LLM) to assist in glaucoma detection and streamline OCT interpretation. This new model aims to not only screen optic nerve head (ONH) OCT circle scans for quality but also to generate structured clinical reports that include a glaucoma diagnosis and detailed assessments of RNFL thinning across different sectors of the eye.

The AI Solution: A Fine-tuned Multimodal Language Model

The study utilized the Llama 3.2 Vision-Instruct model, an advanced MM-LLM capable of processing both text and image inputs. This model was specifically fine-tuned using a large dataset of ONH OCT images paired with automatically generated, structured clinical descriptions. These descriptions detailed global and sectoral RNFL thinning and included an image quality flag. Scans deemed of poor quality were labeled as unusable and paired with a fixed refusal statement, preventing the model from generating potentially misleading information from unreliable inputs.
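The pairing of each scan with a structured description, or with a fixed refusal for unusable scans, can be sketched roughly as follows. This is only an illustration: the refusal wording, the report template, and the names `ScanLabels` and `build_target_text` are assumptions, since the article does not quote the study's actual text or code.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical wording; the article does not quote the actual refusal statement.
REFUSAL_TEXT = "Image quality is insufficient for reliable interpretation."

# The seven RNFL sectors evaluated in the study.
SECTORS = (
    "global", "temporal", "temporal superior", "temporal inferior",
    "nasal", "nasal superior", "nasal inferior",
)


@dataclass
class ScanLabels:
    """Per-scan labels assumed to be available when building training targets."""
    usable: bool                 # image quality flag
    glaucoma: bool               # diagnosis label
    thinning: Dict[str, bool]    # sector name -> thinning present


def build_target_text(labels: ScanLabels) -> str:
    """Return the text the model is trained to produce for one scan."""
    # Unusable scans always map to the same refusal, never to a diagnosis.
    if not labels.usable:
        return REFUSAL_TEXT

    diagnosis = "Glaucoma: yes." if labels.glaucoma else "Glaucoma: no."
    lines = ["Image quality: usable.", diagnosis]
    for sector in SECTORS:
        # Sectors missing from the label dict are treated as "no thinning".
        status = "thinning" if labels.thinning.get(sector, False) else "no thinning"
        lines.append(f"RNFL {sector}: {status}.")
    return "\n".join(lines)
```

Because the refusal is a single fixed string, a poor-quality scan can be recognized downstream with a simple equality check on the generated text.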

The model’s performance was rigorously evaluated on a separate test set across three key tasks: assessing image quality, detecting glaucoma, and classifying RNFL thinning in seven anatomical sectors (global, temporal, temporal superior, temporal inferior, nasal, nasal superior, nasal inferior). The quality of the generated clinical descriptions was also measured using standard text evaluation metrics like BLEU, ROUGE, METEOR, and BERTScore, which assess various aspects of text similarity and semantic accuracy.
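To give a feel for what overlap-based text metrics measure, here is a minimal sketch of a unigram-overlap F-measure in the spirit of ROUGE-1. It is a simplified stand-in, not the implementation used in the study: tokenization is plain whitespace splitting, and the function name `unigram_f1` is an assumption.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """F-measure over shared words between a generated and a reference report."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per word (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Metrics such as BERTScore compare embeddings rather than exact words, so they can reward paraphrases that a pure overlap score like this one would penalize.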

Key Findings and Performance

The results demonstrated the model’s strong capabilities. For image quality assessment, it achieved an accuracy of 0.90 and a high specificity of 0.98, effectively identifying unusable scans. In glaucoma detection, the model showed an accuracy of 0.86 and an F1-score of 0.91, indicating reliable diagnostic performance.
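For readers less familiar with these metrics, the sketch below shows how accuracy, specificity, and F1 are derived from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage note are illustrative assumptions, not data from the study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Specificity: share of true negatives among all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

With made-up counts such as `binary_metrics(tp=40, fp=5, tn=45, fn=10)`, accuracy is 0.85 and specificity is 0.90. F1 ignores true negatives, so it can exceed accuracy when positive cases dominate a test set.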

When predicting RNFL thinning, the model’s accuracy ranged from 0.83 to 0.94, performing particularly well in the global and temporal sectors, including the temporal superior and inferior regions. These are areas commonly affected by glaucoma, suggesting the model effectively learned prevalent thinning patterns. The text generation scores were also impressive, with a BLEU score of 0.82 and a BERTScore-F1 of 0.99, indicating a strong alignment between the model-generated reports and reference clinical descriptions.
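Sector-level accuracy of this kind can be computed by comparing predicted and reference thinning labels one sector at a time. The following is a minimal sketch; the data layout (one dict of sector labels per scan) and the name `per_sector_accuracy` are assumptions made for illustration.

```python
from typing import Dict, List

# The seven RNFL sectors evaluated in the study.
SECTORS = (
    "global", "temporal", "temporal superior", "temporal inferior",
    "nasal", "nasal superior", "nasal inferior",
)


def per_sector_accuracy(
    predictions: List[Dict[str, bool]],
    references: List[Dict[str, bool]],
) -> Dict[str, float]:
    """Return the fraction of scans labeled correctly, separately for each sector."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    accuracy: Dict[str, float] = {}
    for sector in SECTORS:
        # Count scans where the predicted thinning flag matches the reference.
        correct = sum(
            pred[sector] == ref[sector]
            for pred, ref in zip(predictions, references)
        )
        accuracy[sector] = correct / len(references) if references else 0.0
    return accuracy
```

Reporting one number per sector, rather than a single pooled score, is what makes differences between regions such as temporal and nasal visible.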

A detailed analysis by glaucoma severity revealed that the model was highly accurate in detecting pronounced thinning in moderate-to-advanced glaucoma cases, especially in the temporal sectors. However, its performance in the nasal regions was better for mild cases, highlighting the need for more balanced training data to enhance sensitivity to early-stage changes in less affected areas.

Implications for Clinical Practice

This fine-tuned MM-LLM represents a significant advancement towards integrating AI into real-world clinical diagnostics. By generating structured, human-like clinical reports, the model not only offers high diagnostic accuracy but also provides explanations that align with clinical reasoning. This ‘reasoning-based interpretability’ can boost clinician confidence and improve patient care. The automated reports could also serve as drafts, potentially reducing the significant documentation burden faced by ophthalmologists due to high patient volumes.

The integrated image quality triage mechanism is a crucial safety feature, preventing the model from producing speculative or erroneous interpretations from poor-quality scans. This ensures that generated reports are based on diagnostically valid data, fostering transparency and trust in AI systems.

While promising, the approach still needs validation on additional datasets and more diverse, balanced training data before broader clinical adoption. Future research could also explore incorporating other modalities, such as fundus photographs and visual field tests, to further enhance diagnostic accuracy and support long-term disease monitoring. For more details, refer to the full research paper.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes.
