RadOnc-GPT: An AI Agent for Automated Patient Outcome Labeling in Radiation Oncology

TLDR: RadOnc-GPT is an autonomous AI agent that uses large language models to retrieve and interpret patient data, accurately label complex clinical outcomes like cancer recurrence and osteoradionecrosis, and identify errors in existing medical records. It significantly improves the scalability, accuracy, and timeliness of patient outcomes research in radiation oncology by acting as both a labeler and an auditor of clinical data.

Accurate, efficient tracking of patient outcomes matters throughout healthcare, and especially in specialized fields like radiation oncology. Traditionally, outcome tracking has relied heavily on manual labeling, which struggles with scale, accuracy, and timeliness. A new research paper introduces RadOnc-GPT, an autonomous large language model (LLM)-based agent designed to overcome these limitations by independently retrieving patient-specific information, iteratively assessing evidence, and returning structured outcomes in real time.

The paper, titled “RadOnc-GPT: An Autonomous LLM Agent for Real-Time Patient Outcomes Labeling at Scale,” was authored by Jason Holmes, Yuexing Hao, Mariana Borras-Osorio, Federico Mastroleo, Santiago Romero Brufau, Valentina Carducci, Katie M Van Abel, David M Routman, Andrew Y. K. Foong, Liv M Muller, Satomi Shiraishi, Daniel K Ebner, Daniel J Ma, Sameer R Keole, Samir H Patel, Mirek Fatyga, Martin Bues, Brad J Stish, Yolanda I Garces, Michelle A Neben Wittich, Robert L Foote, Sujay A Vora, Nadia N Laack, Mark R Waddle, and Wei Liu. Their work highlights a significant step forward in leveraging AI for clinical data management.

RadOnc-GPT is not just another chatbot; it’s an autonomous agent that conducts multi-turn conversations and decides on its own which functions to call and when to stop. Its architecture combines internal data resources with external public ones. The internal resources include Mayo Clinic’s radiation oncology database, Aria (Varian Medical Systems), and enterprise electronic health record (EHR) systems like Epic. The external sources, accessed via their public APIs, include PubMed, ClinicalTrials.gov, and the National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE).

A key distinction of RadOnc-GPT’s design is its departure from conventional retrieval-augmented generation (RAG). RAG typically uses vector similarity search to pull relevant passages out of poorly organized data. Patient data in Epic, by contrast, is already systematically organized and indexed, so RadOnc-GPT retrieves it through a large set of highly specific functions. The model thus receives targeted, well-structured information without being overwhelmed.
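The function-calling loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tool names (`get_demographics`, `get_pathology_reports`), the in-memory `FAKE_EHR` record, and the `scripted_policy` that stands in for the LLM are hypothetical and are not taken from the paper or from any Epic or Aria API.

```python
from typing import Callable

# Hypothetical in-memory stand-in for a structured patient record store.
FAKE_EHR = {
    "patient-001": {
        "demographics": {"age": 67, "sex": "M"},
        "pathology_reports": ["Biopsy of mandible: necrotic bone, no tumor."],
    }
}


def get_demographics(patient_id: str) -> dict:
    """Hypothetical narrow retrieval function: demographics only."""
    return FAKE_EHR[patient_id]["demographics"]


def get_pathology_reports(patient_id: str) -> list:
    """Hypothetical narrow retrieval function: pathology reports only."""
    return FAKE_EHR[patient_id]["pathology_reports"]


# Each tool returns one well-defined slice of the record, rather than
# whatever a vector similarity search happens to rank highest.
TOOLS: dict[str, Callable[[str], object]] = {
    "get_demographics": get_demographics,
    "get_pathology_reports": get_pathology_reports,
}


def scripted_policy(history: list) -> dict:
    """Stand-in for the LLM: call each tool once, then stop."""
    called = {step["tool"] for step in history}
    for name in TOOLS:
        if name not in called:
            return {"action": "call", "tool": name}
    return {"action": "stop"}


def run_agent(patient_id: str, policy=scripted_policy, max_turns: int = 10) -> list:
    """Multi-turn loop: the policy picks the next tool or decides to stop."""
    history = []
    for _ in range(max_turns):
        decision = policy(history)
        if decision["action"] == "stop":
            break
        result = TOOLS[decision["tool"]](patient_id)
        history.append({"tool": decision["tool"], "result": result})
    return history


history = run_agent("patient-001")
print([step["tool"] for step in history])
```

In a real agent the policy would be an LLM reading the accumulated history; the `max_turns` cap is a common safeguard against loops that never decide to stop, not a detail reported in the paper.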

RadOnc-GPT was evaluated with a two-tier strategy. The first tier, a structured quality assurance (QA) task, assessed the agent’s ability to accurately retrieve demographic and radiotherapy treatment plan details, a foundational step that established trust in its structured-data retrieval. RadOnc-GPT matched all six demographic fields for all 500 patients (100%) and reproduced radiation-course counts correctly in 497 of 500 cases (99.4%).
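A QA check of this kind amounts to comparing the agent's retrieved fields against a reference record, patient by patient. The sketch below shows one way to compute such a match rate. The field names and toy rows are hypothetical, since the article does not list the six demographic fields, and this is not the paper's evaluation code.

```python
def field_match_rate(agent_rows, reference_rows, fields):
    """Fraction of patients for whom every listed field matches the reference."""
    matched = sum(
        all(a[f] == r[f] for f in fields)
        for a, r in zip(agent_rows, reference_rows)
    )
    return matched / len(reference_rows)


# Toy data with hypothetical field names; the second patient has one mismatch.
reference = [
    {"sex": "F", "birth_year": 1950},
    {"sex": "M", "birth_year": 1962},
]
agent = [
    {"sex": "F", "birth_year": 1950},
    {"sex": "M", "birth_year": 1963},
]

rate = field_match_rate(agent, reference, ["sex", "birth_year"])
print(rate)

# The reported radiation-course figure follows from the same ratio.
course_rate = round(497 / 500 * 100, 1)
print(course_rate)
```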

The second tier involved more complex clinical outcomes labeling. Here, RadOnc-GPT autonomously combined structured EHR data with unstructured clinical notes, radiology, and pathology reports to determine outcomes such as mandibular osteoradionecrosis (ORN) in head-and-neck cancer patients and cancer recurrence in independent prostate and head-and-neck cancer cohorts. Ground-truth labels, initially generated by expert radiation oncologists, were used for comparison. Crucially, discrepancies between RadOnc-GPT’s outputs and these ground-truth labels underwent independent adjudication by other radiation oncologists.

Adjudication changed the picture for every outcomes-labeling task. For ORN determination (233 patients), accuracy rose from 84.5% to 95.2% post-adjudication. Prostate cancer recurrence detection (80 patients) improved from 92.5% to 95.0%, and head-and-neck recurrence detection (82 patients) improved from 92.7% to 96.3%. Among the 48 initial discrepancies across these tasks, adjudication found 30 (63%) to be previously unrecognized errors in the ground-truth labels rather than in RadOnc-GPT’s output, highlighting RadOnc-GPT’s dual capacity as both a labeler and an auditor of existing data.
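The arithmetic behind the post-adjudication gains is simple: each discrepancy that adjudication attributes to a ground-truth error becomes a case the agent labeled correctly. The sketch below works through the prostate cohort. The counts of 6 initial discrepancies and 2 ground-truth errors are assumptions, back-calculated from the reported 92.5% and 95.0% on 80 patients; the article itself states only the percentages and the overall totals of 48 and 30.

```python
def accuracy(n_patients, n_discrepancies):
    """Share of patients where agent and reference labels agree."""
    return (n_patients - n_discrepancies) / n_patients


def adjudicated_accuracy(n_patients, n_discrepancies, n_ground_truth_errors):
    """Accuracy after discrepancies traced to ground-truth errors are corrected."""
    remaining = n_discrepancies - n_ground_truth_errors
    return accuracy(n_patients, remaining)


# Prostate cohort: 80 patients (reported). The 6 and 2 below are
# back-calculated assumptions, not counts given in the article.
before = accuracy(80, 6)
after = adjudicated_accuracy(80, 6, 2)
print(f"{before:.1%} -> {after:.1%}")

# Overall share of discrepancies that were ground-truth errors (reported totals).
share = 30 / 48
print(f"{share:.1%}")
```

The overall share comes to 62.5%, which the article rounds to 63%.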

The study concludes that RadOnc-GPT reliably retrieves foundational structured data and generalizes across complex clinical outcome labeling tasks, notably using a single cancer recurrence detection prompt across multiple disease sites. Its high recall minimizes clinically critical false negatives, and its ability to identify latent errors improves the integrity of registry data. This autonomous LLM agent promises to enable scalable, trustworthy, and real-time curation of radiation-oncology research datasets, allowing clinicians to focus on judgment rather than data wrangling.

For more detailed information, you can read the full research paper here: RadOnc-GPT: An Autonomous LLM Agent for Real-Time Patient Outcomes Labeling at Scale.
