
Automating Radiotherapy Planning with Zero-Shot Large Language Model Agents

TLDR: This research introduces a novel method for fully automating radiotherapy treatment planning using a large language model (LLM) agent in a zero-shot setting. The LLM agent interacts directly with a clinical treatment planning system, iteratively adjusting optimization parameters based on real-time feedback and clinical objectives. Tested on head-and-neck cancer cases, the LLM-generated plans achieved comparable organ sparing and improved target conformity and hot spot control compared to manual plans, demonstrating a significant step towards generalizable and efficient AI-driven planning without the need for prior training data.

Radiotherapy is a crucial treatment for many cancer patients, with millions receiving it globally each year. However, the process of creating a treatment plan is highly complex, requiring specialized expertise and many iterative adjustments. This manual approach is becoming increasingly unsustainable due to the rising number of cancer cases and existing workforce shortages, leading to calls for greater automation.

Current automated planning methods, such as knowledge-based planning, protocol-based planning, multi-criteria optimization, and reinforcement learning, each offer benefits but also come with limitations. These often include the need for large, high-quality datasets, a lack of flexibility for unusual anatomies, significant human engagement, or intensive computational requirements. As a result, a universally applicable automated solution has remained elusive.

A recent study introduces a groundbreaking approach that leverages large language model (LLM) agents for fully automated radiotherapy treatment planning, operating in a “zero-shot” setting. This means the LLM agent performs the task without any prior exposure to manually generated treatment plans, fine-tuning, or specific task training. This capability is particularly valuable in specialized fields like radiation therapy, where extensive expert-labeled data is scarce.

The proposed workflow involves an LLM agent directly interacting with a commercial clinical treatment planning system (TPS), specifically Eclipse™ by Varian Medical Systems. The agent iteratively extracts information about the plan’s current state, such as dose-volume histograms (DVHs) and objective function losses, and then proposes new constraint values to guide the inverse optimization process. Its decision-making is informed by current observations, previous optimization attempts, and evaluations, allowing it to dynamically refine its strategy.
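The observe-propose-optimize loop described above can be sketched in miniature. The toy TPS and rule-based constraint proposer below are illustrative stand-ins for Eclipse and the LLM agent; all names, step sizes, and dose values are assumptions for demonstration, not the paper's actual API or data.

```python
# Minimal sketch of the iterative planning loop, under toy assumptions:
# the "TPS" maps a requested OAR constraint to an achieved dose with a
# physical floor, and the "agent" tightens constraints coarsely at first,
# then finely, based on feedback from prior iterations.

class ToyTPS:
    """Toy optimizer: achieved mean dose tracks the requested constraint,
    bounded below by an assumed physically unavoidable floor."""
    FLOOR = 24.0  # Gy, illustrative minimum achievable dose

    def optimize(self, constraint_gy):
        return max(constraint_gy, self.FLOOR)

def propose_constraint(goal_gy, achieved_gy, history):
    """Stand-in for the LLM's decision step: larger steps early to
    explore sparing potential, smaller steps later for fine-tuning."""
    step = 4.0 if len(history) < 2 else 1.0
    return max(goal_gy, achieved_gy - step)

def plan(goal_gy=20.0, iterations=6):
    tps, history = ToyTPS(), []
    constraint = goal_gy + 10.0  # initialize above the clinical goal
    for _ in range(iterations):
        achieved = tps.optimize(constraint)      # inverse optimization pass
        history.append((constraint, achieved))   # feedback for next proposal
        constraint = propose_constraint(goal_gy, achieved, history)
    return history

final_constraint, final_dose = plan()[-1]
# The loop converges to the toy floor: final_dose == 24.0
```

In the real workflow the proposal step is a GPT-4.1 call conditioned on DVH metrics, objective losses, and the full history, rather than a fixed rule.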

To enable the LLM to perform effectively, the complex planning task was broken down into simpler, domain-agnostic subtasks. The agent was equipped with an arithmetic tool to quantify deviations from clinical goals and was provided with historical data to facilitate trend-based reasoning. Crucially, domain-specific information about the optimization system, including how constraints influence dose distribution, was encoded into the prompt. The use of chain-of-thought reasoning further enhanced the agent’s ability to make multi-step decisions, similar to a human planner, by explicitly articulating its thought process before proposing adjustments.
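The "arithmetic tool" role can be illustrated with a short sketch: compute each structure's deviation from its clinical goal so the agent reasons over precomputed numbers instead of doing arithmetic itself. The goal and achieved values below are made-up examples, not figures from the paper.

```python
# Hedged sketch of a deviation-quantification tool: positive values mean
# a clinical goal is violated, negative means it is met with margin.

def goal_deviations(achieved, goals):
    """Per-structure deviation (achieved - goal) in the goal's units."""
    return {name: round(achieved[name] - goal, 2)
            for name, goal in goals.items()}

goals = {"parotid_mean_gy": 26.0, "cord_max_gy": 45.0}    # assumed goals
achieved = {"parotid_mean_gy": 27.3, "cord_max_gy": 41.8}  # toy plan state

deviations = goal_deviations(achieved, goals)
# parotid mean exceeds its goal by 1.3 Gy; cord max has 3.2 Gy of margin
```

Feeding such a table into the prompt, alongside the history of prior attempts, is what enables the trend-based, chain-of-thought reasoning the authors describe.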

The feasibility of this LLM-driven workflow was tested on twenty head-and-neck cancer cases. The LLM-generated plans were compared against clinical manual plans, with key dosimetric endpoints analyzed. The study utilized two state-of-the-art LLMs, GPT-4.1 and GPT-4.1-mini, both with and without access to optimization priors (domain-specific knowledge about constraint ranges and their effects).

The results were highly promising. Plans generated by GPT-4.1 with optimization priors (GPT-4.1-WP) achieved clinically comparable quality to manual plans. They demonstrated similar organ-at-risk (OAR) sparing, while showing improved hot spot control and superior conformity for the planning target volumes (PTVs). For instance, the maximum dose (Dmax) was 106.5% for LLM plans versus 108.8% for clinical plans, and the conformity index for the boost PTV was 1.18 versus 1.39. The study highlighted that access to optimization priors was critical; without them, the LLM’s performance significantly deteriorated, leading to worse OAR sparing.
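For readers unfamiliar with these endpoints, one common conformity-index definition is the prescription isodose volume divided by the PTV volume, and Dmax is typically reported as a percentage of the prescription dose; the paper's exact definitions may differ, and the volumes and doses below are invented for illustration.

```python
# Hedged illustration of two common dosimetric endpoints (definitions
# assumed, input numbers made up):

def conformity_index(prescription_isodose_cc, ptv_cc):
    """Prescription isodose volume over PTV volume; 1.0 is ideal."""
    return prescription_isodose_cc / ptv_cc

def dmax_percent(max_dose_gy, prescription_gy):
    """Maximum point dose as a percentage of the prescription dose."""
    return 100.0 * max_dose_gy / prescription_gy

ci = conformity_index(prescription_isodose_cc=118.0, ptv_cc=100.0)
dmax = dmax_percent(max_dose_gy=74.55, prescription_gy=70.0)
```

A CI closer to 1.0 and a lower Dmax percentage both indicate a tighter, better-controlled dose distribution, which is why the reported 1.18 vs. 1.39 and 106.5% vs. 108.8% favor the LLM plans.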

A case study illustrated the agent’s reasoning process. The LLM initialized optimization constraints close to clinical goals, using larger step sizes early on to explore sparing potential and smaller steps for fine-tuning. When faced with difficult-to-achieve sparing objectives, such as for the mandible, the agent intelligently relaxed constraints to preserve target coverage, reasoning that further tightening would not yield significant dose reduction but would increase objective function loss. This adaptive behavior mirrors that of experienced human planners.
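The relaxation heuristic in this case study can be sketched as a simple rule: keep tightening a constraint while it still buys dose reduction, and relax it once tightening only inflates the objective loss. The thresholds and step sizes below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the adaptive tighten-or-relax decision described in
# the mandible case study. All thresholds are assumed for illustration.

def next_constraint(constraint_gy, dose_drop_gy, loss_increase):
    """Tighten while tightening helps; relax when it only adds loss."""
    if dose_drop_gy < 0.2 and loss_increase > 0.05:
        return constraint_gy + 2.0   # relax: preserve target coverage
    return constraint_gy - 1.0       # tighten: still gaining sparing

# e.g. last tightening gained only 0.1 Gy but raised the loss by 10%,
# so the constraint is relaxed from 30 Gy to 32 Gy
relaxed = next_constraint(30.0, dose_drop_gy=0.1, loss_increase=0.10)
```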

Beyond quality, the efficiency gains were substantial. The LLM-driven planning process completed in under 5 minutes on a standard workstation, a significant reduction compared to manual planning times. This research marks a significant step towards generalizable AI-driven planning, particularly for institutions with limited access to large, high-quality training datasets. By embedding the agent directly into a commercial TPS and constraining its actions to parameters human planners use, the approach maximizes clinical applicability and interpretability.

The study underscores that while LLMs possess strong general reasoning, their clinical utility in this domain heavily relies on the quality and interpretation of provided information. This includes understanding clinical constraints as flexible reference points rather than strict targets, and grasping the “hidden rules” of the optimization engine. This zero-shot, LLM-driven workflow offers a generalizable and clinically applicable solution that could reduce planning variability and support broader adoption of AI-based planning strategies in radiotherapy. You can read the full paper here.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
