The 80% Challenge: Unpacking the Real Work of Deploying AI in Clinical Settings

TLDR: A field guide based on deploying an AI agent in healthcare reveals that more than 80% of the effort goes into implementation challenges such as data integration, validation, economics, drift management, and governance, rather than model development. This highlights the sociotechnical nature of bringing AI into clinical practice.

Large language models (LLMs) and AI agents hold immense promise for transforming healthcare, but bringing them into routine clinical practice is far more complex than just developing the algorithms. A recent field guide, informed by the deployment of an AI agent at Mass General Brigham, sheds light on the significant implementation challenges that consume the majority of effort in real-world scenarios.

The paper, titled “Beyond the Algorithm: A Field Guide to Deploying AI Agents in Clinical Practice”, highlights a crucial finding: less than 20% of the total project time was spent on prompt engineering and model development. The remainder, more than 80%, was dedicated to the ‘sociotechnical work’ of implementation, including data engineering, stakeholder alignment, regulatory navigation, and workflow integration. This imbalance reveals a misalignment between where the field often focuses its attention (algorithms) and where success is actually determined (infrastructure and implementation).

The research team, whose authors include Jack Gallifant, Katherine C. Kellogg, and Danielle S. Bitterman, developed and deployed an AI agent called “irAE-Agent.” This system automatically detects immune-related adverse events (irAEs) in the clinical notes of cancer patients, with the aim of supporting timely registration to an irAE biobank. Their experience, combined with structured interviews with 20 clinicians, engineers, and informatics leaders, identified five key areas, or “heavy lifts,” that are critical for successful deployment.

The Five Heavy Lifts of AI Deployment in Clinical Practice

1. Data Integration: This is often the most significant engineering challenge. It involves securely and efficiently integrating vast amounts of electronic health record (EHR) data, most of it free text, into AI workflows. Unlike traditional machine learning, which uses structured data, LLMs require new preprocessing steps such as text chunking and creating semantic layers. The key lesson here is to invest early in a centralized data warehouse and start with batch processing before considering costly real-time solutions.

2. Model Validation and Refinement: Beyond standard retrospective testing, LLM validation is a continuous process. It requires extensive human annotation, systematic evaluation of model outputs for failure modes, hallucination assessment, and evidence verification. The paper emphasizes the need for strict annotation guidelines, dual annotation with physician adjudication, and incremental rollouts to mitigate risks and gather real-world feedback.

3. Ensuring Economic Value: The long-term adoption of any clinical AI tool hinges on credible economic value. This involves mapping use cases to institutional priorities such as revenue preservation, labor productivity, and quality improvement. The authors found that labor substitution is rarely linear; instead, the focus should be on redeployment and task-specific productivity. Tracking costs and demonstrating that unit economics improve with scale are vital.

4. Managing Model and Data Drift: AI models, especially LLMs, are not static. They can experience ‘drift’ (behavioral changes without altering the model version) or ‘shift’ (changes due to external alterations). Continuous monitoring, weekly re-scoring against gold-labeled test sets, and tracking API version changes are crucial. Human validation remains indispensable for catching nuances that automated dashboards might miss, and prompt engineering often becomes the primary remediation strategy.

5. Governance: Deploying AI in patient care raises complex ethical, accountability, and regulatory questions. Establishing a multidisciplinary AI governance board with representatives from clinical, legal, security, and patient experience teams is essential. Clear lifecycle checkpoints and a RACI (Responsible, Accountable, Consulted, Informed) matrix help define roles and accelerate approvals. Special attention is needed for prompt engineering and potential privacy risks, as well as continuous red-teaming against ‘jailbreak’ attempts.
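
The weekly re-scoring against a gold-labeled test set described under the fourth heavy lift can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' pipeline: the binary label encoding (1 = irAE present), the function names score_binary and drift_alert, the 0.05 F1 tolerance, and the sample labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Score:
    precision: float
    recall: float
    f1: float


def score_binary(gold: Sequence[int], predicted: Sequence[int]) -> Score:
    """Precision, recall and F1 for binary labels (1 = event present)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Score(precision, recall, f1)


def drift_alert(baseline: Score, current: Score, tolerance: float = 0.05) -> bool:
    """True when F1 has fallen more than `tolerance` below the baseline.

    The tolerance is a hypothetical threshold; a real deployment would
    choose it with clinical and governance input.
    """
    return (baseline.f1 - current.f1) > tolerance


# Hypothetical gold labels and two weekly runs of the same model version.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
week_1 = [1, 0, 1, 1, 0, 0, 1, 0]  # agrees with gold on every note
week_2 = [1, 0, 0, 1, 0, 1, 0, 0]  # two missed events, one false alarm

baseline = score_binary(gold, week_1)
current = score_binary(gold, week_2)

if drift_alert(baseline, current):
    print(f"F1 dropped from {baseline.f1:.2f} to {current.f1:.2f}: route to human review")
```

A check like this only flags that something changed. As the paper stresses, human validation is still needed to work out why, and the fix is often a prompt revision rather than a model change.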

The experience with the irAE-Agent demonstrates that successfully integrating generative AI into clinical practice is fundamentally a sociotechnical challenge. It requires aligning diverse stakeholders, building trust through continuous validation, and focusing on the essential infrastructure and implementation work. This practical roadmap aims to help other institutions bridge the “valley of death” and translate generative AI from pilot projects into routine clinical care. You can read the full paper for more details here: Beyond the Algorithm: A Field Guide to Deploying AI Agents in Clinical Practice.
