
MoMA: A New AI Architecture for Better Clinical Predictions Using Diverse Patient Data

TL;DR: MoMA is a new AI architecture that uses multiple specialized large language models (LLMs) to integrate different types of electronic health record (EHR) data (such as notes, images, and lab results) for clinical prediction tasks. It converts non-text data into textual summaries, combines them with clinical notes, and then uses another LLM to make predictions. This approach outperforms existing methods, is flexible, and requires less training data, making it highly effective for improving clinical decision-making.

Healthcare is constantly evolving, and with the rise of digital patient records, there’s an incredible amount of data available. These electronic health records (EHRs) contain a wealth of information, from doctors’ notes and lab results to medical images and vital signs. Each piece of this data, or “modality,” offers unique insights into a patient’s health. For example, clinical notes describe symptoms, images show anatomy, and lab results quantify physiological states. Combining these different types of data can lead to a much more complete understanding of a patient’s condition, often outperforming models that rely on just one type of information.

However, effectively integrating such diverse data for clinical prediction has been a significant challenge, mainly because of the large amounts of data needed to train these complex systems. Traditional methods often struggle to combine varied data types seamlessly, especially when there isn’t enough perfectly matched, “paired” data across all modalities.

A new approach called Mixture-of-Multimodal-Agents (MoMA) has been introduced to address these challenges. MoMA is a novel architecture that uses multiple large language model (LLM) agents to improve clinical prediction tasks by leveraging multimodal EHR data. Think of it as a team of specialized experts, each handling a different type of information.

How MoMA Works

The MoMA architecture employs a three-tiered system of LLM agents:

  • Specialist Agents: These agents are designed to convert non-textual data, such as medical images (like X-rays) and laboratory results (which are typically numerical or tabular), into structured textual summaries. For instance, an agent might analyze an X-ray and describe its findings in plain language.
  • Aggregator Agent: Once the non-textual data is converted into text by the specialist agents, these summaries are combined with existing clinical notes. An aggregator agent then takes all this textual information and generates a single, unified multimodal summary. This step is crucial for bringing all the diverse insights into a coherent narrative.
  • Predictor Agent: Finally, a third LLM, the predictor agent, uses this unified summary to produce clinical predictions. This could involve predicting the severity of an injury, screening for a condition, or other diagnostic tasks.
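The three-tiered flow above can be sketched in a few lines of Python. Note that every function here is a hypothetical stub standing in for an LLM call; this is an illustration of the data flow, not the authors' implementation.

```python
# Illustrative sketch of MoMA's three tiers: specialists turn non-text
# data into text, an aggregator merges everything, a predictor decides.
# All agents are stubbed stand-ins for LLM calls.

def imaging_specialist(xray):
    """Specialist agent: summarize an image as text (stubbed)."""
    return f"Radiograph findings: {xray['finding']}."

def lab_specialist(labs):
    """Specialist agent: turn tabular lab results into text (stubbed)."""
    return "; ".join(f"{name} = {value}" for name, value in labs.items())

def aggregator(clinical_note, specialist_summaries):
    """Aggregator agent: merge notes and specialist text into one summary."""
    return clinical_note + " " + " ".join(specialist_summaries)

def predictor(unified_summary):
    """Predictor agent: map the unified summary to a clinical label (stubbed)."""
    return "severe" if "rib fracture" in unified_summary else "not severe"

record = {
    "note": "Patient reports chest pain after a fall.",
    "xray": {"finding": "displaced rib fracture"},
    "labs": {"hemoglobin": 11.2, "lactate": 2.8},
}

summaries = [imaging_specialist(record["xray"]), lab_specialist(record["labs"])]
unified = aggregator(record["note"], summaries)
print(predictor(unified))
```

The key design point is that every hand-off between tiers is plain text, which is what lets each tier be an ordinary LLM prompt rather than a jointly trained multimodal model.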

One of MoMA’s key advantages is its “plug-and-play” design: it can incorporate new and improved multimodal LLMs as they become available, without retraining the entire system. Unlike many existing multimodal LLMs, which require vast amounts of paired data to learn how different modalities relate, MoMA performs its non-text-to-text conversion in a “zero-shot” manner, without any task-specific fine-tuning for the conversion step. This significantly reduces the data and computational resources typically required.
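One minimal way to picture the plug-and-play idea is a registry that maps each modality to its specialist agent, so upgrading one agent is a single assignment that leaves the rest of the pipeline untouched. The agent names below are illustrative, not from the paper.

```python
# Sketch of "plug-and-play": specialist agents live in a registry keyed
# by modality, so a newer model can be swapped in without retraining or
# changing the rest of the pipeline. All names are illustrative stubs.

def xray_agent_v1(data):
    return f"v1 summary of {data}"

def xray_agent_v2(data):
    """An improved multimodal LLM, dropped in later."""
    return f"v2 summary of {data}"

specialists = {
    "xray": xray_agent_v1,
    "labs": lambda d: f"labs: {d}",
}

# Upgrading the imaging specialist is a one-line change:
specialists["xray"] = xray_agent_v2

print(specialists["xray"]("chest radiograph"))
```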

Performance and Applications

MoMA was evaluated on real-world datasets for three different clinical prediction tasks: chest trauma severity stratification, multitask chest and spine trauma severity stratification, and unhealthy alcohol use screening. These tasks involved different combinations of data, such as clinical notes with chest radiographs, and clinical notes with lab measurements.

The results showed that MoMA consistently outperformed current state-of-the-art methods across all tasks. For example, in chest trauma severity stratification, MoMA achieved high F1 scores, indicating strong accuracy. It also performed well in predicting unhealthy alcohol use, even against baselines trained on much larger datasets. Importantly, MoMA demonstrated consistent performance across different patient subgroups, including various sex and race groups, which is vital for ensuring equitable healthcare outcomes.
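For readers unfamiliar with the F1 metric mentioned above: it is the harmonic mean of precision and recall, so a model must be good at both to score well. The counts below are toy numbers for illustration, not results from the paper.

```python
# F1 is the harmonic mean of precision and recall, computed here from
# toy true-positive / false-positive / false-negative counts.

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(tp=80, fp=10, fn=20), 3))  # -> 0.842
```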

An ablation study, where non-textual inputs were intentionally removed, confirmed that MoMA’s improved performance is indeed due to its effective integration of non-text modalities, not just its ability to understand text. The architecture’s ability to distill complex data into concise, focused summaries also enhances the transparency and interpretability of its predictions, which is highly valuable in clinical settings where understanding the “why” behind a prediction is as important as the prediction itself.
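The ablation logic described above is simple to sketch: run the same predictor with and without the specialist summaries and compare. Everything here is an illustrative stub, not the paper's code or data.

```python
# Sketch of the ablation: feed the predictor the full multimodal summary,
# then the clinical note alone, to isolate the non-text contribution.

def predictor(summary):
    """Stubbed predictor agent keyed on a finding in the summary text."""
    return "severe" if "rib fracture" in summary else "not severe"

note = "Patient reports chest pain after a fall."
xray_summary = "Radiograph shows a displaced rib fracture."

full_input = note + " " + xray_summary  # full pipeline
ablated_input = note                    # non-text modality removed

print(predictor(full_input))     # -> "severe"
print(predictor(ablated_input))  # -> "not severe"
```

A gap between the two runs, as in MoMA's evaluation, indicates the non-text modality is genuinely contributing to the prediction.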


Future Outlook

MoMA represents a significant step forward in using LLMs with multimodal medical data for clinical predictions. Its modular design, computational efficiency, and flexibility make it a promising tool for improving clinical decision-making. While the current interactions between LLM agents are relatively simple, future research could explore more complex communication and coordination among them to further enhance capabilities. The architecture also has the potential to be extended to other applications, such as medical visual question answering.

For those interested in the technical details or in reproducing the results, the MoMA research paper provides comprehensive information and details on code availability.

Ananya Rao
