TLDR: AdCare-VLM is a new AI model based on Video-LLaVA that uses patient videos to monitor medication adherence for chronic diseases like tuberculosis. It was fine-tuned on LLM-TB-VQA, a private video question answering dataset built from 806 expert-annotated TB medication videos. The model identifies visual cues such as the patient's face, the medication, water intake, and ingestion to determine adherence patterns. AdCare-VLM outperforms existing vision-language models in accuracy and contextual understanding, offering an automated solution to reduce clinician workload and improve patient outcomes, though it requires more diverse datasets and computational resources for broader implementation.
Medication adherence is a critical factor in managing chronic diseases like diabetes, hypertension, HIV/AIDS, and tuberculosis. Unfortunately, many patients struggle to consistently take their prescribed medications, leading to worsening conditions, increased healthcare costs, and even preventable deaths. Traditional methods of monitoring adherence, such as directly observed therapy (DOT), are often resource-intensive and impractical, especially in remote areas or regions with healthcare worker shortages. While video observed therapy (VOT) offers a more flexible alternative, it still requires extensive manual review of videos by clinicians, which can be time-consuming and prone to human error.
To address these challenges, researchers have developed AdCare-VLM, an innovative artificial intelligence system designed to automate the monitoring of long-term medication adherence using patient videos. This specialized Large Vision Language Model (LVLM) leverages advanced AI to analyze video footage and answer questions related to whether a patient has taken their medication correctly.
How AdCare-VLM Works
AdCare-VLM is built upon a framework called Video-LLaVA, which allows it to understand and process both visual and linguistic information simultaneously. The model is trained to identify key visual cues in patient videos that indicate medication intake. These cues include the clear visibility of the patient’s face, the medication itself, water intake, and the actual act of ingestion. By correlating these visual features with medical concepts, AdCare-VLM can determine adherence patterns.
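To make the cue-to-label mapping concrete, here is a purely illustrative rule-based stand-in. AdCare-VLM learns this mapping end to end from video; the function name, cue arguments, and decision rules below are assumptions for illustration, not the model's actual logic.

```python
def classify_adherence(face_visible: bool,
                       medication_visible: bool,
                       water_intake: bool,
                       ingestion_observed: bool) -> str:
    """Map the four visual cues to an adherence label (illustrative only)."""
    cues = [face_visible, medication_visible, water_intake, ingestion_observed]
    if all(cues):
        return "positive"   # all cues present: medication clearly taken
    if not any(cues):
        return "negative"   # no cues present: no medication taken
    return "ambiguous"      # partial evidence: unclear adherence


print(classify_adherence(True, True, True, True))   # positive
print(classify_adherence(True, True, False, False)) # ambiguous
```

In the real model, these cues are not hand-coded booleans but visual features extracted from frames and correlated with medical concepts by the language model.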
A crucial aspect of AdCare-VLM’s development involved fine-tuning it on a unique and private dataset. This dataset comprises 806 custom-annotated videos specifically for tuberculosis (TB) medication monitoring. Clinical experts meticulously labeled these videos, categorizing them into positive (medication taken), negative (no medication taken), and ambiguous (unclear adherence) cases. This detailed annotation process created LLM-TB-VQA, a comprehensive medical adherence video question answering dataset.
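As a hypothetical sketch of what one LLM-TB-VQA record might look like, the structure below pairs a video with a question-answer pair and one of the three annotation categories described above. The field names and example strings are assumptions; only the label set (positive, negative, ambiguous) comes from the annotation scheme.

```python
from dataclasses import dataclass

# The three categories clinical experts used when labeling the videos.
LABELS = {"positive", "negative", "ambiguous"}


@dataclass
class AdherenceVQASample:
    """One hypothetical video question answering record."""
    video_path: str
    question: str
    answer: str
    label: str

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


sample = AdherenceVQASample(
    video_path="videos/patient_0001.mp4",  # placeholder path, not real data
    question="Did the patient take the prescribed TB medication?",
    answer="Yes, the patient placed the pill in their mouth and drank water.",
    label="positive",
)
```

Validating the label at construction time mirrors how a curated annotation pipeline would reject entries outside the agreed category set.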
Key Features and Performance
The AdCare-VLM model integrates images, videos, and text with a robust large language model foundation. It uses a technique called “pre-alignment to projection” to map videos and images into a shared feature space, allowing the AI to learn from a unified visual representation. This means the model can effectively understand the same information, whether it’s presented as text, an image, or a video.
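A minimal NumPy sketch of the shared-space idea: image and video features of different sizes are mapped by separate linear projections into one common dimension before being handed to the language model. All dimensions and weights here are assumptions for illustration, not the model's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, VID_DIM, SHARED_DIM = 1024, 2048, 4096  # assumed feature sizes

# Separate learned projections, one per modality (randomly initialized here).
W_img = rng.standard_normal((IMG_DIM, SHARED_DIM)) * 0.02
W_vid = rng.standard_normal((VID_DIM, SHARED_DIM)) * 0.02


def project(features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Linearly project modality-specific features into the shared space."""
    return features @ weight


image_feats = rng.standard_normal((1, IMG_DIM))   # one image token
video_feats = rng.standard_normal((8, VID_DIM))   # 8 uniformly sampled frames

shared_img = project(image_feats, W_img)
shared_vid = project(video_feats, W_vid)

# Both modalities now live in the same space and can be consumed uniformly.
assert shared_img.shape[1] == shared_vid.shape[1] == SHARED_DIM
```

Because both outputs share one dimensionality, the downstream language model sees a single unified visual representation regardless of whether the input was an image or a video.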
Experimental results show that AdCare-VLM outperforms other parameter-efficient fine-tuning (PEFT) enabled VLM models, such as LLaVA-V1.5 and Chat-UniVi. It demonstrated significant improvements in accuracy across various configurations, including pre-trained, regular, and low-rank adaptation (LoRA) setups. The model particularly excels in contextual and temporal understanding, providing a more nuanced interpretation of patient actions and their environment.
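The LoRA setup mentioned above can be sketched in a few lines: instead of updating a full weight matrix W, training touches only two small matrices A and B whose product is added to the frozen weight. The sizes, rank, and scaling below are illustrative choices, not the paper's configuration.

```python
import numpy as np

d, r = 512, 8    # hidden size and LoRA rank (assumed values)
alpha = 16       # LoRA scaling hyperparameter (assumed)

rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))          # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable, rank r
B = np.zeros((d, r))                     # trainable, zero-initialized


def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass with the low-rank delta added to the frozen weight."""
    return x @ (W + (alpha / r) * (B @ A)).T


full_params = W.size            # 512 * 512 = 262144
lora_params = A.size + B.size   # 2 * (8 * 512) = 8192
print(f"trainable params: {lora_params} vs {full_params}")
```

With B initialized to zero, the adapted model starts out identical to the pre-trained one, and only about 3% of the layer's parameters are trained, which is what makes such PEFT configurations cheap to fine-tune.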
For instance, the model can identify if a patient is holding a pill, drinking water, and swallowing, and then determine if these actions constitute positive adherence. This level of detail helps in automating repetitive monitoring tasks, reducing the workload for healthcare professionals, and potentially improving the quality of care.
Looking Ahead
While AdCare-VLM shows promising results, the researchers acknowledge certain limitations and future directions. More large-scale, open-access annotated datasets, especially from diverse contexts like Africa, are crucial for further advancement. Addressing data distribution disparities and potential biases (gender, socio-economic, cultural) is also essential for equitable and effective implementation. The current model likewise has only moderate capability in understanding very long videos, since its reliance on uniformly sampled frames can miss fine-grained details.
Despite these challenges, AdCare-VLM represents a significant step forward in digital health. By leveraging generative AI and vision-language models, it offers a powerful tool for predicting and monitoring medication adherence, ultimately contributing to better health outcomes for patients with chronic diseases. The source code and pre-trained weights for this research will be made accessible for further development and exploration, and more details are available in the full paper.


