Enhancing French Electronic Health Records with AI for Social Determinants of Health

TLDR: A new study utilized a large language model (Flan-T5-Large) to extract 13 social determinants of health (SDoH) from French clinical notes. The model demonstrated strong performance for well-documented SDoH categories and significantly outperformed traditional structured EHR data, identifying SDoH in 95.8% of patients compared to 2.8% via ICD-10 codes. While effective on social history sections, its performance on full clinical notes was lower, indicating areas for future improvement in generalization and language-specific NLP tools to enhance SDoH documentation and address health disparities.

Social determinants of health (SDoH) significantly influence an individual’s health outcomes, affecting everything from disease progression to how well treatments work, and they contribute to health disparities. However, this information is often incomplete or missing in structured electronic health records (EHRs). This gap makes it harder to see the full picture of a patient’s health and to address broader health inequalities.

A recent study tackles this challenge by proposing an innovative approach using large language models (LLMs) to extract 13 specific SDoH categories from French clinical notes. This is particularly significant because most existing research and tools for SDoH extraction using natural language processing (NLP) have focused on the English language, leaving a considerable void for other languages like French.

The researchers trained a model called Flan-T5-Large on annotated social history sections from clinical notes collected at Nantes University Hospital in France. The 13 SDoH categories targeted for extraction included living condition, marital status, descendants, employment status, occupation, tobacco use, alcohol use, drug use, housing, education, physical activity, income, and ethnicity/country of birth. The study evaluated the model’s performance at two levels: first, identifying SDoH categories and their associated values, and second, extracting detailed SDoH information, including temporal and quantitative data.
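To make the two evaluation levels concrete, here is a minimal Python sketch of how one extracted SDoH item might be represented. The 13 category names come from the article; the class name, field names, identifier spellings, and example values are illustrative assumptions, not the schema used in the study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The 13 categories listed in the article; the snake_case spellings are our own.
SDOH_CATEGORIES = (
    "living_condition", "marital_status", "descendants", "employment_status",
    "occupation", "tobacco_use", "alcohol_use", "drug_use", "housing",
    "education", "physical_activity", "income", "ethnicity_country_of_birth",
)


@dataclass(frozen=True)
class SDoHMention:
    """One extracted SDoH item (hypothetical schema, for illustration only)."""
    category: str
    value: str
    temporality: Optional[str] = None  # assumed field for temporal detail
    quantity: Optional[str] = None     # assumed field for quantitative detail

    def __post_init__(self) -> None:
        if self.category not in SDOH_CATEGORIES:
            raise ValueError(f"unknown SDoH category: {self.category!r}")

    def level1(self) -> Tuple[str, str]:
        """First level: the category and its associated value only."""
        return (self.category, self.value)

    def level2(self) -> Tuple[str, str, Optional[str], Optional[str]]:
        """Second level: category and value plus temporal and quantitative detail."""
        return (self.category, self.value, self.temporality, self.quantity)


# Made-up example values, not taken from the study's data.
mention = SDoHMention("tobacco_use", "former smoker",
                      temporality="stopped 5 years ago", quantity="20 pack-years")
print(mention.level1())
print(mention.level2())
```

Keeping both views on the same object means a prediction can be scored leniently at the first level and strictly at the second without being re-parsed.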

The model demonstrated strong performance in identifying well-documented SDoH categories such as living condition, marital status, descendants, occupation, tobacco use, and alcohol use, achieving F1 scores above 0.80. This indicates that it reliably recognizes these common and consistently documented factors. However, performance was lower for categories like employment status, housing, physical activity, income, and education. The researchers attributed this to limited training data for these categories and the highly variable ways in which they are expressed in clinical notes.
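For readers unfamiliar with the metric, the sketch below shows one way a per-category F1 score could be computed from gold and predicted annotations. It assumes exact matching on (note, category, value) triples; the article does not describe the study's matching criteria, so this is an illustration of the metric rather than the authors' evaluation code. The function name and the toy annotations are hypothetical.

```python
from collections import defaultdict


def per_category_f1(gold, predicted):
    """Return {category: F1} from iterables of (note_id, category, value) triples.

    Assumes exact-match scoring: a prediction counts as a true positive
    only if the identical triple appears in the gold annotations.
    """
    gold, predicted = set(gold), set(predicted)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})

    for item in predicted:
        counts[item[1]]["tp" if item in gold else "fp"] += 1
    for item in gold - predicted:
        counts[item[1]]["fn"] += 1

    # F1 = 2*TP / (2*TP + FP + FN); every category here has at least one count.
    return {
        category: 2 * c["tp"] / (2 * c["tp"] + c["fp"] + c["fn"])
        for category, c in counts.items()
    }


# Toy annotations (made up for illustration).
gold = [(1, "tobacco_use", "yes"), (2, "tobacco_use", "no"), (1, "housing", "apartment")]
pred = [(1, "tobacco_use", "yes"), (2, "tobacco_use", "yes"), (1, "housing", "apartment")]
scores = per_category_f1(gold, pred)
print(scores)
```

In the toy data, the wrong value for note 2 counts as both a false positive and a false negative for tobacco use, which is why sparsely documented categories can see their F1 drop sharply after only a few errors.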

One of the most compelling findings of the study was the comparison between the LLM’s extraction capabilities and traditional structured EHR data. The model successfully identified at least one SDoH for 95.8% of patients, a stark contrast to only 2.8% identified using ICD-10 codes from structured EHR data. This highlights the immense value of leveraging unstructured clinical notes, which often contain richer and more detailed SDoH information than coded fields.
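The comparison above is a patient-level coverage figure: the share of patients for whom a given source yields at least one SDoH. A minimal sketch of that calculation follows. The function name is hypothetical, and the cohort of 1,000 patients is a toy number chosen only so the output mirrors the reported percentages; it is not the study's cohort size.

```python
def coverage(patient_ids, patients_with_sdoh):
    """Fraction of patients for whom a source yields at least one SDoH."""
    patients = set(patient_ids)
    if not patients:
        raise ValueError("patient_ids must not be empty")
    return len(patients & set(patients_with_sdoh)) / len(patients)


# Toy cohort: sizes are illustrative, not taken from the study.
cohort = range(1000)
found_in_notes = range(958)  # patients with >= 1 SDoH extracted from notes
found_in_icd10 = range(28)   # patients with >= 1 SDoH-related ICD-10 code

notes_cov = coverage(cohort, found_in_notes)
icd_cov = coverage(cohort, found_in_icd10)
print(f"notes: {notes_cov:.1%}, ICD-10: {icd_cov:.1%}")
```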

The study also shed light on some limitations. The model, trained exclusively on social history sections, showed a significant drop in performance when applied to full clinical notes. This suggests that while effective for specific sections, its generalization to broader clinical text needs further development. Errors were also linked to inconsistencies in human annotation, the reliance on an English-centric tokenizer that struggled with French characters, and the inherent challenges of converting complex natural language into a structured output format.

Despite these challenges, the research underscores the potential of NLP in improving the completeness of real-world SDoH data in non-English EHR systems. By making two of their four datasets publicly available, the researchers aim to foster further development and reproducibility in French SDoH extraction. Future work will focus on data augmentation, using synthetic clinical text, and releasing the model itself to support multilingual SDoH research. Ultimately, advancing automated SDoH extraction from unstructured clinical text can lead to more equitable healthcare by providing richer, more representative data for research, policy-making, and targeted public health interventions. For more details, refer to the full research paper.
