TL;DR: A research paper introduces ETR-fr, the first French dataset compliant with European Easy-to-Read (ETR) guidelines, enabling AI models to generate simplified texts for individuals with cognitive impairments. It establishes generative baselines using parameter-efficient fine-tuning of both pre-trained language models (PLMs) and large language models (LLMs). The study found that smaller PLMs, particularly mBARThez with LoRA, matched larger LLMs in producing high-quality, accessible texts and generalized better to new domains such as political content.
Ensuring that everyone, including individuals with cognitive impairments, can access and understand written information is crucial for their autonomy and full participation in society. However, the current methods for creating Easy-to-Read (ETR) texts are often slow, expensive, and difficult to scale. This limitation restricts access to vital information in areas like healthcare, education, and civic life.
Artificial intelligence (AI) offers a promising route to scalable ETR text generation. Yet developing effective AI-driven tools for this purpose comes with its own hurdles: high-quality datasets are scarce, models must adapt to unfamiliar subject matter, and large language models (LLMs) need to be adapted efficiently without prohibitive computational cost.
Introducing ETR-fr: A New Dataset for Accessible Text
To address these challenges, a recent research paper introduces ETR-fr, a groundbreaking dataset specifically designed for ETR text generation. This dataset is the first of its kind to be fully compliant with European ETR guidelines, making it a valuable resource for training AI models. ETR-fr comprises 523 pairs of aligned texts, where an original complex text is matched with its simplified ETR version. The dataset was created from a collection of children’s books adapted according to European cognitive accessibility guidelines, ensuring high quality and relevance.
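To make the dataset's structure concrete, here is a minimal sketch of how such aligned pairs might be represented in code. The field names and the sample pair are purely illustrative, not the dataset's actual schema.

```python
# Illustrative representation of an aligned ETR-fr pair; the field names
# ("source", "etr") and the sample texts are invented for this sketch.
from dataclasses import dataclass

@dataclass
class ETRPair:
    source: str  # original, more complex passage
    etr: str     # aligned Easy-to-Read rewrite

pairs = [
    ETRPair(
        source="Le dragon surgit de la montagne et terrifia les habitants du village.",
        etr="Un dragon sort de la montagne. Les habitants du village ont peur.",
    ),
]

print(f"{len(pairs)} aligned pair(s) loaded")
```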
The ETR framework itself emphasizes several key principles for creating accessible texts:

- using clear and simple language
- providing concrete examples and analogies
- structuring content logically with headings and bullet points
- offering accessible content with summaries and definitions
- incorporating relevant visuals and illustrations

Manual ETR transcription typically involves an iterative collaboration between human experts and individuals with cognitive impairments to ensure content validity.
Developing and Evaluating AI Models
The researchers implemented parameter-efficient fine-tuning (PEFT) techniques on both pre-trained language models (PLMs) such as mBART and mBARThez, and large language models (LLMs) such as Mistral-7B and Llama-2-7B. PEFT methods, including prefix-tuning and Low-Rank Adaptation (LoRA), adapt these models by fine-tuning only a small subset of parameters, which reduces computational cost and mitigates catastrophic forgetting of previously learned knowledge.
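As an illustration, the sketch below shows how LoRA adaptation of a seq2seq model can be set up with Hugging Face's peft library. The checkpoint name, rank, and target modules are assumptions for the sketch, not the paper's reported configuration.

```python
# A minimal LoRA setup sketch with Hugging Face transformers + peft.
# The checkpoint name and hyperparameters are assumptions, not the
# paper's exact configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "moussaKam/mbarthez"  # assumed mBARThez checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA freezes the base weights and trains small low-rank matrices
# injected into the attention projections.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                            # rank of the low-rank update
    lora_alpha=32,                   # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

The same recipe applies to decoder-only LLMs such as Mistral-7B by swapping the model class for AutoModelForCausalLM and the task type for TaskType.CAUSAL_LM.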
To ensure the generated texts are high quality and genuinely accessible, the authors developed a comprehensive evaluation framework. It combines automatic metrics commonly used in text simplification and summarization (ROUGE, BERTScore, and SARI) with rigorous human assessment: expert linguists completed a detailed 36-question form aligned with European ETR guidelines, covering aspects such as Information Choices, Sentence Construction, and Word Choice.
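The automatic side of that framework can be reproduced with off-the-shelf tooling. The sketch below uses the Hugging Face evaluate library; the French sample sentences are invented for illustration.

```python
# Hedged sketch of the automatic metrics (ROUGE, BERTScore, SARI)
# via Hugging Face's `evaluate` library; the texts are invented.
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
sari = evaluate.load("sari")

sources = ["Les élections législatives permettent d'élire les députés."]
predictions = ["Les élections servent à élire les députés."]
references = [["Les élections servent à choisir les députés."]]  # one or more per source

# ROUGE and BERTScore compare the prediction to the references;
# SARI also uses the source to score keep/add/delete edit operations.
print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="fr"))
print(sari.compute(sources=sources, predictions=predictions, references=references))
```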
Key Findings: Smaller Models Show Strong Generalization
The study yielded several notable findings. Quantitative results on the ETR-fr dataset showed that PEFT methods generally outperformed full fine-tuning. Notably, the smaller PLM mBARThez, particularly when combined with LoRA, achieved the best overall performance across several automatic metrics, including ROUGE and BERTScore. It also posted strong readability scores (KMRE) and compression ratios, indicating that it simplifies and condenses texts effectively.
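Of these measures, the compression ratio is simple enough to sketch directly; the snippet below shows one common token-level formulation (KMRE, a French adaptation of Flesch reading ease, is left out here). The sample texts are invented.

```python
# Token-level compression ratio: the fraction of source length retained
# in the simplified output. Sample texts are invented for this sketch.
def compression_ratio(source: str, output: str) -> float:
    return len(output.split()) / len(source.split())

source = ("Le dragon surgit de la montagne et terrifia "
          "les habitants du village voisin.")
output = "Un dragon sort de la montagne. Les gens ont peur."

print(f"compression ratio: {compression_ratio(source, output):.2f}")  # < 1 means shorter
```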
A critical aspect of the research involved testing the models' ability to generalize to out-of-domain texts. For this, the authors built a separate test set, ETR-fr-politic, consisting of political election texts, a domain not included in the training data. On this challenging out-of-domain set, mBARThez with LoRA again emerged as the top performer, showing stronger generalization than the larger LLMs. LLMs such as Mistral-7B appeared to overfit the training distribution and struggled more on the new domain.
The manual qualitative evaluation by expert linguists supported these findings. While Mistral-7B+LoRA performed well on certain criteria for the in-domain ETR-fr test set, mBARThez+LoRA showed better generalization and higher overall perceived quality on the out-of-domain political texts. This suggests that lightweight approaches can be highly effective and stable for ETR generation.
Looking Ahead
This research highlights that ETR generation is a task distinct from traditional text simplification or summarization, requiring a dedicated focus on cognitive accessibility. The introduction of the ETR-fr dataset and the accompanying empirical study provide a strong foundation for future advances in this field. Future work may involve developing ETR-specific evaluation metrics, improving inter-annotator agreement in human evaluations, and exploring reinforcement learning from human feedback (RLHF) to align model outputs even more closely with user preferences, potentially paving the way for automated ETR labeling.
For more detailed information, you can read the full research paper here: Inclusive Easy-to-Read Text Generation for Individuals with Cognitive Impairments.