EvolveCaptions: A New System for Real-Time Adaptive Captioning for Deaf and Hard of Hearing Users

TLDR: EvolveCaptions is a novel system that improves live captions for Deaf and Hard of Hearing (DHH) individuals by enabling real-time, collaborative adaptation. Hearing participants correct live captions, and the system turns those corrections into short, targeted phrases for DHH speakers to record. These recordings are used to fine-tune the ASR model, leading to significant reductions in Word Error Rate (WER) with minimal effort from DHH users. The system fosters a collaborative approach to accessibility, making ASR more accurate and user-centered.

Automatic Speech Recognition (ASR) systems have become common tools for communication, but they often struggle to accurately transcribe speech from Deaf and Hard of Hearing (DHH) individuals, especially in live conversations. Traditional personalization methods usually demand extensive pre-recorded data and place the entire burden of adaptation on the DHH speaker. Collecting that data is time-consuming, and the recordings don’t always reflect real-world interactions.

Introducing EvolveCaptions: A Collaborative Solution

A new system called EvolveCaptions offers a fresh approach to this challenge. It’s a real-time, collaborative ASR adaptation system designed to improve accessibility in mixed-ability conversations. The core idea is to make ASR personalization a shared responsibility, requiring minimal effort from the DHH speaker.

How EvolveCaptions Works

EvolveCaptions operates through a three-stage interactive loop:

1. Live Caption Correction: When a DHH individual speaks, the system transcribes their speech in real time using a Whisper-based ASR engine. Hearing participants in the conversation can then correct any errors they see in the live captions. They can highlight uncertain segments or directly edit incorrect words or phrases. These changes are instantly broadcast to everyone, ensuring immediate clarity.

2. Clause Generation and Recording: Once a caption has been corrected, the system doesn’t just discard that information. Instead, it uses the revised word or phrase to generate a short, natural-sounding clause for the DHH speaker to record. For example, if “fok” was corrected to “fork,” the system might prompt the DHH speaker to record, “She picked up the fork from the table.” This process ensures that the training data is relevant to real speech patterns. DHH users can record these clauses at their convenience, with options to skip, re-record, or delete prompts, making the process flexible and less fatiguing.

3. ASR Fine-Tuning: The recorded clauses are then used to fine-tune the ASR model in the background. This means the system gradually adapts to the DHH speaker’s unique voice over time. The updated model seamlessly replaces the previous one for subsequent captioning, leading to continuous improvement.
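
The three stages above can be sketched as a small piece of bookkeeping code. This is only an illustration under stated assumptions, not the authors' implementation: the class and function names (Correction, AdaptationLoop, template_clause), the fixed sentence template, and the batch_size threshold that triggers a fine-tuning round are all hypothetical, and the actual Whisper fine-tuning step is replaced by a placeholder that simply bumps a version counter.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Correction:
    """One edit made by a hearing participant to a live caption."""
    original: str   # what the ASR engine produced, e.g. "fok"
    corrected: str  # what the hearing participant entered, e.g. "fork"


def template_clause(word: str) -> str:
    """Hypothetical stand-in for clause generation: a single fixed template."""
    return f"She picked up the {word} from the table."


@dataclass
class AdaptationLoop:
    """Hypothetical bookkeeping for the correct -> record -> fine-tune loop."""
    make_clause: Callable[[str], str]
    batch_size: int = 3  # assumed number of recordings per fine-tuning round
    prompts: List[str] = field(default_factory=list)
    recordings: List[Tuple[str, bytes]] = field(default_factory=list)
    model_version: int = 0

    def correct(self, correction: Correction) -> None:
        """Stage 1 to 2: a real correction queues a clause for the DHH speaker."""
        if correction.original != correction.corrected:
            self.prompts.append(self.make_clause(correction.corrected))

    def skip(self, prompt: str) -> None:
        """The DHH speaker may drop a prompt without recording it."""
        self.prompts.remove(prompt)

    def record(self, prompt: str, audio: bytes) -> None:
        """Stage 2 to 3: store a recording; fine-tune once a batch is full."""
        self.prompts.remove(prompt)  # raises ValueError if the prompt is not pending
        self.recordings.append((prompt, audio))
        if len(self.recordings) >= self.batch_size:
            self._fine_tune()

    def _fine_tune(self) -> None:
        """Placeholder for background fine-tuning and the model swap."""
        self.recordings.clear()
        self.model_version += 1


loop = AdaptationLoop(make_clause=template_clause, batch_size=1)
loop.correct(Correction(original="fok", corrected="fork"))
print(loop.prompts[0])
loop.record(loop.prompts[0], audio=b"raw-audio-bytes")
print("model version:", loop.model_version)
```

In this sketch, corrections only queue prompts, so the DHH speaker stays in control of when (and whether) to record, which mirrors the skip and re-record flexibility described above.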

Key Benefits and Design Goals

The system is built on three main design goals:

Low-effort personalization: DHH speakers only record short, targeted clauses for words that were previously misrecognized, significantly reducing the time and effort needed for ASR fine-tuning.
In-situ adaptation: Training data is collected during natural conversations, making it more contextually relevant and promoting sustained use.
Collaborative correction: Hearing participants actively assist by correcting captions in real-time, distributing the effort of accessibility.

Evaluation and Promising Results

A study involving 12 DHH and 6 hearing participants demonstrated the effectiveness of EvolveCaptions. Over five progressive captioning sessions, the system reduced the Word Error Rate (WER) by a median of 27.2% (mean = 30.4%) across all DHH users, a statistically significant improvement. Participants with higher initial WERs saw the largest gains, which suggests the system is particularly beneficial for users with more atypical speech patterns.
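
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the ASR output and a reference transcript (substitutions, deletions, and insertions) divided by the number of words in the reference. The sketch below computes it in plain Python. The lowercasing and whitespace tokenization are simplifying assumptions, the example sentences are made up for illustration, and since this article does not say whether the reported reduction is absolute or relative, both are shown.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "she picked up the fork from the table"
before = wer(reference, "she pick up the fok from table")         # 3 errors / 8 words
after = wer(reference, "she picked up the fok from the table")    # 1 error / 8 words

absolute_reduction = before - after
relative_reduction = (before - after) / before
print(f"before={before:.3f} after={after:.3f}")
print(f"absolute={absolute_reduction:.3f} relative={relative_reduction:.3f}")
```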

The recording burden on DHH participants was minimal, averaging only about five minutes of speech across the entire study. Hearing participants were highly engaged, making an average of 72.3 caption edits or highlights.

User Experiences

DHH participants expressed optimism about a system that could adapt to their speech, with many feeling that EvolveCaptions gradually learned their voice. They found the recording process worthwhile for improved captions and appreciated the flexibility to skip or re-record prompts. Some raised privacy concerns, but most participants were comfortable with the system provided their data was managed responsibly.

Hearing participants found the system easy to use and noted that correcting captions, while requiring some cognitive effort, was manageable and became easier with practice. They valued the system’s adaptive nature, feeling that their efforts meaningfully contributed to its learning.

Advancing Communication Accessibility

EvolveCaptions represents a significant step forward in communication accessibility. By integrating real-time correction, targeted data collection, and collaborative interaction, it moves beyond static ASR models so that the technology adapts to the user, rather than the other way around. This approach aligns with the principle of collective access, where accessibility is a shared responsibility.

The framework could also be generalized to other populations with non-normative speech, such as individuals with dysarthria, stroke survivors, or non-native speakers, expanding accessibility across a wider range of users. For more details, see the full research paper.
