
EvolveCaptions: A New System for Real-Time Adaptive Captioning for Deaf and Hard of Hearing Users

TLDR: EvolveCaptions is a novel system that improves live captions for Deaf and Hard of Hearing (DHH) individuals by enabling real-time, collaborative adaptation. Hearing participants correct live captions, which then generate short, targeted phrases for DHH speakers to record. These recordings are used to fine-tune the ASR model, leading to significant reductions in Word Error Rate (WER) with minimal effort from DHH users. The system fosters a collaborative approach to accessibility, making ASR more accurate and user-centered.

Automatic Speech Recognition (ASR) systems have become common tools for communication, but they often struggle to accurately transcribe speech from Deaf and Hard of Hearing (DHH) individuals, especially in live conversations. Traditional personalization methods usually demand extensive pre-recorded data and place the entire burden of adaptation on the DHH speaker. This can be time-consuming and doesn’t always reflect real-world interactions.

Introducing EvolveCaptions: A Collaborative Solution

A new system called EvolveCaptions offers a fresh approach to this challenge. It’s a real-time, collaborative ASR adaptation system designed to improve accessibility in mixed-ability conversations. The core idea is to make ASR personalization a shared responsibility, requiring minimal effort from the DHH speaker.

How EvolveCaptions Works

EvolveCaptions operates through a three-stage interactive loop:

1. Live Caption Correction: When a DHH individual speaks, the system transcribes their speech in real time using a Whisper-based ASR engine. Hearing participants in the conversation can then correct any errors they see in the live captions. They can highlight uncertain segments or directly edit incorrect words or phrases. These changes are instantly broadcast to everyone, ensuring immediate clarity.

2. Clause Generation and Recording: Once a caption has been corrected, the system doesn’t just discard that information. Instead, it uses the revised word or phrase to generate a short, natural-sounding clause for the DHH speaker to record. For example, if “fok” was corrected to “fork,” the system might prompt the DHH speaker to record, “She picked up the fork from the table.” This process ensures that the training data is relevant to real speech patterns. DHH users can record these clauses at their convenience, with options to skip, re-record, or delete prompts, making the process flexible and less fatiguing.

3. ASR Fine-Tuning: The recorded clauses are then used to fine-tune the ASR model in the background. This means the system gradually adapts to the DHH speaker’s unique voice over time. The updated model seamlessly replaces the previous one for subsequent captioning, leading to continuous improvement.
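The three stages above can be sketched as a small data flow. This is only an illustrative sketch, not the authors' implementation: the names `Correction`, `AdaptationLoop`, `make_clause`, and the `batch_size` trigger are all hypothetical, the fixed sentence template stands in for whatever clause generator the real system uses, and the fine-tuning step is a placeholder that only bumps a version counter.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    original: str   # word as it appeared in the live caption
    corrected: str  # word supplied by a hearing participant


def make_clause(word: str) -> str:
    # Assumption: a fixed template stands in for the real clause generator.
    return f"She picked up the {word} from the table."


@dataclass
class AdaptationLoop:
    batch_size: int = 3  # assumption: fine-tune after this many recordings
    pending_prompts: list[str] = field(default_factory=list)
    recordings: list[tuple[str, bytes]] = field(default_factory=list)
    model_version: int = 0

    def correct_caption(self, caption: str, correction: Correction) -> str:
        """Stage 1: apply a word-level edit and queue a recording prompt."""
        words = [
            correction.corrected if w == correction.original else w
            for w in caption.split()
        ]
        self.pending_prompts.append(make_clause(correction.corrected))
        return " ".join(words)

    def skip(self, prompt: str) -> None:
        """Stage 2 (optional): the speaker may decline a prompt."""
        self.pending_prompts.remove(prompt)

    def record(self, prompt: str, audio: bytes) -> None:
        """Stage 2: store the speaker's recording of a prompt."""
        self.pending_prompts.remove(prompt)
        self.recordings.append((prompt, audio))
        if len(self.recordings) >= self.batch_size:
            self.fine_tune()

    def fine_tune(self) -> None:
        """Stage 3: placeholder for background fine-tuning and model swap."""
        self.recordings.clear()
        self.model_version += 1


loop = AdaptationLoop(batch_size=1)
fixed = loop.correct_caption("pass me the fok", Correction("fok", "fork"))
print(fixed)                 # corrected caption shown to everyone
print(loop.pending_prompts)  # clause queued for the DHH speaker to record
```

Keeping the prompt queue separate from the correction step mirrors the article's point that DHH users record at their own convenience rather than during the conversation.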

Key Benefits and Design Goals

The system is built on three main design goals:

  • Low-effort personalization: DHH speakers only record short, targeted clauses for words that were previously misrecognized, significantly reducing the time and effort needed for ASR fine-tuning.
  • In-situ adaptation: Training data is collected during natural conversations, making it more contextually relevant and promoting sustained use.
  • Collaborative correction: Hearing participants actively assist by correcting captions in real-time, distributing the effort of accessibility.

Evaluation and Promising Results

A study involving 12 DHH and 6 hearing participants demonstrated the effectiveness of EvolveCaptions. Over five progressive captioning sessions, the system reduced the Word Error Rate (WER) by a median of 27.2% (mean = 30.4%) across all DHH users. This improvement was statistically significant. Participants with higher initial WERs saw the largest gains, suggesting that the system is especially beneficial for users with more atypical speech patterns.
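For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the ASR output and a reference transcript, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition, not the study's evaluation code. The article does not say whether the reported reduction is relative or absolute, so `relative_wer_reduction` shows only one possible reading, and both function names are chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(before: float, after: float) -> float:
    """Assumed reading: fraction of the initial WER that was removed."""
    if before <= 0:
        raise ValueError("initial WER must be positive")
    return (before - after) / before


reference = "she picked up the fork from the table"
hypothesis = "she picked up the fok from the table"
print(word_error_rate(reference, hypothesis))  # 1 error in 8 words -> 0.125
```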

The recording burden on DHH participants was minimal, averaging only about five minutes of speech across the entire study. Hearing participants were highly engaged, making an average of 72.3 caption edits or highlights.

User Experiences

DHH participants expressed optimism about a system that could adapt to their speech, with many feeling that EvolveCaptions gradually learned their voice. They found the recording process worthwhile for improved captions and appreciated the flexibility to skip or re-record prompts. While privacy concerns were raised, most were comfortable with the system given responsible data management.

Hearing participants found the system easy to use and noted that correcting captions, while requiring some cognitive effort, was manageable and became easier with practice. They valued the system’s adaptive nature, feeling that their efforts meaningfully contributed to its learning.


Advancing Communication Accessibility

EvolveCaptions represents a significant step forward in communication accessibility. By integrating real-time correction, targeted data collection, and collaborative interaction, it moves beyond static ASR models and makes the technology adapt to the user, rather than the other way around. This approach aligns with the principle of collective access, where accessibility is a shared responsibility.

The framework could also be generalized to other populations with non-normative speech, such as individuals with dysarthria, stroke survivors, or non-native speakers, expanding accessibility across a wider range of users. More details are available in the full research paper.

Rhea Bhattacharya
