
Building a Foundation for Uruguayan Sign Language Translation: The iLSU-T Dataset

TLDR: iLSU-T is the first open, large-scale dataset for Uruguayan Sign Language (LSU) translation, comprising over 185 hours of interpreted LSU video (RGB) with audio and text transcriptions sourced from public television and parliamentary sessions. It addresses the critical need for localized sign language data to develop and evaluate automatic translation systems, providing baselines for state-of-the-art methods and highlighting the open challenges in this field.

Automatic sign language translation has attracted significant interest in recent years as a way to bridge the communication gap between deaf signers and hearing people. However, each country or region often has its own sign language, so developing and adapting machine translation techniques requires specific local data. To address this need, a new open dataset called iLSU-T has been introduced for Uruguayan Sign Language (LSU) translation.

The iLSU-T dataset is a groundbreaking resource, offering over 185 hours of interpreted Uruguayan Sign Language videos. These videos are rich in information, including RGB video, accompanying audio, and text transcriptions. This multimodal and carefully curated data is essential for advancing research in sign language processing, whether the goal is building tools that understand sign language or tools that generate it.
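To make that structure concrete, here is a minimal sketch of what a single aligned sample from such a multimodal dataset might look like in code. The schema and field names are hypothetical, not the dataset's actual format.

```python
from dataclasses import dataclass

@dataclass
class LSUSample:
    """One aligned phrase from an interpreted broadcast (hypothetical schema)."""
    video_path: str   # RGB clip cropped to the interpreter's region of interest
    audio_path: str   # the accompanying audio track for the same time span
    transcript: str   # Spanish text transcription aligned with the signing
    start_s: float    # phrase start time within the source video, in seconds
    end_s: float      # phrase end time, in seconds

sample = LSUSample(
    video_path="clips/broadcast_0001.mp4",
    audio_path="clips/broadcast_0001.wav",
    transcript="Mañana se esperan lluvias en todo el país.",
    start_s=12.4,
    end_s=16.9,
)
```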

The data for iLSU-T was sourced from public Uruguayan television channels (Canal 5 and TV Ciudad) and sessions of the Uruguayan Parliament. This diverse origin ensures the dataset covers a wide range of topics and includes interpretations from 18 different professional sign language interpreters. The dataset’s creation involved a meticulous processing pipeline. This included identifying the Region of Interest (RoI) for the interpreter in each video, recognizing the specific signer, automatically generating captions from the audio using advanced speech-to-text models like WhisperX, and then manually aligning these text phrases with the sign language video content. This manual alignment process carefully considered linguistic nuances like pauses and epenthesis (transitional movements between signs) to ensure accuracy. Additionally, the data was labeled with linguistic context categories, such as topics (e.g., weather, politics, sports) and discourse genres (e.g., reports, interviews, legal procedures), further enriching its utility for research.
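To illustrate the caption-generation step, the sketch below shows a typical way to obtain time-aligned transcripts from an audio track with WhisperX. It is a generic usage example of the standard WhisperX API, not the authors' pipeline code, and the file name is a placeholder.

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("broadcast.wav")  # placeholder file name

# 1. Transcribe with a Whisper model (batched inference)
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# 2. Align the transcript against the audio for accurate timestamps
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for seg in aligned["segments"]:
    print(f'{seg["start"]:7.2f}-{seg["end"]:7.2f}  {seg["text"]}')
```

As the paragraph above notes, such automatic timestamps still had to be manually realigned to the signed content, since interpretation lags the audio and includes pauses and epenthesis.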

To establish a baseline and evaluate the dataset’s usefulness, a series of experiments were conducted using three state-of-the-art translation algorithms: Sign Language Transformers (SLT), Stochastic Transformer Networks with Linear Competing Units (STLCU), and Gloss Attention for Gloss-Free Sign Language Translation (GASLT). These methods were tested across different configurations of the iLSU-T data, including the whole dataset and subsets based on the original video sources. The evaluation utilized standard metrics like BLEU-N and ROUGE-L, which measure the similarity between machine-translated text and reference translations, and BERTScore, which assesses semantic similarity.
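For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed with off-the-shelf Python libraries (sacrebleu, rouge-score, bert-score). The example sentences are placeholders, not taken from the dataset or the paper.

```python
import sacrebleu
from rouge_score import rouge_scorer
from bert_score import score as bert_score

hypotheses = ["mañana lloverá en montevideo"]             # model output (placeholder)
references = ["mañana se esperan lluvias en montevideo"]  # reference translation

# BLEU-N: n-gram overlap between hypothesis and reference
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# ROUGE-L: longest-common-subsequence overlap
scorer = rouge_scorer.RougeScorer(["rougeL"])
rouge = scorer.score(references[0], hypotheses[0])
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")

# BERTScore: semantic similarity from contextual embeddings
P, R, F1 = bert_score(hypotheses, references, lang="es")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```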

The experiments revealed varying performance across methods and data subsets, with the data from parliamentary sessions (Source 3) generally yielding the best results. This is partly attributed to the larger volume of data and to duplicate text phrases shared between its training and test sets, which can inflate scores because the models encounter near-identical phrases during training. The study also explored the robustness of the models to different video frame rates, finding them largely unaffected by minor variations. While the translation results are still a work in progress, they are comparable to those achieved on other large-scale sign language datasets, highlighting the inherent challenges of this task.
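A simple way to quantify that kind of train/test overlap is to intersect normalized phrase sets. The sketch below is a hypothetical check, not the procedure used in the paper.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so near-identical phrases match."""
    return " ".join(phrase.lower().split())

def overlap_ratio(train_phrases: list[str], test_phrases: list[str]) -> float:
    """Fraction of test phrases that also appear verbatim in the training set."""
    train_set = {normalize(p) for p in train_phrases}
    duplicated = sum(normalize(p) in train_set for p in test_phrases)
    return duplicated / len(test_phrases)

# Placeholder data: one of the two test phrases duplicates a training phrase.
train = ["Se levanta la sesión.", "Buenos días a todos."]
test = ["se levanta la sesión.", "El tiempo para mañana."]
print(overlap_ratio(train, test))  # 0.5
```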

The researchers also discussed several limitations. Automatic video clipping and text transcription, particularly punctuation prediction, presented challenges. Furthermore, inherent complexities of sign language, such as sign omission by interpreters, the use of fingerspelling for proper nouns, and coreference resolution (referring to previously introduced objects in signing space), are not fully captured by current models trained on isolated phrases. Despite these challenges, the iLSU-T dataset represents a significant step forward in providing localized, multimodal data for Uruguayan Sign Language translation research.

The iLSU-T dataset is openly accessible for research and educational purposes under a restricted use license, reflecting a collaborative effort between academia and media sources. Future work aims to explore methods based on skeleton data, enrich annotations to include hand, face, or lip activity features, and conduct experiments considering phrase length and text duplication. This ongoing effort is critical for developing novel tools that improve accessibility and inclusion for all individuals. You can find more details about this research in the full paper available at https://arxiv.org/pdf/2507.21104.
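As a pointer to what the skeleton-based direction could look like, the sketch below extracts body keypoints from a video frame with MediaPipe Holistic. This is a generic illustration of one common approach, not code from the iLSU-T project, and the clip path is a placeholder.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture("interpreter_clip.mp4")  # placeholder path
with mp_holistic.Holistic(min_detection_confidence=0.5) as holistic:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV reads frames as BGR
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 body keypoints; each hand adds 21 keypoints when detected
            wrist = results.pose_landmarks.landmark[
                mp_holistic.PoseLandmark.LEFT_WRIST
            ]
            print(f"left wrist (normalized): x={wrist.x:.3f}, y={wrist.y:.3f}")
cap.release()
```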

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
