
Building a Foundation for Uruguayan Sign Language Translation: The iLSU-T Dataset

TLDR: iLSU-T is the first open, large-scale dataset for Uruguayan Sign Language (LSU) translation, comprising over 185 hours of interpreted LSU video (RGB) with audio and text transcriptions sourced from public television and parliamentary sessions. It addresses the critical need for localized sign language data to develop and evaluate automatic translation systems, providing baselines for state-of-the-art methods and highlighting the open challenges in this field.

Automatic sign language translation has attracted significant interest in recent years as a way to bridge the communication gap between deaf signers and hearing people. However, each country or region often has its own sign language, so developing and adapting machine translation techniques requires specific local data. To address this need, a new open dataset called iLSU-T has been introduced for Uruguayan Sign Language (LSU) translation.

The iLSU-T dataset is a groundbreaking resource, offering over 185 hours of interpreted Uruguayan Sign Language videos. These videos are rich in information, including RGB video, accompanying audio, and text transcriptions. This multimodal and carefully curated data is essential for advancing research in sign language processing, whether the goal is building tools that understand sign language or tools that generate it.
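To make that structure concrete, here is a minimal sketch of what a single aligned sample from such a multimodal dataset might look like in code. The schema and field names are hypothetical, not the dataset's actual format.

```python
from dataclasses import dataclass

@dataclass
class LSUSample:
    """One aligned phrase from an interpreted broadcast (hypothetical schema)."""
    video_path: str   # RGB clip cropped to the interpreter's region of interest
    audio_path: str   # the accompanying audio track for the same time span
    transcript: str   # Spanish text transcription aligned with the signing
    start_s: float    # phrase start time within the source video, in seconds
    end_s: float      # phrase end time, in seconds

sample = LSUSample(
    video_path="clips/broadcast_0001.mp4",
    audio_path="clips/broadcast_0001.wav",
    transcript="Mañana se esperan lluvias en todo el país.",
    start_s=12.4,
    end_s=16.9,
)
```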

The data for iLSU-T was sourced from public Uruguayan television channels (Canal 5 and TV Ciudad) and sessions of the Uruguayan Parliament. This diverse origin ensures the dataset covers a wide range of topics and includes interpretations from 18 different professional sign language interpreters. The dataset’s creation involved a meticulous processing pipeline. This included identifying the Region of Interest (RoI) for the interpreter in each video, recognizing the specific signer, automatically generating captions from the audio using advanced speech-to-text models like WhisperX, and then manually aligning these text phrases with the sign language video content. This manual alignment process carefully considered linguistic nuances like pauses and epenthesis (transitional movements between signs) to ensure accuracy. Additionally, the data was labeled with linguistic context categories, such as topics (e.g., weather, politics, sports) and discourse genres (e.g., reports, interviews, legal procedures), further enriching its utility for research.
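To illustrate the caption-generation step, the sketch below shows a typical way to obtain time-aligned transcripts from an audio track with WhisperX. It is a generic usage example of the standard WhisperX API, not the authors' pipeline code, and the file name is a placeholder.

```python
import whisperx

device = "cuda"
audio = whisperx.load_audio("broadcast.wav")  # placeholder file name

# 1. Transcribe with a Whisper model (batched inference)
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# 2. Align the transcript against the audio for accurate timestamps
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for seg in aligned["segments"]:
    print(f'{seg["start"]:7.2f}-{seg["end"]:7.2f}  {seg["text"]}')
```

As the paragraph above notes, such automatic timestamps still had to be manually realigned to the signed content, since interpretation lags the audio and includes pauses and epenthesis.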

To establish a baseline and evaluate the dataset’s usefulness, a series of experiments were conducted using three state-of-the-art translation algorithms: Sign Language Transformers (SLT), Stochastic Transformer Networks with Linear Competing Units (STLCU), and Gloss Attention for Gloss-Free Sign Language Translation (GASLT). These methods were tested across different configurations of the iLSU-T data, including the whole dataset and subsets based on the original video sources. The evaluation utilized standard metrics like BLEU-N and ROUGE-L, which measure the similarity between machine-translated text and reference translations, and BERTScore, which assesses semantic similarity.
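For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed with off-the-shelf Python libraries (sacrebleu, rouge-score, bert-score). The example sentences are placeholders, not taken from the dataset or the paper.

```python
import sacrebleu
from rouge_score import rouge_scorer
from bert_score import score as bert_score

hypotheses = ["mañana lloverá en montevideo"]             # model output (placeholder)
references = ["mañana se esperan lluvias en montevideo"]  # reference translation

# BLEU-N: n-gram overlap between hypothesis and reference
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# ROUGE-L: longest-common-subsequence overlap
scorer = rouge_scorer.RougeScorer(["rougeL"])
rouge = scorer.score(references[0], hypotheses[0])
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")

# BERTScore: semantic similarity from contextual embeddings
P, R, F1 = bert_score(hypotheses, references, lang="es")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```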

The experiments revealed varying performance across methods and data subsets, with the data from parliamentary sessions (Source 3) generally yielding the best results. This is partly attributed to the larger volume of data and to duplicate text phrases shared between its training and test sets, which can inflate scores because the models encounter near-identical phrases during training. The study also explored the robustness of the models to different video frame rates, finding them largely unaffected by minor variations. While the translation results are still a work in progress, they are comparable to those achieved on other large-scale sign language datasets, highlighting the inherent challenges of this task.
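A simple way to quantify that kind of train/test overlap is to intersect normalized phrase sets. The sketch below is a hypothetical check, not the procedure used in the paper.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so near-identical phrases match."""
    return " ".join(phrase.lower().split())

def overlap_ratio(train_phrases: list[str], test_phrases: list[str]) -> float:
    """Fraction of test phrases that also appear verbatim in the training set."""
    train_set = {normalize(p) for p in train_phrases}
    duplicated = sum(normalize(p) in train_set for p in test_phrases)
    return duplicated / len(test_phrases)

# Placeholder data: one of the two test phrases duplicates a training phrase.
train = ["Se levanta la sesión.", "Buenos días a todos."]
test = ["se levanta la sesión.", "El tiempo para mañana."]
print(overlap_ratio(train, test))  # 0.5
```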

The researchers also discussed several limitations. Automatic video clipping and text transcription, particularly punctuation prediction, presented challenges. Furthermore, inherent complexities of sign language, such as sign omission by interpreters, the use of fingerspelling for proper nouns, and coreference resolution (referring to previously introduced objects in signing space), are not fully captured by current models trained on isolated phrases. Despite these challenges, the iLSU-T dataset represents a significant step forward in providing localized, multimodal data for Uruguayan Sign Language translation research.

The iLSU-T dataset is openly accessible for research and educational purposes under a restricted use license, reflecting a collaborative effort between academia and media sources. Future work aims to explore methods based on skeleton data, enrich annotations to include hand, face, or lip activity features, and conduct experiments considering phrase length and text duplication. This ongoing effort is critical for developing novel tools that improve accessibility and inclusion for all individuals. You can find more details about this research in the full paper available at https://arxiv.org/pdf/2507.21104.
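As a pointer to what the skeleton-based direction could look like, the sketch below extracts body keypoints from a video frame with MediaPipe Holistic. This is a generic illustration of one common approach, not code from the iLSU-T project, and the clip path is a placeholder.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture("interpreter_clip.mp4")  # placeholder path
with mp_holistic.Holistic(min_detection_confidence=0.5) as holistic:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV reads frames as BGR
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 body keypoints; each hand adds 21 keypoints when detected
            wrist = results.pose_landmarks.landmark[
                mp_holistic.PoseLandmark.LEFT_WRIST
            ]
            print(f"left wrist (normalized): x={wrist.x:.3f}, y={wrist.y:.3f}")
cap.release()
```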

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
