
Simplifying Spanish: How Large Language Models Improve Text Readability

TLDR: The CardiffNLP team participated in the CLEARS-2025 shared task, focusing on simplifying Spanish texts into Plain Language (PL) and Easy-to-Read (E2R) formats using Large Language Models (LLMs). They experimented with LLaMA-3.2 and Gemma-3, finding Gemma-3 to be more effective, especially when prompted in Spanish. Key to their success were structured output (Python dictionary) and sentence-level processing. They secured third place in PL and second in E2R, highlighting LLMs’ potential while also noting the limitations of current automatic evaluation metrics for nuanced text simplification.

Ensuring information is clear and easy to understand is a fundamental right, as highlighted by the Universal Declaration of Human Rights. However, many public and official documents, especially in fields like law and medicine, remain inaccessible to a significant portion of the population due to their complex language. To address this challenge, the CLEARS shared task at IberLEF-2025 focused on automatically adapting Spanish texts into two accessible formats: Plain Language (PL) and Easy-to-Read (E2R).

Plain Language (PL) aims to make texts clear and concise for general audiences, including non-native speakers and individuals with reading limitations. It emphasizes active voice and common words, and avoids jargon. Easy-to-Read (E2R), on the other hand, is specifically designed for people with cognitive, intellectual, or learning disabilities. It focuses on structural and linguistic simplicity, using short sentences and clear language, and often involves the target audience in testing for clarity. Traditionally, creating these adapted texts is a manual and resource-intensive process, requiring experts and user validation.

CardiffNLP’s Approach to Text Simplification

The CardiffNLP team, from Cardiff University, contributed to the CLEARS shared task by exploring a novel approach: leveraging Large Language Models (LLMs) for automatic text adaptation. Their work, detailed in their paper “Prompting Large Language Models for Plain Language and Easy-to-Read Text Rewriting”, involved experimenting with different prompting methods, including zero-shot, one-shot, and few-shot strategies.

Initially, the team experimented with LLaMA-3.2, but for their final submission, they adopted Gemma-3. Their experiments involved numerous prompt variations, testing the effectiveness of different instructions and even the language in which these instructions were given (English or Spanish). A key finding was that instructing the model to return its output as a Python dictionary significantly improved results and made extraction easier. Furthermore, explicitly guiding the model to read and work on sentences individually also boosted similarity scores.
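The two prompting tricks mentioned above can be illustrated with a short sketch. The prompt wording, the dictionary key, and the helper names below are illustrative assumptions, not the team's actual prompt; what the sketch shows is the general pattern: Spanish instructions, an explicit request to work sentence by sentence, and a Python-dictionary output format that makes extraction a simple `literal_eval` rather than free-text scraping.

```python
import ast


def build_prompt(text: str) -> str:
    """Assemble a simplification prompt along the lines the paper describes:
    Spanish instructions, sentence-by-sentence processing, and a
    Python-dictionary output format. Wording here is hypothetical."""
    return (
        # "Read the following text sentence by sentence and rewrite each
        # sentence in plain language."
        "Lee el siguiente texto frase por frase y reescribe cada frase "
        "en lenguaje claro. "
        # "Return the result as a Python dictionary with the key 'simplificado'."
        "Devuelve el resultado como un diccionario de Python con la clave "
        "'simplificado'.\n\n"
        f"Texto: {text}"
    )


def extract_simplification(model_output: str) -> str:
    """Pull the requested dictionary out of the model's reply and parse it
    safely with ast.literal_eval (no arbitrary code execution)."""
    start = model_output.index("{")
    end = model_output.rindex("}") + 1
    result = ast.literal_eval(model_output[start:end])
    return result["simplificado"]


# Example: parsing a mock model response that wraps the dictionary in chatter.
mock_response = "Claro, aquí tienes: {'simplificado': 'El trámite es gratuito.'}"
print(extract_simplification(mock_response))  # El trámite es gratuito.
```

Asking for a structured container like this is a common way to curb inconsistent formatting: even if the model adds conversational filler around the dictionary, the payload can still be located and parsed deterministically.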

Performance and Insights

The CardiffNLP team achieved notable success in the CLEARS shared task, securing third place in Subtask 1 (Plain Language) and second place in Subtask 2 (Easy-to-Read). Their results highlighted the potential of LLMs in text simplification, particularly Gemma-3, which consistently performed as well as or better than LLaMA-3.2. Interestingly, Gemma-3 showed superior performance when prompted in Spanish, as English prompts sometimes led the model to simplify texts in English.

Despite the promising results, the research also shed light on ongoing challenges. The team noted that current automatic metrics do not fully capture the nuances of text simplification, especially for E2R, where visual formatting and sentence segmentation are crucial for readability. This suggests a need for more sophisticated evaluation methods that can account for these qualitative aspects.
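A toy example makes the metric limitation concrete. The bag-of-words F1 below is a deliberately simplified stand-in for surface-overlap metrics, not the shared task's actual scorer: because it ignores punctuation and sentence boundaries, it assigns identical scores to a run-on sentence and to the same words split into the short sentences E2R calls for.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Toy bag-of-words F1 between candidate and reference tokens,
    ignoring punctuation and sentence boundaries."""
    cand = Counter(candidate.lower().replace(".", "").split())
    ref = Counter(reference.lower().replace(".", "").split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "Pida cita. Llame al teléfono."
run_on = "Pida cita llame al teléfono."      # one long sentence
segmented = "Pida cita. Llame al teléfono."  # short, E2R-style sentences

# Same tokens, so the metric cannot tell the two apart,
# even though segmentation matters greatly for E2R readers.
print(token_f1(run_on, reference), token_f1(segmented, reference))  # 1.0 1.0
```

Real evaluation metrics are more sophisticated than this, but to the extent they operate on token overlap, they share the same blind spot for the visual and structural properties that define Easy-to-Read text.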

Looking Ahead

The CardiffNLP team’s contribution to the CLEARS shared task has deepened the understanding of LLMs’ capabilities and limitations in text simplification. Their work underscores the importance of carefully crafted prompts, structured output formats, and sentence-level processing in mitigating common LLM errors like hallucinations and inconsistent formatting. Future work will benefit from incorporating human evaluation and developing metrics that better reflect the complex qualitative aspects of simplification, especially for specific target groups requiring intricate formatting.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach out to him at: [email protected]
