TLDR: Researchers from OUNLP developed two multi-round text simplification systems for the TSAR 2025 Shared Task, MRS-Rule and MRS-Joint, whose underlying code was generated by GPT-4o. They found that a larger gap between a text’s original and target CEFR readability levels makes simplification harder. Their MRS-Joint method, which combines an initial LLM simplification with subsequent rule-based refinements, significantly improved CEFR level accuracy and meaning preservation compared to single-step approaches, demonstrating the effectiveness of iterative simplification and AI-generated code.
Making complex text easier to understand is a crucial task, especially for language learners and individuals with limited literacy. This challenge was at the heart of the TSAR 2025 Shared Task, where the OUNLP team from the University of Oklahoma presented an innovative approach to text simplification using multi-round methods and AI-generated code. Their research highlights a significant finding: the greater the difference between a text’s original difficulty and its target readability level, the harder it is to simplify effectively. This gap, referred to as the “CEFR-Gap,” became the driving force behind their novel multi-step simplification strategies.
The Challenge of Text Simplification
Traditional methods often attempt to simplify text in a single step. However, the OUNLP team observed that when the required simplification is substantial (for instance, transforming a highly advanced C1 text into a basic A2 one), single-step approaches frequently fall short. This difficulty arises because larger CEFR-Gaps demand more radical linguistic changes, which can compromise both the accuracy of reaching the target readability level and the preservation of the original meaning. This insight led them to propose an iterative, multi-round process, on the premise that breaking the simplification into smaller, manageable steps would yield better results.
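To make the CEFR-Gap idea concrete, here is a minimal sketch that maps CEFR levels to integers and measures the gap as their distance. The numeric mapping and the function name are illustrative assumptions, not the authors' actual formulation.

```python
# Hypothetical sketch: CEFR levels mapped to integers so the
# "CEFR-Gap" can be measured as a numeric distance.
CEFR_LEVELS = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

def cefr_gap(source_level: str, target_level: str) -> int:
    """Number of CEFR levels the text must drop to reach the target."""
    return CEFR_LEVELS[source_level] - CEFR_LEVELS[target_level]

print(cefr_gap("C1", "A2"))  # 3 -- a large gap that single-step methods struggle with
print(cefr_gap("C2", "B1"))  # 3
```

Under this view, the team's finding is that quality degrades as `cefr_gap` grows, motivating several small rounds rather than one large jump.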
Introducing Multi-Round Simplification
The OUNLP team developed two primary multi-round simplification methods, with their underlying code generated by GPT-4o, a powerful large language model:
MRS-Rule: This method is purely rule-based. It doesn’t rely on external large language model APIs for simplification itself, but instead uses a set of predefined rules to progressively adjust sentence structures and vocabulary. These rules include replacing complex words with simpler synonyms, standardizing numerical expressions, removing non-essential clauses, and breaking down long sentences. The system iteratively applies these rules, checking the text’s readability and semantic similarity to the original after each round, until the target CEFR level is met or the best possible simplification is achieved.
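The round-by-round loop described above can be sketched as follows. The specific rule functions, the stopping thresholds, and the stubbed readability/similarity scorers are illustrative assumptions standing in for the paper's actual rules and its ModernBERT/SBERT checks.

```python
# A minimal sketch of an MRS-Rule-style iterative loop (assumptions, not
# the authors' implementation): apply rules, re-check readability and
# meaning after each round, stop when the target level is reached.

def replace_complex_words(text: str) -> str:
    """Toy lexical rule, e.g. 'utilize' -> 'use'."""
    return text.replace("utilize", "use")

def split_long_sentences(text: str) -> str:
    """Toy structural rule: break clauses joined by semicolons."""
    return text.replace("; ", ". ")

RULES = [replace_complex_words, split_long_sentences]

def mrs_rule(text, target_level, predict_level, similarity_to_original,
             max_rounds=5):
    """Apply RULES round by round until the target CEFR level is met."""
    best = text
    for _ in range(max_rounds):
        new = text
        for rule in RULES:
            new = rule(new)
        if new == text:          # no rule fired; nothing left to simplify
            break
        text = new
        if similarity_to_original(text) >= 0.8:  # keep meaning-faithful candidates
            best = text
        if predict_level(text) <= target_level:  # target readability reached
            break
    return best

simplified = mrs_rule(
    "We utilize tools; they help.",
    target_level=2,  # A2 in an A1=1..C2=6 mapping
    predict_level=lambda t: 5 if "utilize" in t else 2,  # stub classifier
    similarity_to_original=lambda t: 1.0,                # stub SBERT check
)
print(simplified)  # "We use tools. they help."
```

In the real system, `predict_level` and `similarity_to_original` would be backed by the ModernBERT classifiers and SBERT mentioned later in the article.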
MRS-Joint: Building on the MRS-Rule, this method combines the strengths of both rule-based approaches and large language model prompting. In the initial step, an LLM (specifically GPT-4o-mini) generates a first simplified version of the text. Subsequent rounds then employ the same rule-based iterative refinements as MRS-Rule. This hybrid approach aims to leverage the generative power of LLMs for initial simplification while using structured rules for fine-grained, controlled adjustments in later stages.
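The hybrid pipeline amounts to composing the two stages. In this sketch, `llm_simplify` and `rule_based_refine` are toy stand-ins (assumptions) for the GPT-4o-mini prompt and the rule-based refinement loop, respectively.

```python
# Hypothetical sketch of the MRS-Joint pipeline: an LLM produces the
# first simplified draft, then rule-based rounds refine it further.
# Both stage functions here are illustrative stand-ins.

def mrs_joint(text, target_level, llm_simplify, rule_based_refine):
    candidate = llm_simplify(text, target_level)      # stage 1: LLM draft
    return rule_based_refine(candidate, target_level)  # stage 2: rule rounds

result = mrs_joint(
    "The committee endeavoured to utilize novel methodologies.",
    "A2",
    llm_simplify=lambda t, lvl: t.replace("endeavoured to utilize", "tried to use"),
    rule_based_refine=lambda t, lvl: t.replace("methodologies", "methods"),
)
print(result)  # "The committee tried to use novel methods."
```

The design rationale from the paper is visible even in this toy version: the LLM handles the large, creative rewrite, while deterministic rules make small, controlled adjustments afterward.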
Measuring Success: CEFR Levels and Meaning Preservation
To evaluate their systems, the researchers focused on two key aspects: CEFR Compliance (how close the simplified text is to the target CEFR level, measured by RMSE – Root Mean Square Error, where lower is better) and Meaning Preservation (how well the simplified text retains the original meaning, measured by MeaningBERT scores, where higher is better). They used three ModernBERT classifiers to predict CEFR levels and SBERT for semantic similarity checks.
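The CEFR-compliance metric is a standard root mean square error over predicted versus target levels. This sketch assumes the usual A1=1 to C2=6 integer mapping; the semantic side (MeaningBERT) is not reproduced here.

```python
# RMSE over CEFR levels, assuming levels are mapped to integers
# (A1=1 ... C2=6). Lower is better: 0 means every simplified text
# landed exactly on its target level.
import math

def cefr_rmse(predicted_levels, target_levels):
    """Root mean square error between predicted and target CEFR levels."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_levels, target_levels)]
    return math.sqrt(sum(squared) / len(squared))

# One text missed its target by two levels (B2 produced, A2 wanted):
print(cefr_rmse([3, 2, 4], [3, 2, 2]))  # ~1.155
```

Because the error is squared, a single large miss (like a C1 text only reduced to B2 when A2 was requested) is penalized much more heavily than several one-level misses.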
Promising Results and Key Takeaways
The experiments showed that the MRS-Joint method significantly outperformed both the initial LLM-prompting baseline and the MRS-Rule method. MRS-Joint achieved the best CEFR accuracy (lowest RMSE) while still effectively preserving the original meaning. This demonstrates that multi-round simplification is indeed more effective at handling large CEFR-Gaps than conventional single-step approaches. Furthermore, starting the simplification process with an LLM-generated candidate, as done in MRS-Joint, further boosted the overall performance of the multi-round system.
While the systems excelled at simplifying complex sentences (C1-C2) to an intermediate B1 level, the team noted that simplifying to very low levels like A2 remained challenging, sometimes resulting in text that was still more complex than intended. This “overshooting,” along with other issues such as “lexical imitation” (keeping formal phrases) and “under-generation” (producing incomplete sentences), highlights areas for future refinement.
In conclusion, the OUNLP team’s work provides compelling evidence that an iterative, multi-round approach to text simplification, especially when augmented by AI-generated code and a combination of LLM prompting and rule-based refinements, can significantly improve readability and accessibility. This research paves the way for more effective tools to make information accessible to a wider audience. You can find more details about their work in the full research paper: OUNLP at TSAR 2025 Shared Task: Multi-Round Text Simplifier via Code Generation.


