TLDR: A new research paper introduces CLEAR, an evaluation pipeline with 57 metrics spanning the lexical, syntactic, semantic, and pragmatic levels, to study how Large Language Models (LLMs) rewrite and improve argumentative texts. The study found that LLMs generally shorten texts (except very short ones), increase average word length, merge sentences, simplify rhetorical structure, and shift sentiment towards neutrality. Crucially, the models consistently enhance both the persuasiveness and coherence of arguments, indicating that they improve texts by making arguments more focused and efficient.
Large Language Models (LLMs) have transformed how we interact with text, excelling at tasks from generating creative content to summarizing complex documents. However, their ability to rewrite and improve existing texts, especially argumentative ones, is less well understood. A new research paper, “CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models,” by Thomas Huber and Christina Niklaus from the University of St. Gallen, Switzerland, sheds light on this question.
Understanding Argument Improvement with LLMs
The paper focuses on a task called Argument Improvement (ArgImp), where LLMs are prompted to enhance the overall quality of argumentative texts. This involves various linguistic modifications across different levels: lexical (word choice), syntactic (sentence structure), semantic (meaning shifts), and pragmatic (rhetorical effectiveness). To systematically evaluate these changes, the researchers developed CLEAR, a comprehensive evaluation pipeline consisting of 57 metrics mapped to these four linguistic levels.
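To make this design concrete, here is a minimal sketch of how metrics can be registered and grouped by linguistic level. The names (`Metric`, `length_change_pct`, `evaluate`) are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Metric:
    name: str
    level: str  # "lexical", "syntactic", "semantic", or "pragmatic"
    score: Callable[[str, str], float]  # (original, rewritten) -> value

def length_change_pct(original: str, rewritten: str) -> float:
    # Percentage change in character length after rewriting.
    return 100.0 * (len(rewritten) - len(original)) / max(len(original), 1)

METRICS: List[Metric] = [
    Metric("length_change_pct", "lexical", length_change_pct),
    # ... the real pipeline maps 57 such metrics onto the four levels
]

def evaluate(original: str, rewritten: str) -> Dict[str, Dict[str, float]]:
    """Run every registered metric and group the scores by linguistic level."""
    by_level: Dict[str, Dict[str, float]] = {}
    for m in METRICS:
        by_level.setdefault(m.level, {})[m.name] = m.score(original, rewritten)
    return by_level
```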
The study evaluated several prominent LLMs, including Llama 3.1, Phi-3-mini, Phi-3-medium, and OLMo-7B, applying various prompting techniques across diverse argumentation datasets such as Argument Annotated Essays, Microtexts, and ArgRewrite. The goal was to understand not just whether LLMs improve arguments, but precisely how they do so at a linguistic level, and whether they exhibit biases in the process.
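The paper's exact prompt wording is not reproduced in this summary; a plausible zero-shot prompt for the ArgImp task might look like the following hypothetical sketch:

```python
# Hypothetical prompt template; the paper's actual wording is not reproduced here.
ARGIMP_PROMPT = (
    "Improve the overall quality of the following argumentative text. "
    "Return only the rewritten text.\n\n{text}"
)

def build_prompt(argument: str) -> str:
    return ARGIMP_PROMPT.format(text=argument)
```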
Key Findings from the CLEAR Pipeline
The research revealed several fascinating insights into how LLMs rewrite arguments:
Lexical and Syntactic Transformations
One of the most consistent findings was that LLMs tend to shorten arguments significantly, with text length decreasing by 4.66% to 37.39% across most datasets. The exception was the very short Microtext corpus, where models actually increased text length, suggesting an effort to add detail. Interestingly, while texts became shorter, the average word length increased, and sentences were often merged. This indicates a move towards more concise, information-dense language. Syntactically, models frequently performed ‘merge’ and ‘fusion’ operations, combining original sentences or parts of them, rather than adding entirely new sentences or deleting large sections.
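These surface-level shifts are easy to quantify. The snippet below is an illustrative approximation, not the paper's implementation; it uses crude regex tokenization to compute the three quantities discussed above (word count change, average word length, and sentence count):

```python
import re

def surface_stats(text: str) -> dict:
    # Crude tokenization; a real pipeline would use a proper tokenizer and sentence splitter.
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return {
        "words": len(words),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        "sentences": len(sentences),
    }

def surface_delta(original: str, rewritten: str) -> dict:
    a, b = surface_stats(original), surface_stats(rewritten)
    return {
        "word_count_change_pct": 100.0 * (b["words"] - a["words"]) / max(a["words"], 1),
        "avg_word_len_delta": b["avg_word_len"] - a["avg_word_len"],  # positive => longer words
        "sentence_count_delta": b["sentences"] - a["sentences"],      # negative => merged sentences
    }
```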
Semantic and Pragmatic Shifts
On the semantic level, LLMs consistently decreased the depth of the Rhetorical Structure Theory (RST) parse tree. A shallower RST tree indicates a simpler rhetorical structure, suggesting the rewritten texts are easier to follow, which aligns with the observation that the models aim for more focused arguments. In terms of sentiment, the models generally shifted English texts towards a slightly more negative (though still positive overall) tone, while the German Microtexts became more positive; in both cases, the net movement was towards a more neutral sentiment.
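Producing an RST tree requires a trained discourse parser, but once a tree exists, the depth metric itself is simple recursion. A minimal sketch, assuming a hypothetical `RSTNode` structure:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RSTNode:
    label: str                          # discourse relation, or the EDU text at a leaf
    children: List["RSTNode"] = field(default_factory=list)

def tree_depth(node: RSTNode) -> int:
    """Depth of an RST parse: 1 for a leaf, else 1 + the deepest child."""
    if not node.children:
        return 1
    return 1 + max(tree_depth(child) for child in node.children)

# A flat, two-level tree (depth 2) is rhetorically simpler than a deeply nested one.
tree = RSTNode("Contrast", [RSTNode("EDU: texts get shorter"), RSTNode("EDU: Microtexts grow")])
assert tree_depth(tree) == 2
```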
Perhaps the most encouraging finding was on the pragmatic level: LLMs consistently increased both the persuasiveness and coherence of the arguments across all models and datasets. This suggests that despite the linguistic changes, the models successfully enhanced the overall quality and effectiveness of the argumentative texts.
Bias Analysis and Manual Review
The study also investigated potential biases. No significant length bias was found, meaning LLMs didn’t inherently prefer texts of certain lengths. Regarding positivity bias, the models tended to move texts towards a more neutral tone rather than consistently making them more positive. This indicates a nuanced approach to sentiment rather than a blanket positive shift.
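One straightforward way to probe for a length bias (not necessarily the authors' exact test) is to correlate input length with the measured quality change; a correlation near zero is consistent with the "no length bias" finding:

```python
from statistics import correlation  # Pearson correlation, Python 3.10+

def length_bias(original_lengths: list, quality_deltas: list) -> float:
    """Correlation between input length and measured improvement.
    Values near 0 indicate no systematic length preference."""
    return correlation(original_lengths, quality_deltas)

# Illustrative call with made-up numbers (not data from the paper):
print(length_bias([120, 340, 560, 900], [0.4, 0.1, 0.3, 0.2]))
```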
A manual analysis further supported these quantitative findings. It showed that LLMs refine and enhance existing text, often mimicking the original style and even adding structural elements like headlines to paragraphs. However, the models did not appear to check for the logical quality of arguments, sometimes leaving weak points unaddressed.
Conclusion: More Focused and Efficient Arguments
In essence, the research suggests that LLMs improve arguments by making them more focused and efficient. They achieve this by reducing unnecessary ‘fluff,’ using longer words in shorter sentences, simplifying rhetorical structures, and ultimately enhancing both coherence and persuasiveness. This work provides a valuable framework for understanding the linguistic transformations performed by LLMs in argument rewriting and highlights their potential for enhancing argumentative writing.
For a deeper dive into the methodology and detailed results, you can read the full research paper here.