
Unmasking the Pro-Human Bias in Literary Style Evaluation: A Study on AI and Human Judges

TLDR: A study by Princeton University researchers Wouter Haverals and Meredith Martin reveals a significant pro-human attribution bias in literary style evaluation, not only among human participants but also, and more strongly, among AI models. Using Raymond Queneau’s *Exercises in Style*, the research found that both humans and AI models devalue creative content when it’s labeled as AI-generated, even if the content is identical to human-authored text. AI models exhibited a 2.5-fold stronger bias than humans, often inverting their assessment criteria based solely on perceived authorship, suggesting they’ve absorbed cultural biases against artificial creativity during training.

A recent study delves into a fascinating aspect of how we perceive creative writing: the impact of knowing whether a text was written by a human or an artificial intelligence. The research, titled “Everyone Prefers Human Writers, Including AI,” conducted by Wouter Haverals and Meredith Martin from Princeton University, reveals a systematic bias favoring human authorship, not just among people, but surprisingly, even among AI models themselves.

The Experiment: Unveiling Attribution Bias

To explore this phenomenon, the researchers designed controlled experiments using Raymond Queneau’s classic work, *Exercises in Style* (1947). This book retells the same mundane narrative in ninety-nine different stylistic variations, providing an ideal corpus for comparing evaluations of stylistic quality. The study involved two main parts.

In Study 1, 556 human participants and 13 different AI models were asked to evaluate literary passages. These passages were either original Queneau texts or versions generated by GPT-4. The evaluators assessed these texts under three conditions: blind (no authorship information), accurately labeled (correctly identified as human or AI), and counterfactually labeled (labels deliberately reversed, so AI content was presented as human and vice versa).

Study 2 expanded on this by testing bias generalization across a larger matrix. Here, 14 AI evaluator models judged content created by each of 14 different AI creators, again under the three labeling conditions. This comprehensive approach aimed to see if the bias was a general property of AI evaluation, regardless of which AI created the content.

Key Findings: A Strong Pro-Human Preference

The results were striking. Both humans and AI models exhibited a systematic pro-human attribution bias. Human participants showed a +13.7 percentage point bias, meaning they preferred content labeled as human-authored significantly more than when it was labeled as AI-generated, even if the content was identical. However, AI models displayed an even stronger bias, with a +34.3 percentage point effect, which is 2.5 times greater than that observed in humans.
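The reported effects are simply differences in preference rates for identical texts under different authorship labels, expressed in percentage points. A minimal sketch of that computation, using hypothetical choice data (this is illustrative only, not the authors' analysis code):

```python
# Illustrative sketch: attribution bias as the difference in preference rate
# for the same text when labeled "human" vs. labeled "AI".

def preference_rate(choices):
    """Fraction of trials in which the evaluator preferred the target text."""
    return sum(choices) / len(choices)

def attribution_bias(human_labeled, ai_labeled):
    """Bias in percentage points: rate under 'human' label minus rate under 'AI' label."""
    return 100 * (preference_rate(human_labeled) - preference_rate(ai_labeled))

# Hypothetical trial outcomes (1 = preferred the target text, 0 = did not)
human_label_trials = [1, 1, 1, 0, 1, 1, 0, 1]  # 75.0% preference when labeled "human"
ai_label_trials    = [1, 0, 0, 1, 0, 1, 0, 0]  # 37.5% preference when labeled "AI"

bias = attribution_bias(human_label_trials, ai_label_trials)
print(f"Attribution bias: {bias:+.1f} percentage points")  # prints "+37.5"
```

A positive value indicates a pro-human bias: by this measure, the study's human participants landed at +13.7 points and the AI models at +34.3.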

When AI models evaluated content blindly (without knowing the author), their preferences were balanced, suggesting they perceived the quality of human and AI writing as comparable. But when the content was correctly labeled as AI-generated, their preference for it plummeted. Conversely, when AI-generated content was falsely labeled as human-authored, their preference for it surged. This indicates that AI systems systematically devalue creative content when they believe it was created by another AI, regardless of its actual quality.

This bias was not an anomaly of a few models; it was universal across all 13 individual AI models tested in Study 1, with all models exceeding the human baseline bias. Study 2 further confirmed that this bias operates consistently across different AI architectures, demonstrating it as a fundamental property of AI evaluation.

Changing the Narrative: How Labels Alter Judgment

Beyond just measuring preferences, the study also analyzed the rationales provided by AI models for their choices. These explanations revealed that attribution labels caused AI evaluators to invert their assessment criteria. Identical textual features received opposing evaluations based solely on perceived authorship.

For example, in a “Lipogram” exercise (where the letter ‘e’ is omitted), GPT-4’s version contained errors. When AI models believed a human wrote this imperfect text, they showed surprising leniency, even rationalizing the errors as subtle or maintaining readability. However, when the same text was correctly labeled as AI-generated, they were much stricter. Similarly, in styles like “Cockney” dialect, AI models redefined what constituted “authentic” performance based on whether the text was attributed to a human or an AI, praising human-labeled content for qualities they dismissed in AI-labeled content.

This suggests that AI models have absorbed human cultural biases against artificial creativity during their training. They learn to defer to human preferences, effectively installing a “learned reliability prior” where human judgments are considered the gold standard. This can lead to a form of “sycophancy,” where models echo expected user attitudes rather than providing independent assessments.

Implications for AI and Creativity

The research highlights that attribution tags act as “paratexts” – thresholds that set expectations before a text is even read. For both humans and AI, these cues steer attention and preselect which features will count as evidence. Humans, for instance, took measurably longer to make judgments when no authorship information was provided, indicating the cognitive shortcut that labels provide.

This study confirms a “pro-human bias” in the literary domain, aligning with the concept of “algorithm aversion” in subjective fields. Cultural assumptions often position machines as functional but lacking emotional depth and creative agency, qualities highly valued in authentic stylistic expression. The AI models, being preference-trained evaluators, amplify this human tendency, acting as a mirror reflecting a cultural script that privileges human provenance as a hallmark of creativity.

The findings suggest that developing artificial aesthetic intelligence may depend less on teaching machines to evaluate and more on understanding the cultural values and biases they have already absorbed. For more details, you can read the full research paper here: Everyone Prefers Human Writers, Including AI.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
