TLDR: A study investigated how LLM-generated explanations (rationales) influence human and LLM judgments of common-sense plausibility. It found that both humans and LLMs are significantly swayed by these rationales, with “pro” arguments generally increasing plausibility and “con” arguments decreasing it. Notably, humans sometimes reacted differently than LLMs, especially for highly plausible “gold” answers, where pro rationales surprisingly lowered human ratings. The research highlights LLMs’ persuasive potential, raising both opportunities for human-AI collaboration and concerns about opinion manipulation.
A recent study asks how much explanations generated by large language models (LLMs) can sway our common-sense judgments. The paper, titled Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility, examines the influence that AI-generated arguments have on what both humans and other LLMs perceive as plausible.
Authored by Shramay Palta, Peter Rankel, Sarah Wiegreffe, and Rachel Rudinger from the University of Maryland, College Park, this research highlights a novel use of LLMs for studying human cognition while also raising important practical concerns about the potential for AI to shape our beliefs, even in areas where we consider ourselves experts.
Understanding Plausibility and Rationales
Common-sense reasoning tasks often involve evaluating scenarios where answers aren’t strictly true or false, but rather fall on a spectrum of plausibility. For instance, if a person drops a glass, it’s highly plausible it will break, but technically possible it might bounce if dropped on a rubber mat. The study investigates how arguments, or ‘rationales,’ for or against an answer’s plausibility can shift our perception, even if these arguments don’t introduce new facts but merely highlight possible circumstances.
The researchers focused on two common-sense multiple-choice benchmarks, Social IQA (SIQA) and CommonsenseQA (CQA). They took question-answer pairs and generated two types of rationales using an LLM (specifically GPT-4o): PRO rationales, which argued for the answer’s plausibility, and CON rationales, which argued against it. They also created a PRO+CON setting where both arguments were presented.
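To make the four settings concrete, here is a minimal Python sketch of how one question-answer pair might be expanded into the no-rationale, PRO, CON, and PRO+CON conditions. The `Item` class, the `build_conditions` function, the prompt wording, and the example rationales are all hypothetical and written for illustration; they are not the prompts or materials used in the study.

```python
from dataclasses import dataclass

# The four presentation conditions described in the article.
CONDITIONS = ("none", "pro", "con", "pro+con")


@dataclass
class Item:
    question: str
    answer: str
    pro: str  # rationale arguing the answer is plausible
    con: str  # rationale arguing the answer is implausible


def build_conditions(item: Item) -> dict[str, str]:
    """Return one rating prompt per condition (wording is an assumption)."""
    base = f"Question: {item.question}\nAnswer: {item.answer}"
    scale = "Rate the plausibility of the answer from 1 (Impossible) to 5 (Very Likely)."
    rationales = {
        "none": [],
        "pro": [f"Argument for: {item.pro}"],
        "con": [f"Argument against: {item.con}"],
        "pro+con": [f"Argument for: {item.pro}", f"Argument against: {item.con}"],
    }
    return {c: "\n".join([base, *rationales[c], scale]) for c in CONDITIONS}


# Invented example, loosely based on the dropped-glass scenario above.
example = Item(
    question="A person drops a glass. What happens next?",
    answer="The glass bounces.",
    pro="The glass may have landed on a rubber mat.",
    con="Glass usually shatters when it hits a hard floor.",
)
prompts = build_conditions(example)
print(prompts["pro+con"])
```

Keeping the base question and rating scale identical across conditions means any difference in ratings can be attributed to the rationale alone.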
Human Reactions to AI Explanations
The study collected 3,000 plausibility judgments from human annotators, who rated answers on a 1-5 Likert scale (1-Impossible, 5-Very Likely) under four conditions: no rationale, PRO rationale, CON rationale, and PRO+CON rationales. The findings revealed that human judgments were indeed significantly affected by the presence of these AI-generated explanations.
Generally, PRO rationales tended to increase human plausibility ratings, while CON rationales lowered them. However, an intriguing exception emerged for ‘gold’ (correct) answer choices: when a PRO rationale was presented, human ratings *dropped*. The researchers suggest that, for answers that are already highly plausible, an argument that the answer is merely plausible may inadvertently ‘undersell’ its actual likelihood, making it seem less certain. For ‘distractor’ (incorrect) answers, by contrast, PRO rationales successfully raised ratings.
CON rationales, on the other hand, consistently lowered ratings for both gold and distractor answers, with a particularly strong impact on gold answers, sometimes causing a drop of over a full Likert scale point. When both PRO and CON rationales were presented, human ratings often settled somewhere between the effects of individual PRO and CON rationales, suggesting a balancing act.
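The comparisons above boil down to a mean rating shift per condition, measured against the no-rationale baseline and split by answer type. The sketch below shows one way to compute that. The ratings are toy values invented purely to exercise the code and are not data from the study, and `mean_shifts` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# (answer_type, condition, Likert rating 1-5).
# Toy values invented for illustration only; NOT results from the study.
ratings = [
    ("gold", "none", 5), ("gold", "none", 4),
    ("gold", "pro", 4), ("gold", "pro", 4),
    ("gold", "con", 3), ("gold", "con", 3),
    ("distractor", "none", 2), ("distractor", "none", 2),
    ("distractor", "pro", 3), ("distractor", "pro", 3),
    ("distractor", "con", 1), ("distractor", "con", 2),
]


def mean_shifts(rows):
    """Mean rating in each rationale condition minus the no-rationale mean."""
    by_cell = defaultdict(list)
    for answer_type, condition, rating in rows:
        by_cell[(answer_type, condition)].append(rating)

    shifts = {}
    for (answer_type, condition), values in list(by_cell.items()):
        if condition == "none":
            continue
        baseline = by_cell.get((answer_type, "none"))
        if not baseline:
            continue  # no baseline ratings to compare against
        shifts[(answer_type, condition)] = mean(values) - mean(baseline)
    return shifts


for key, shift in sorted(mean_shifts(ratings).items()):
    print(key, f"{shift:+.2f}")
```

A negative shift means the rationale lowered the average rating relative to the baseline; a positive shift means it raised it.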
LLMs Also Swayed by Rationales
To understand if AI models exhibit similar patterns, the researchers replicated the human experiment with 17 different LLMs, collecting an additional 13,600 judgments. These models were divided into OpenAI and Non-OpenAI groups. The results showed that LLMs were also highly sensitive to the rationales.
As with humans, PRO rationales generally increased LLM ratings, and CON rationales decreased them. However, one key difference from human behavior was observed: for gold answer choices, PRO rationales consistently *increased* LLM plausibility ratings, in direct contrast to the human response. This points to a divergence in how humans and LLMs react to supporting arguments for statements that are already highly plausible.
OpenAI models, which included the model used to generate the rationales (GPT-4o), showed a higher sensitivity to these explanations, possibly due to a self-preference bias.
Why Do Ratings Change?
The study also investigated the factors contributing to these shifts. A strong ‘anchoring effect’ was identified: the initial plausibility rating of an answer had a significant impact on how much it would change. Higher initial ratings led to smaller subsequent changes, meaning it’s harder to shift an already strong opinion. This effect was even more pronounced for distractor answers.
Furthermore, CON rationales were found to have a stronger negative effect than PRO rationales had a positive effect, indicating that arguments against plausibility are often more potent than arguments for it. The length of the rationale, however, had only a weak relationship with rating changes.
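One simple way to probe relationships like these is to correlate the size of each rating change with the initial rating and with the rationale length. The sketch below uses a plain Pearson correlation, which is an assumption made for illustration and not necessarily the analysis the authors ran. The records are toy values invented to exercise the code, not study data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# (baseline rating, rating after rationale, rationale length in words).
# Toy values invented for illustration only; NOT results from the study.
records = [(1, 3, 20), (2, 3, 35), (3, 4, 18), (4, 4, 40), (5, 5, 25)]

baseline = [r[0] for r in records]
abs_change = [abs(r[1] - r[0]) for r in records]
length = [r[2] for r in records]

print("baseline vs. |change|:", round(pearson(baseline, abs_change), 3))
print("length vs. |change|:  ", round(pearson(length, abs_change), 3))
```

A strongly negative first coefficient would match the pattern described above, where higher initial ratings go with smaller changes, while a coefficient near zero for length would indicate a weak relationship.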
Implications for Human-AI Interaction
The findings underscore the persuasive power of LLM-generated explanations. This capability could be harnessed for positive human-AI collaboration, such as challenging users’ reasoning, stress-testing arguments, or introducing alternative perspectives, but it also raises significant concerns. The ability of LLMs to shape opinions, even in common-sense domains, could undermine human autonomy and informed decision-making. The authors emphasize the need for robust safeguards, including transparency, bias mitigation, and mechanisms to detect and counteract manipulative uses of AI.
The research also acknowledges limitations, such as its focus on English-language common-sense reasoning and a specific demographic of annotators, suggesting that cultural and linguistic differences could lead to varied impacts of rationales.


