When AI Explanations Shift Our Common Sense

TLDR: A study investigated how LLM-generated explanations (rationales) influence human and LLM judgments of common-sense plausibility. It found that both humans and LLMs are significantly swayed by these rationales, with “pro” arguments generally increasing plausibility and “con” arguments decreasing it. Notably, humans sometimes reacted differently than LLMs, especially for highly plausible “gold” answers, where pro rationales surprisingly lowered human ratings. The research highlights LLMs’ persuasive potential, raising both opportunities for human-AI collaboration and concerns about opinion manipulation.

A recent study asks how much explanations generated by large language models (LLMs) can sway our own common-sense judgments. The paper, titled Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility, examines how strongly AI-generated arguments influence what both humans and other AI systems perceive as plausible.

Authored by Shramay Palta, Peter Rankel, Sarah Wiegreffe, and Rachel Rudinger from the University of Maryland, College Park, this research highlights a novel use of LLMs for studying human cognition while also raising important practical concerns about the potential for AI to shape our beliefs, even in areas where we consider ourselves experts.

Understanding Plausibility and Rationales

Common-sense reasoning tasks often involve evaluating scenarios where answers aren’t strictly true or false, but rather fall on a spectrum of plausibility. For instance, if a person drops a glass, it’s highly plausible it will break, but technically possible it might bounce if dropped on a rubber mat. The study investigates how arguments, or ‘rationales,’ for or against an answer’s plausibility can shift our perception, even if these arguments don’t introduce new facts but merely highlight possible circumstances.

The researchers focused on two common-sense multiple-choice benchmarks, Social IQA (SIQA) and CommonsenseQA (CQA). They took question-answer pairs and generated two types of rationales using an LLM (specifically GPT-4o): PRO rationales, which argued for the answer’s plausibility, and CON rationales, which argued against it. They also created a PRO+CON setting where both arguments were presented.

Human Reactions to AI Explanations

The study collected 3,000 plausibility judgments from human annotators, who rated answers on a 1 to 5 Likert scale (1 = Impossible, 5 = Very Likely) under four conditions: no rationale, PRO rationale, CON rationale, and PRO+CON rationales. The findings revealed that human judgments were indeed significantly affected by the presence of these AI-generated explanations.
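To make the four conditions concrete, here is a minimal Python sketch of how a single item might be assembled for a rater. The function name `build_item`, the field labels, and the wording of the rating question are all hypothetical; the study's actual prompts and annotation interface are not reproduced here.

```python
# Hypothetical sketch: the labels and wording below are assumptions,
# not the prompts or interface used in the study.
CONDITIONS = ("none", "pro", "con", "pro+con")


def build_item(question: str, answer: str, pro: str, con: str, condition: str) -> str:
    """Assemble the text shown to a rater under one of the four conditions."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")

    parts = [f"Question: {question}", f"Answer: {answer}"]
    # Attach the PRO rationale in the PRO and PRO+CON conditions.
    if condition in ("pro", "pro+con"):
        parts.append(f"Argument for: {pro}")
    # Attach the CON rationale in the CON and PRO+CON conditions.
    if condition in ("con", "pro+con"):
        parts.append(f"Argument against: {con}")
    parts.append("How plausible is this answer? (1 = Impossible, 5 = Very Likely)")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_item(
        "A person drops a glass. What happens next?",
        "The glass bounces.",
        "It could have landed on a rubber mat.",
        "Glass usually shatters on a hard floor.",
        "pro+con",
    ))
```

Keeping the question and answer fixed while only the attached rationale varies is what lets a change in rating be attributed to the rationale itself.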

Generally, PRO rationales tended to increase human plausibility ratings, while CON rationales lowered them. However, a particularly intriguing observation emerged for ‘gold’ (correct) answer choices: when a PRO rationale was presented, human ratings surprisingly *dropped*. The researchers suggest this might be because for already highly plausible answers, a ‘plausibility argument’ might inadvertently ‘undersell’ the actual likelihood, making it seem less certain. Conversely, for ‘distractor’ (incorrect) answers, PRO rationales successfully raised ratings.

CON rationales, on the other hand, consistently lowered ratings for both gold and distractor answers, with a particularly strong impact on gold answers, sometimes causing a drop of over a full Likert scale point. When both PRO and CON rationales were presented, human ratings often settled somewhere between the effects of individual PRO and CON rationales, suggesting a balancing act.
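The comparisons above come down to how far the average rating moves away from the no-rationale baseline, separately for gold and distractor answers. The Python sketch below shows one way such shifts could be computed. The ratings are toy values invented purely for illustration, not data from the study, and `mean_shifts` is a hypothetical helper rather than the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean

# Toy ratings invented for illustration only; NOT data from the study.
# Each record: (answer_type, condition, rating on the 1-5 scale)
ratings = [
    ("gold", "none", 5), ("gold", "none", 4),
    ("gold", "pro", 4), ("gold", "pro", 4),
    ("gold", "con", 3), ("gold", "con", 3),
    ("distractor", "none", 2), ("distractor", "none", 2),
    ("distractor", "pro", 3), ("distractor", "pro", 3),
    ("distractor", "con", 1), ("distractor", "con", 2),
]


def mean_shifts(records):
    """Return {(answer_type, condition): mean rating minus the no-rationale mean}."""
    groups = defaultdict(list)
    for answer_type, condition, rating in records:
        groups[(answer_type, condition)].append(rating)

    shifts = {}
    for (answer_type, condition), values in list(groups.items()):
        if condition == "none":
            continue  # the baseline itself has no shift
        baseline_values = groups.get((answer_type, "none"))
        if not baseline_values:
            continue  # no baseline available for this answer type
        shifts[(answer_type, condition)] = mean(values) - mean(baseline_values)
    return shifts


if __name__ == "__main__":
    for key, shift in sorted(mean_shifts(ratings).items()):
        print(key, f"{shift:+.2f}")
```

A negative value means the rationale lowered the average rating relative to seeing no rationale at all; a positive value means it raised it.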

LLMs Also Swayed by Rationales

To understand if AI models exhibit similar patterns, the researchers replicated the human experiment with 17 different LLMs, collecting an additional 13,600 judgments. These models were divided into OpenAI and Non-OpenAI groups. The results showed that LLMs were also highly sensitive to the rationales.

As with humans, PRO rationales generally increased LLM ratings, and CON rationales decreased them. However, a key difference from human behavior was observed: for gold answer choices, PRO rationales consistently *increased* LLM plausibility ratings, in direct contrast to the human response. This highlights a divergence in how humans and LLMs process and react to supporting arguments for highly plausible statements.

OpenAI models, which included the model used to generate the rationales (GPT-4o), showed a higher sensitivity to these explanations, possibly due to a self-preference bias.

Why Do Ratings Change?

The study also investigated the factors contributing to these shifts. A strong ‘anchoring effect’ was identified: the initial plausibility rating of an answer had a significant impact on how much it would change. Higher initial ratings led to smaller subsequent changes, meaning it’s harder to shift an already strong opinion. This effect was even more pronounced for distractor answers.
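One simple way to check for a pattern like this is to fit a line relating the size of the rating change to the initial rating and look at the sign of the slope. The Python sketch below does this with ordinary least squares. It is an illustrative assumption, not necessarily the analysis the authors ran, and the numbers are toy values invented for the example.

```python
from statistics import mean


def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy values invented for illustration only; NOT data from the study.
initial_rating = [1, 2, 3, 4, 5]            # rating with no rationale
abs_change = [2.0, 1.5, 1.0, 0.5, 0.0]      # size of the shift after a rationale

if __name__ == "__main__":
    slope, intercept = least_squares(initial_rating, abs_change)
    # A negative slope means higher initial ratings go with smaller changes.
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In this toy setup a negative slope is what an anchoring pattern of the kind described would look like: the higher the starting rating, the smaller the subsequent shift.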

Furthermore, CON rationales lowered ratings by more than PRO rationales raised them, indicating that arguments against plausibility are often more potent than arguments for it. The length of the rationale, however, had only a weak relationship with rating changes.

Implications for Human-AI Interaction

The findings underscore the persuasive power of LLM-generated explanations. This capability could be harnessed for positive human-AI collaboration, such as challenging users’ reasoning, stress-testing arguments, or introducing alternative perspectives, but it also raises significant concerns. The ability of LLMs to shape opinions, even in common-sense domains, could potentially undermine human autonomy and informed decision-making. The authors emphasize the need for robust safeguards, including transparency, bias mitigation, and mechanisms to detect and counteract manipulative uses of AI.

The research also acknowledges limitations, such as its focus on English-language common-sense reasoning and a specific demographic of annotators, suggesting that cultural and linguistic differences could lead to varied impacts of rationales.
