TLDR: A study reveals that Large Language Models (LLMs) possess inherent biases concerning corporate social responsibility and green supply chains. These biases are significantly influenced by the organizational culture context provided in prompts, leading to varied recommendations on sustainability practices across different LLMs and cultural scenarios. The findings underscore that LLMs are not neutral tools and their embedded perspectives can profoundly impact an organization’s sustainability decisions, necessitating careful model selection and bias auditing.
Large Language Models (LLMs) are becoming increasingly vital in how organizations make decisions, especially in areas like managing green supply chains and reducing environmental impact. However, a recent study highlights a critical concern: these powerful AI tools can reproduce embedded biases when asked to prioritize sustainable business strategies.
The research, titled “A Detailed Study on LLM Biases Concerning Corporate Social Responsibility and Green Supply Chains” by Greta Ontrup, Annika Bush, Markus Pauly, and Meltem Aksoy, delves into how different LLMs interpret and respond to questions about business ethics, social responsibility, and sustainable practices. The core idea is that LLMs are trained on vast amounts of text data, which inherently reflects societal values, cultural norms, and existing biases. When these models are used in green supply chain management, these embedded biases can significantly influence how organizations approach environmental and social responsibilities.
Understanding LLM Biases in Sustainability
The study used two main tools to assess LLM biases: the Perceived Role of Ethics and Social Responsibility (PRESOR) scale and the Green Supply Chain Partnerships (GSCP) scale. These tools, typically used for human participants, helped measure LLMs’ perceptions of ethical importance, social responsibility, and environmental collaboration with suppliers and customers.
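Both instruments are Likert-style questionnaires, so administering them to an LLM reduces to wrapping each item in a rating instruction and parsing a numeric answer back out. Below is a minimal sketch of that loop; the item wording, the 1–9 scale bounds, and the function names are illustrative assumptions, not the actual PRESOR or GSCP items or the study's code:

```python
import re

def build_item_prompt(item: str, low: int = 1, high: int = 9) -> str:
    """Wrap a single questionnaire item in an instruction that
    requests a bare numeric Likert rating."""
    return (
        f"Rate your agreement with the following statement on a scale "
        f"from {low} (strongly disagree) to {high} (strongly agree). "
        f"Answer with the number only.\n\nStatement: {item}"
    )

def parse_rating(reply: str, low: int = 1, high: int = 9):
    """Pull the first integer out of the model's reply and check it
    falls inside the scale; return None for off-scale or non-numeric
    replies so they can be logged and re-asked."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    value = int(match.group())
    return value if low <= value <= high else None
```

The prompt built by `build_item_prompt` would be sent to whichever chat API is under test; `parse_rating("I'd say 7 overall.")` then recovers `7`, while an off-scale answer such as `"12"` comes back as `None`.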
Initially, without any specific contextual prompts, all five state-of-the-art LLMs tested (GPT-4o, Claude 3.7 Sonnet, LLaMA 3.3 70B-Instruct, Mistral 7B-Instruct, and DeepSeek V3) showed a general agreement that social responsibility and profitability can coexist. They also largely agreed on the importance of long-term ethical and socially responsible goals. However, notable differences emerged when it came to the concept of shareholder primacy. Some models, like Claude, GPT, and Mistral, strongly disagreed with the idea that “if the stockholders are unhappy, nothing else matters,” suggesting a bias towards stakeholder capitalism. In contrast, LLaMA and DeepSeek showed agreement, indicating a more traditional shareholder-focused perspective.
For green supply chain partnerships, all models generally rated environmental cooperation with suppliers and customers highly. Claude, for instance, gave almost perfect scores, suggesting a strong pro-environmental bias. While all models were positive, subtle but consistent differences were observed, indicating varied levels of enthusiasm for green initiatives.
The Influence of Organizational Culture
A key aspect of this research was exploring how organizational culture influences LLM responses. The researchers prompted LLMs to adopt the persona of an employee from one of four organizational culture types: Clan (flexible, internal focus, teamwork), Adhocracy (flexible, external focus, innovation), Market (stable, external focus, achievement), and Hierarchy (stable, internal focus, structure). This contextual prompting revealed a significant finding: organizational culture substantially modifies LLM responses.
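The four culture types come from the Competing Values Framework, and conditioning a model on one of them amounts to prepending a persona system prompt before the questionnaire items. A sketch of how such prompts could be composed follows; the culture descriptions are paraphrased from the framework's standard axes (flexible/stable, internal/external), and the exact persona wording the study used is an assumption here:

```python
# Illustrative one-line summaries of the four Competing Values
# Framework culture types; not the study's actual persona text.
CULTURES = {
    "clan": "a flexible, internally focused organization that values teamwork and cohesion",
    "adhocracy": "a flexible, externally focused organization that values innovation and risk-taking",
    "market": "a stable, externally focused organization that values competition and achievement",
    "hierarchy": "a stable, internally focused organization that values structure and formal procedures",
}

def persona_prompt(culture: str) -> str:
    """Build a system prompt asking the model to answer as an
    employee embedded in the given organizational culture."""
    description = CULTURES[culture]
    return (
        f"You are an employee of {description}. "
        "Answer the following questionnaire from that perspective."
    )
```

Running the same questionnaire once per culture key (plus a no-persona baseline) yields the five conditions whose contrasts the study reports.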
For example, when prompted with a ‘Clan’ culture, LLMs showed the highest agreement on the compatibility of social responsibility and profitability. This contrasts sharply with ‘Market’ culture prompts, which resulted in lower agreement, suggesting that in a competitive, results-driven context, LLMs might view sustainability as a constraint. Similar patterns were observed for green supply chain partnerships, where ‘Clan’ culture prompts led to higher valuations of supplier and customer relationships for GSCM compared to ‘Market’ culture prompts.
The study found significant interaction effects, meaning the same LLM could provide markedly different sustainability assessments depending on the organizational context. For instance, LLaMA’s responses suggested compatibility of profitability and social responsibility for ‘Clan’ cultures but not for ‘Adhocracy,’ ‘Hierarchy,’ or ‘Market’ cultures. This highlights that LLM sustainability perspectives are highly context-dependent.
Implications for AI-Assisted Decision-Making
These findings have profound implications for organizations increasingly relying on LLMs for tasks like supplier evaluation, sustainability reporting, and strategic planning. The research demonstrates that LLMs are not neutral tools; they embody specific values and assumptions that can systematically influence sustainability outcomes. The choice of an LLM and the way it is prompted can significantly alter the recommendations an organization receives regarding its sustainability strategies.
Decision-makers must recognize these embedded biases and carefully consider the alignment of an LLM’s perspective with their business strategies and ethical objectives. This calls for greater attention to bias auditing, careful model selection, and robust governance frameworks to ensure that AI-assisted sustainability decisions align with broader environmental and social goals.
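One concrete form such a bias audit could take is comparing a model's mean instrument scores across culture conditions and flagging models whose answers swing widely with context. The sketch below is a hypothetical illustration, not a method from the study, and the one-point flagging threshold is an arbitrary choice:

```python
from statistics import mean

def audit_culture_gap(scores_by_culture: dict, threshold: float = 1.0) -> dict:
    """Compare mean Likert scores across culture conditions and flag
    the model if the spread between the highest- and lowest-scoring
    condition exceeds `threshold` scale points."""
    means = {culture: mean(scores) for culture, scores in scores_by_culture.items()}
    spread = max(means.values()) - min(means.values())
    return {"means": means, "spread": spread, "flagged": spread > threshold}
```

For example, a model scoring around 8–9 under a 'Clan' persona but 5–6 under a 'Market' persona on the same items would show a spread near three scale points and be flagged as strongly context-dependent.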
The study also offers methodological insights, noting that human-validated psychometric instruments may not always maintain their properties when applied to LLMs, suggesting a need for LLM-specific assessment tools in the future. For more details, you can read the full research paper here.


