TLDR: A new study evaluates the ethical reasoning of Large Language Models (LLMs) in construction project management. It finds that while LLMs perform adequately in structured areas like legal compliance, they significantly lack contextual nuance, accountability, and transparent reasoning for complex ethical decisions. Industry experts express strong reservations about autonomous AI, advocating for mandatory human oversight. The research concludes that LLMs are best suited as ‘co-pilots’ to augment human expertise, not replace it, emphasizing the need for human-in-the-loop systems and robust AI governance in construction.
The construction industry is rapidly embracing Artificial Intelligence (AI), with Large Language Models (LLMs) becoming popular tools to assist in decision-making. However, a recent study delves into a critical question: how ethically sound and reliable are these LLMs when faced with the complex, high-stakes ethical dilemmas common in construction project management?
Researchers Somtochukwu Azie and Yiping Meng from Teesside University conducted an in-depth evaluation, highlighting that while AI offers significant efficiency gains, its integration introduces new ethical challenges such as algorithmic bias, data privacy concerns, and unclear accountability. Their work, titled “The Ethical Compass of the Machine: Evaluating Large Language Models for Decision Support in Construction Project Management”, addresses a crucial gap in the existing literature by providing empirical evidence on LLM performance in realistic ethical scenarios.
Evaluating LLMs: A Mixed-Methods Approach
The study employed a comprehensive mixed-methods design. Quantitatively, three leading LLMs (ChatGPT, Gemini, and LLaMA) were tested against twelve real-world ethical scenarios. These scenarios covered critical areas such as procurement fairness, safety pressures, and conflicts of interest. A new tool, the Ethical Decision Support Assessment Checklist (EDSAC), was developed to score LLM responses across seven dimensions: Ethical Soundness, Legal Compliance, Fairness & Non-Bias, Transparency & Explainability, Contextual Relevance, Practical Actionability, and Bias Sensitivity.
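To make the EDSAC setup concrete, here is a minimal sketch of how per-dimension ratings might be aggregated into an overall mean score per model. This is an illustration only: the 1–5 scale, the `edsac_score` function, and the placeholder ratings are assumptions, not the study's actual rubric or data.

```python
from statistics import mean

# The seven EDSAC dimensions named in the study.
EDSAC_DIMENSIONS = [
    "Ethical Soundness", "Legal Compliance", "Fairness & Non-Bias",
    "Transparency & Explainability", "Contextual Relevance",
    "Practical Actionability", "Bias Sensitivity",
]

def edsac_score(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Average each model's dimension ratings into one overall mean score.

    `ratings` maps a model name to a flat list of dimension scores
    collected across scenarios (hypothetical 1-5 scale).
    """
    return {model: round(mean(scores), 2) for model, scores in ratings.items()}

# Placeholder ratings -- illustrative only, not the study's reported results.
ratings = {
    "ChatGPT": [4, 5, 4, 5, 3, 4, 4],
    "Gemini":  [4, 5, 3, 4, 3, 3, 4],
    "LLaMA":   [3, 3, 2, 3, 2, 3, 3],
}
print(edsac_score(ratings))
```

In practice each model would be rated on all seven dimensions for each of the twelve scenarios, so the score lists would be longer; the aggregation logic is the same.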
To complement this, qualitative data was gathered through semi-structured interviews with 12 industry experts, including project managers, AI developers, compliance officers, and ethics consultants. These interviews aimed to capture professional perceptions on trust in AI, accountability, legal risks, and the necessity of human oversight.
Key Findings: Strengths and Significant Gaps
The quantitative results showed a clear performance hierarchy among the LLMs. ChatGPT achieved the highest overall mean score, demonstrating superior performance in areas requiring structured reasoning and justification, such as Transparency/Explainability and Accountability. Gemini excelled in Legal Compliance, suggesting its training data is well-aligned with regulatory principles. LLaMA consistently lagged, particularly in generating fair and transparent responses.
Despite these differences, a critical common weakness emerged: the generic nature of LLM advice. The models rarely referenced specific UK regulations and struggled to realistically weigh competing commercial and ethical pressures. Their recommendations often lacked the contextual understanding that a human professional would apply, indicating a significant gap between identifying an issue and formulating a viable, responsible solution.
Expert Perspectives: Conditional Trust and Human Oversight
The thematic analysis of expert interviews strongly reinforced the quantitative findings. The most dominant concerns among professionals were Trust in AI, Bias/Fairness, and Accountability. Participants expressed a cautious curiosity, emphasizing that trust in AI is earned and explicitly conditional on transparency. As one project manager noted, “I only believe the input if I understand how it got there. If that isn’t there, it’s basically a mystery.”
A major recurring concern was the ambiguity of responsibility in the event of an AI-driven error. Experts were unequivocal that accountability must remain with a human professional, rejecting the idea of transferring responsibility to a non-sentient tool. Professionals also demonstrated a sophisticated awareness of algorithmic bias, viewing AI as a “double-edged sword” that could both mitigate and amplify existing prejudices.
Crucially, every participant insisted on the non-negotiable need for human oversight, universally rejecting the notion of AI as an autonomous decision-maker in ethically sensitive contexts. The consensus model that emerged was one of the LLM as a “co-pilot” or a sophisticated assistant, with the human professional remaining the “pilot” and ultimate arbiter of judgment and accountability.
Towards Responsible AI Integration: A Human-Centric Approach
The study concludes that current LLMs are not ready for autonomous ethical decision-making in construction. While they show promise as decision-support aids, they are deficient in accountability, explainability, and contextual understanding. The findings unequivocally affirm that human expertise and oversight must remain the ultimate authority in the ethical loop.
The researchers propose a strategy of cautious, human-centric adoption. This includes mandatory implementation of “human-in-the-loop” systems, ensuring a qualified professional always retains final decision-making power. Construction firms should establish formal AI governance structures, such as internal ethics committees, and demand greater transparency and explainability from technology vendors. Finally, investing in AI literacy training is crucial to equip professionals to critically evaluate AI outputs, understand limitations, and identify potential biases.
By embracing this integrated approach, the construction sector can harness the benefits of AI while upholding its core ethical commitments, augmenting professional excellence rather than replacing it.