Assessing the Energy Footprint of AI-Generated Code: Human Expertise Still Leads the Way

TLDR: A study compared Python code generated by six Large Language Models (LLMs) with code written by human developers and a Green software expert across server, PC, and Raspberry Pi platforms. Findings show that human-written code, especially by a Green software expert, is generally more energy-efficient (17-30% better) than LLM-generated code, though LLMs sometimes outperformed humans on PCs. Prompting techniques had limited and inconsistent impact on energy savings. The research highlights the critical need for human expertise in developing energy-efficient code and urges LLM vendors to prioritize energy efficiency as a core metric.

A recent study asks a question that matters for modern software development: how energy-efficient is the code generated by Large Language Models (LLMs) compared to human-written code? As LLMs become more deeply integrated into development workflows, understanding their environmental impact, particularly their energy consumption, is increasingly important.

The research, titled “Generating Energy-Efficient Code via Large-Language Models – Where are we now?”, was conducted by a team of researchers including Radu Apsan, Vincenzo Stoico, Michel Albonico, Rudra Dhar, Karthik Vaidhyanathan, and Ivano Malavolta. Their work provides an empirical assessment of Python code generated by six widespread LLMs against code written by human developers and, notably, by a Green software expert.

Understanding the Study’s Approach

To evaluate energy efficiency, the researchers tested 363 solutions to 9 coding problems drawn from the EvoEval benchmark. They used six popular LLMs: GPT4, ChatGPT, DeepSeek Coder 33B, Speechless Codellama 34B, Code Millenials 34B, and WizardCoder 33B. Each LLM was prompted using four different prompting techniques to test whether specific instructions could influence the energy efficiency of the generated code. Energy consumption was measured on three hardware platforms: a server, a personal computer (PC), and a Raspberry Pi, for a total of approximately 881 hours of measurement time.
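This article does not describe the measurement tooling the researchers used, so the following is only a minimal sketch of the general idea: run a solution several times and record how much a cumulative energy counter advances on each run. The names `measure_energy`, `read_joules`, and `FakeMeter` are hypothetical; on real hardware `read_joules` would have to wrap a platform-specific energy counter, and the fake meter here exists only so the sketch can run anywhere.

```python
import statistics
from typing import Callable, Dict, List


def measure_energy(
    run: Callable[[], object],
    read_joules: Callable[[], float],
    repetitions: int = 5,
) -> Dict[str, object]:
    """Run `run` repeatedly and report the energy delta of each repetition.

    `read_joules` is assumed to return a cumulative energy reading in joules.
    """
    samples: List[float] = []
    for _ in range(repetitions):
        before = read_joules()
        run()
        after = read_joules()
        samples.append(after - before)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "mean_j": statistics.mean(samples),
        "stdev_j": spread,
        "samples": samples,
    }


class FakeMeter:
    """Hypothetical stand-in meter: advances by a fixed step on every read."""

    def __init__(self, step_j: float) -> None:
        self.total_j = 0.0
        self.step_j = step_j

    def __call__(self) -> float:
        self.total_j += self.step_j
        return self.total_j


if __name__ == "__main__":
    meter = FakeMeter(step_j=2.0)
    result = measure_energy(lambda: sum(range(1000)), meter, repetitions=3)
    print(result)
```

Repeating each run and reporting the spread alongside the mean matters because energy readings are noisy; a single measurement says little about how two solutions compare.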

Key Findings on Energy Efficiency

The study yielded several significant insights. When comparing LLM-generated code to code written by average human developers, the results varied by hardware. Human solutions were found to be 16% more energy-efficient on the server and 3% more efficient on the Raspberry Pi. Interestingly, LLMs outperformed human developers by 25% on the PC. This highlights that the energy efficiency of code is highly dependent on the execution environment.
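To make percentages like these concrete, here is a small sketch of one common way to express a relative energy saving. It assumes "X% more energy-efficient" means the candidate uses X% less energy than the baseline; the article does not state the exact formula used in the paper, and the joule values below are invented purely for illustration, not taken from the study.

```python
def relative_saving(baseline_j: float, candidate_j: float) -> float:
    """Percent of energy the candidate saves relative to the baseline.

    Positive means the candidate used less energy; negative means more.
    """
    if baseline_j <= 0:
        raise ValueError("baseline energy must be positive")
    return (baseline_j - candidate_j) / baseline_j * 100.0


if __name__ == "__main__":
    # Hypothetical readings in joules: (LLM solution, human solution).
    readings = {
        "platform_a": (100.0, 84.0),
        "platform_b": (80.0, 100.0),
    }
    for platform, (llm_j, human_j) in readings.items():
        saving = relative_saving(baseline_j=llm_j, candidate_j=human_j)
        print(f"{platform}: human saves {saving:.1f}% relative to LLM")
```

A negative result for one platform and a positive one for another is the kind of sign flip the study reports between the PC and the other two platforms, which is why a comparison on a single machine cannot be generalized.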

One of the most striking findings concerns the role of Green software experts. Code developed by an expert in Green software was consistently more energy-efficient, by at least 17% to 30%, across all LLMs and all hardware platforms. This suggests that while LLMs are capable code generators, they currently lack the nuanced understanding and expertise required to consistently produce highly energy-efficient solutions.

The impact of prompting techniques was also explored. The study found that prompting did not consistently lead to energy savings. The most energy-efficient prompts varied by hardware platform, indicating that a one-size-fits-all prompting strategy for energy efficiency is not effective at present. In some cases, prompting even led to less energy-efficient solutions, and it introduced higher variability in energy usage, especially with guideline and few-shot prompts on the server.

Implications for Developers and LLM Vendors

For developers, the research underscores the importance of maintaining a critical attitude towards LLM-generated code. The energy efficiency is context-dependent, varying significantly with the hardware platform. Developers are encouraged to review and refine LLM-generated code, potentially using established Green Python guidelines, to enhance efficiency. The study also suggests that prompt engineering, while useful for other aspects, currently has limited and inconsistent impact on energy efficiency.

LLM vendors are urged to consider energy efficiency as a primary metric in their models. The current gap between LLM-generated code and expert-developed Green code presents both an environmental challenge and an economic opportunity. Investing in techniques like fine-tuning, Retrieval Augmented Generation (RAG), or specialized models for green code generation could significantly improve the sustainability of software development. The research also calls for the creation of a dedicated Green code base for benchmarking LLMs’ capabilities in generating energy-efficient software, similar to existing benchmarks for functional correctness.

The study gives a snapshot of where LLM-generated code currently stands on energy efficiency. It emphasizes the continued need for human expertise in developing sustainable software and points to critical areas for future research and development in Green AI. The full research paper has more details.