TLDR: GA4GC is a novel framework that addresses the sustainability and scalability challenges of LLM-powered coding agents. It uses multi-objective optimization (NSGA-II) to find Pareto-optimal configurations for agent hyperparameters and prompt templates, balancing agent runtime and generated code performance. The framework achieved up to a 135x hypervolume improvement, reducing agent runtime by 37.7% while enhancing correctness. The study identified temperature as the most critical hyperparameter and provides actionable strategies for balancing efficiency and effectiveness in industrial deployments.
Large Language Model (LLM)-powered coding agents are becoming increasingly powerful tools in software development, capable of automating complex tasks like code optimization. However, their industrial deployment faces significant sustainability and scalability challenges. A single agent run can consume over 100,000 tokens, incurring computational and environmental costs that can sometimes outweigh the benefits of the optimization it performs.
To address this issue, researchers have introduced a framework called GA4GC: Greener Agent for Greener Code. The approach systematically optimizes the trade-off between a coding agent’s runtime (making it a “greener agent”) and the performance of the code it generates (resulting in “greener code”). The core idea behind GA4GC is to discover Pareto-optimal configurations of agent hyperparameters and prompt templates, that is, configurations in which no objective can be improved without worsening another.
The GA4GC framework employs a multi-objective optimization technique called NSGA-II (Non-dominated Sorting Genetic Algorithm II). This method explores a large configuration space that includes LLM-specific settings (like temperature, top_p, and maximum tokens), agent-specific operational constraints (such as step limits, cost limits, and timeouts), and different prompt template variants. The goal is to optimize three objectives simultaneously: maximizing code correctness, maximizing code performance gain (speedup), and minimizing the agent’s runtime.
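To make the notion of Pareto-optimality over these three objectives concrete, here is a minimal sketch of the non-dominated filtering idea that NSGA-II builds on. It is not the GA4GC implementation: the `Evaluation` type, the function names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Evaluation(NamedTuple):
    """Objective values for one agent configuration (hypothetical numbers)."""
    correctness: float  # higher is better
    speedup: float      # higher is better
    runtime_s: float    # lower is better


def dominates(a: Evaluation, b: Evaluation) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = (
        a.correctness >= b.correctness
        and a.speedup >= b.speedup
        and a.runtime_s <= b.runtime_s
    )
    strictly_better = (
        a.correctness > b.correctness
        or a.speedup > b.speedup
        or a.runtime_s < b.runtime_s
    )
    return no_worse and strictly_better


def pareto_front(evals: List[Evaluation]) -> List[Evaluation]:
    """Keep only evaluations that no other evaluation dominates."""
    return [e for e in evals if not any(dominates(o, e) for o in evals)]


# Illustrative, made-up evaluations of four configurations.
candidates = [
    Evaluation(correctness=0.9, speedup=1.05, runtime_s=1500.0),
    Evaluation(correctness=0.9, speedup=1.02, runtime_s=950.0),
    Evaluation(correctness=0.8, speedup=1.01, runtime_s=1600.0),  # dominated by the first
    Evaluation(correctness=1.0, speedup=1.00, runtime_s=1200.0),
]
front = pareto_front(candidates)
```

In this toy data the third configuration is worse than the first on every objective, so it is dropped; the remaining three each win on at least one objective and therefore represent different trade-offs rather than a single best answer.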
Evaluation on the SWE-Perf benchmark, which features real-world code optimization tasks, demonstrated substantial improvements. GA4GC achieved up to a 135-fold improvement in hypervolume, a metric that measures how much of the objective space is dominated by the discovered configurations, so a larger value indicates better overall trade-offs. More concretely, the framework reduced agent runtime by 37.7% (from 1513.3 seconds to 943.1 seconds) while simultaneously improving the correctness of the generated code. This means the agents can operate considerably faster without sacrificing the quality of their output.
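As a rough illustration of these two numbers, the sketch below checks the reported runtime reduction and computes a hypervolume for a simplified two-objective case. The study itself optimizes three objectives, so the two-dimensional version, the function names, the sample points, and the reference point are all assumptions made for brevity.

```python
from typing import List, Tuple


def runtime_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage reduction in runtime relative to the baseline."""
    return (before_s - after_s) / before_s * 100.0


def hypervolume_2d(points: List[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded by `ref`, with both objectives minimized.

    Points that are not strictly better than the reference point are ignored.
    """
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]  # lowest second-objective value seen so far
    for x, y in inside:
        if y < best_y:
            # New horizontal strip contributed by this point.
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


# Runtime figures reported in the article.
reduction = runtime_reduction_pct(1513.3, 943.1)

# Hypothetical front in a minimized 2D objective space, reference point (4, 4).
toy_front = [(1, 3), (2, 2), (3, 1)]
toy_hv = hypervolume_2d(toy_front, ref=(4, 4))  # 6.0
```

A front that pushes further toward the ideal corner covers more area relative to the same reference point, which is why a higher hypervolume is read as a better set of trade-offs.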
A crucial part of the research involved analyzing how different hyperparameters influence the agent’s performance and resource consumption. The findings established that ‘temperature’ is the most critical hyperparameter. Temperature controls the randomness in token selection during the LLM’s generation process. Moderate temperatures (around 0.66-0.69) were found to be effective for achieving high code performance, while lower temperatures (0.0-0.1) led to faster runtime but less performance gain. Other hyperparameters like ‘top_p’ (which restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches the given threshold) and ‘cost_limit’ also play significant roles in balancing correctness, performance, and runtime.
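For readers unfamiliar with these two sampling settings, the following is a minimal sketch of how temperature and top_p are commonly applied to a model's output scores. It is a generic illustration, not code from GA4GC or from any particular LLM provider; the function names and the three-token example are hypothetical.

```python
import math
from typing import Dict, List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: all mass on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens form a valid distribution.
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low_temp = softmax_with_temperature(logits, 0.1)
moderate_temp = softmax_with_temperature(logits, 0.68)
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.9)
```

At a temperature of 0.1 nearly all probability sits on the top token, so output is close to deterministic; around 0.68 the alternatives keep meaningful probability, which gives the agent more room to try different optimization ideas. A restrictive top_p then trims the unlikely tail regardless of temperature.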
Based on these insights, GA4GC provides actionable strategies for practitioners in Green Software Engineering. For scenarios where minimizing runtime is paramount, the framework suggests using low temperature settings with restrictive top_p values and moderate limits on tokens and steps. Conversely, for performance-critical scenarios, moderate temperatures with balanced top_p values, higher cost budgets, and specific prompt templates are recommended to enable more creative optimization strategies. For those with unique requirements, GA4GC can be applied directly to discover tailored Pareto-optimal configurations.
This research marks a significant step towards making AI coding agents more sustainable and scalable for industrial deployment. By systematically optimizing their configurations, GA4GC helps balance the need for efficient code generation with the imperative of reducing computational and environmental costs. For more in-depth information, refer to the full research paper.


