
Optimizing Stable Diffusion for Fairer Images and Lower Energy Consumption

TLDR: SustainDiffusion is a search-based approach that optimizes Stable Diffusion models to significantly reduce gender and ethnic bias (by 68% and 59% respectively) and energy consumption (by 48%) while maintaining image quality. It achieves this by finding optimal combinations of hyperparameters and prompt structures, demonstrating that social and environmental sustainability can be improved without altering the model’s architecture or fine-tuning.

Text-to-image generation models, like the widely popular Stable Diffusion (SD), have become indispensable tools across various fields, from advertising to education. With over 12 billion images generated annually by Stable Diffusion alone, its extensive use brings to light significant concerns regarding its social and environmental impact.

The social aspect of sustainability in software relates to mitigating harm caused by discrimination and bias. Stable Diffusion models, for instance, have been shown to exhibit notable gender and ethnic biases, particularly when generating images of specific professions such as software engineers, which can perpetuate existing stereotypes within the tech community. Environmentally, these models are energy-intensive: generating a single image can consume as much energy as charging a phone to 40%.

Addressing these sustainability challenges is complex because improving one dimension often negatively impacts another. For example, compressing models to save energy might worsen bias, while fine-tuning for fairness could increase energy consumption. Furthermore, not all users have the resources to perform such extensive modifications. This highlights the need for a multi-objective approach that can balance these conflicting goals.
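A multi-objective approach typically compares configurations by Pareto dominance: one configuration is better than another only if it is no worse on every objective and strictly better on at least one. The minimal sketch below assumes four objectives (image quality expressed as a loss, gender bias, ethnic bias, energy), all to be minimized; the objective ordering and the numbers are hypothetical and chosen only to show that a lower-energy but more biased configuration does not dominate a fairer one.

```python
from typing import List, Sequence

# Objective vectors in this sketch are ordered as
# (quality_loss, gender_bias, ethnic_bias, energy_joules); all are minimized.
Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[int]:
    """Indices of the points that no other point dominates (the first Pareto front)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical configurations: #1 saves energy but is more biased than #0,
# so neither dominates the other; #2 is worse than #0 on every objective.
configs = [
    (0.20, 0.30, 0.40, 900.0),
    (0.20, 0.50, 0.45, 500.0),
    (0.25, 0.35, 0.50, 950.0),
]
print(pareto_front(configs))  # [0, 1]
```

Because neither of the first two configurations dominates the other, a multi-objective search keeps both and leaves the final trade-off to the user, rather than collapsing the goals into a single score.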

SustainDiffusion is a novel search-based approach designed to enhance the social and environmental sustainability of Stable Diffusion models without altering their core architecture or requiring extensive fine-tuning. It frames the problem as a search task: finding the combination of model settings and prompt structures that achieves a better balance of fairness, energy efficiency, and image quality.

SustainDiffusion operates by exploring a vast space of possible solutions, represented by different Stable Diffusion hyperparameters (like guidance scale and inference steps) and prompt engineering techniques (such as positive and negative keywords with assigned weights). For each candidate solution, the system generates images and evaluates them against four fitness functions: image quality, gender bias, ethnic bias, and CPU energy consumption (which serves as a reliable proxy for overall energy usage, including GPU energy and generation time). NSGA-II, a popular multi-objective evolutionary algorithm, guides this exploration, progressively evolving a population of solutions towards Pareto-optimal trade-offs.
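To make the idea of a "solution" concrete, here is a minimal sketch of how one candidate might be encoded and turned into a prompt pair. Everything in it is an assumption for illustration: the parameter ranges, the keyword lists, the `Candidate` class, and the `(keyword:weight)` weighting syntax are hypothetical and are not taken from SustainDiffusion.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical search space; ranges and keywords are illustrative only.
GUIDANCE_RANGE = (1.0, 15.0)
STEPS_RANGE = (10, 50)
POSITIVE_KEYWORDS = ["diverse", "photorealistic", "professional"]
NEGATIVE_KEYWORDS = ["stereotypical", "blurry", "cartoon"]
WEIGHT_RANGE = (0.5, 1.5)


@dataclass
class Candidate:
    guidance_scale: float
    num_inference_steps: int
    positive: Dict[str, float] = field(default_factory=dict)  # keyword -> weight
    negative: Dict[str, float] = field(default_factory=dict)


def random_candidate(rng: random.Random) -> Candidate:
    """Sample one point of the hypothetical search space uniformly at random."""
    def pick(words: List[str]) -> Dict[str, float]:
        chosen = rng.sample(words, k=rng.randint(0, len(words)))
        return {w: round(rng.uniform(*WEIGHT_RANGE), 2) for w in chosen}

    return Candidate(
        guidance_scale=round(rng.uniform(*GUIDANCE_RANGE), 2),
        num_inference_steps=rng.randint(*STEPS_RANGE),  # inclusive bounds
        positive=pick(POSITIVE_KEYWORDS),
        negative=pick(NEGATIVE_KEYWORDS),
    )


def build_prompts(base: str, c: Candidate) -> Tuple[str, str]:
    """Return (prompt, negative_prompt) with '(keyword:weight)' annotations.

    The weighting syntax is an assumption: some Stable Diffusion front ends
    parse it, but not every pipeline interprets it natively.
    """
    pos = ", ".join(f"({w}:{wt})" for w, wt in c.positive.items())
    neg = ", ".join(f"({w}:{wt})" for w, wt in c.negative.items())
    prompt = f"{base}, {pos}" if pos else base
    return prompt, neg


rng = random.Random(0)
cand = random_candidate(rng)
prompt, negative_prompt = build_prompts("a photo of a software engineer", cand)
print(cand.guidance_scale, cand.num_inference_steps)
print(prompt)
print(negative_prompt)
```

In a Hugging Face `diffusers` pipeline, the sampled fields would map onto the `guidance_scale`, `num_inference_steps`, and `negative_prompt` call arguments; each candidate would then be scored on the objectives and passed to NSGA-II for selection, crossover, and mutation.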

The empirical evaluation of SustainDiffusion produced strong results. It reduced gender bias in Stable Diffusion 3 (SD3) by 68% and ethnic bias by 59%. It also reduced energy consumption, measured as the sum of CPU and GPU energy, by 48%. Importantly, these improvements were achieved while maintaining image quality comparable to that of the original SD model. The outcomes were also consistent across multiple runs and generalized to various prompts, indicating robustness and practical applicability.
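The percentages above are relative reductions against the unoptimized baseline, with energy taken as CPU plus GPU energy. A minimal sketch of that arithmetic follows; the joule values are hypothetical and were picked only so the result matches the reported 48%, they are not measurements from the study.

```python
def total_energy(cpu_joules: float, gpu_joules: float) -> float:
    """Energy of one configuration, taken as CPU energy plus GPU energy."""
    return cpu_joules + gpu_joules


def relative_reduction(baseline: float, optimized: float) -> float:
    """Percentage decrease from baseline to optimized (positive = improvement)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - optimized) / baseline


# Hypothetical measurements in joules, for illustration only.
base = total_energy(cpu_joules=300.0, gpu_joules=700.0)
opt = total_energy(cpu_joules=160.0, gpu_joules=360.0)
print(f"{relative_reduction(base, opt):.0f}%")  # 48%
```

The same `relative_reduction` formula applies to the bias scores, with the baseline and optimized bias values in place of energy.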

The findings underscore several critical insights. First, prompt engineering, in addition to hyperparameter tuning, is essential for effectively mitigating gender bias in generated images; explicitly asking the model for fair representation is often insufficient on its own. Second, bias is largely unrelated to image quality or energy consumption, so fairness can be improved without compromising the other objectives. Third, search-based hyperparameter tuning proved highly effective at reducing energy consumption: optimizing for CPU energy also reduces GPU energy and generation time, and, crucially, reducing energy consumption neither degrades image quality nor increases bias.

A full run of SustainDiffusion takes about 20 hours, but this is a one-off cost: once a configuration is found, the optimized Stable Diffusion model generates an image in approximately 25 seconds while using less energy per image. Given the vast number of images Stable Diffusion users generate daily, the initial investment is quickly recouped. Because the optimal configurations proved consistent and generalizable, SustainDiffusion can be run once and its findings applied to a wide range of use cases and prompts.
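The "quickly recouped" argument is an amortization calculation: divide the one-off search cost by the energy saved per generated image. A minimal sketch, assuming a hypothetical 300 W average power draw during the 20-hour search and hypothetical per-image energies that reflect a 48% reduction; the article does not report these absolute figures.

```python
import math


def break_even_images(search_cost_j: float, baseline_per_image_j: float,
                      optimized_per_image_j: float) -> int:
    """Smallest number of images whose cumulative energy savings cover the search cost."""
    saving = baseline_per_image_j - optimized_per_image_j
    if saving <= 0:
        raise ValueError("optimized configuration must use less energy per image")
    return math.ceil(search_cost_j / saving)


# Hypothetical inputs: 20 h search at an assumed 300 W average draw, in joules.
search_cost = 20 * 300 * 3600  # 21,600,000 J
print(break_even_images(search_cost, 20_000, 10_400))  # 2250
```

Under these assumed numbers the search pays for itself after a few thousand images, a negligible fraction of the images Stable Diffusion users generate each day.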


This research, detailed further in the SustainDiffusion paper, offers a promising pathway towards more responsible and sustainable AI development, demonstrating that it is possible to enhance the social and environmental aspects of text-to-image generation models without complex fine-tuning or architectural changes.

Rhea Bhattacharya
https://blogs.edgentiq.com
Rhea Bhattacharya is an AI correspondent with a keen eye for cultural, social, and ethical trends in Generative AI. With a background in sociology and digital ethics, she delivers high-context stories that explore the intersection of AI with everyday lives, governance, and global equity. Her news coverage is analytical, human-centric, and always ahead of the curve.
