TLDR: A research paper introduces an interactive system that allows users to control image generative models like StyleGAN2 and BigGAN by replacing their static activation functions with parametric ones. This enables real-time manipulation of image structure and style, fostering a better understanding of the models’ internal mechanisms through direct experimentation and visual feedback.
Generative AI models have become incredibly powerful, creating realistic images, audio, and text. However, understanding *how* these models achieve their impressive outputs often remains a mystery, even for experts. This gap in understanding is a key challenge in the field of explainable AI (XAI). A new research paper by Ilia Pavlov introduces an innovative system that allows users to interact directly with the internal mechanisms of generative models, making the image generation process more transparent and controllable.
The paper, titled “Controlling the image generation process with parametric activation functions,” proposes an interactive tool designed to help both experts and non-experts better understand and manipulate generative networks. Instead of just observing the final output, users can now dive into the model’s structure and modify its behavior in real time. This approach aims to increase AI literacy by providing a hands-on experimentation platform.
A New Way to Control Generative Models
At the heart of this system is the ability to replace standard, static activation functions within a generative network with “parametric” ones. Activation functions are the nonlinearities applied to each neuron’s output; they determine how strongly a neuron responds to its inputs and thus shape everything the network produces. By making these functions parametric, the system lets users adjust specific parameters, thereby directly influencing how the network processes information and, ultimately, the characteristics of the generated image.
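To make this concrete, here is a minimal PyTorch sketch of two such parametric activations, using the commonly cited definitions of ShiLU (α·ReLU(x) + β) and SinLU ((x + a·sin(bx))·σ(x)). The parameter names and defaults are illustrative; the paper’s exact parameterizations may differ:

```python
import torch
import torch.nn as nn

class ShiLU(nn.Module):
    """Shifted ReLU: alpha * ReLU(x) + beta.
    alpha rescales the positive response; beta shifts the whole output."""
    def __init__(self, alpha: float = 1.0, beta: float = 0.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * torch.relu(x) + self.beta

class SinLU(nn.Module):
    """Sinu-Sigmoidal Linear Unit: (x + a * sin(b * x)) * sigmoid(x).
    a controls the amplitude and b the frequency of the sinusoidal term."""
    def __init__(self, a: float = 1.0, b: float = 1.0):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(a))
        self.b = nn.Parameter(torch.tensor(b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x + self.a * torch.sin(self.b * x)) * torch.sigmoid(x)
```

Because the coefficients are `nn.Parameter` objects, a GUI slider can write to them in place and the very next forward pass reflects the change, which is what makes the real-time feedback described below possible.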
The interactive tool features a graphical user interface (GUI) that simplifies this complex interaction. Users can select specific neural layers, choose from a variety of parametric activation functions (like Sinu-Sigmoidal Linear Unit (SinLU), Rectified Linear Unit N (ReLUN), and Shifted Rectified Linear Unit (ShiLU)), and then fine-tune their parameters. The GUI also provides immediate visual feedback, showing how each adjustment impacts the generated image in real-time. This direct feedback loop is essential for understanding the intricate relationship between internal network changes and external visual outcomes.
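The paper’s tool isn’t reproduced here, but the core operation it performs, swapping a layer’s activation for a parametric one, can be approximated in a few lines of PyTorch. This sketch assumes activations appear as standalone modules (true of many StyleGAN2 ports, though the official implementation fuses them into a functional call), and the attribute name `G.mapping` in the usage comment is an assumption about the checkpoint’s layout:

```python
import torch.nn as nn

def swap_activations(module: nn.Module, old_type: type, make_fn) -> int:
    """Recursively replace every activation of type `old_type` with a
    fresh parametric module from the zero-argument factory `make_fn`,
    so each swapped layer gets its own independently tunable parameters.
    Returns the number of layers swapped."""
    swapped = 0
    for name, child in module.named_children():
        if isinstance(child, old_type):
            setattr(module, name, make_fn())
            swapped += 1
        else:
            swapped += swap_activations(child, old_type, make_fn)
    return swapped

# Hypothetical usage with a loaded StyleGAN2 generator G (attribute
# names vary between implementations):
#   swap_activations(G.mapping, nn.LeakyReLU, lambda: ShiLU(1.2, 0.1))
# Regenerating with a fixed latent z then isolates the visual effect
# of each parameter tweak.
```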
How It Works: Impact on Image Generation
The research demonstrates the effectiveness of this method on two prominent generative models: StyleGAN2 and BigGAN. StyleGAN2, known for its high-fidelity image generation, was tested by applying parametric activation functions to both its mapping network and generator network.
When applied to the mapping network of StyleGAN2, the parametric functions primarily affected the overall structure of the generated images. Changes in earlier layers of the mapping network offered a high degree of control over subtle image details. In contrast, applying these functions to the generator network influenced both the style and, in earlier layers, the structure of the images. Later layers of the generator network tended to alter more basic aspects, such as the overall coloration. The paper shows that even slight parameter adjustments can lead to noticeable changes, while larger modifications can result in more abstract or unpredictable imagery, opening possibilities beyond realistic image generation.
For BigGAN, a large-scale class-conditional GAN, the system could likewise alter image content, though the variations were less dramatic than with StyleGAN2. The paper also explored polynomial activation functions, noting their sensitivity and limited effectiveness when used in multiple layers.
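The paper’s exact polynomial form isn’t given here, but a simple trainable quadratic, sketched below purely as an illustration, makes the sensitivity easy to see: the squared term grows without bound, so its effect compounds when stacked across several layers.

```python
import torch
import torch.nn as nn

class PolyAct(nn.Module):
    """Illustrative trainable quadratic: a * x**2 + b * x + c.
    With a=0, b=1, c=0 it starts as the identity, so the network's
    original behavior is preserved until the user moves a slider."""
    def __init__(self, a: float = 0.0, b: float = 1.0, c: float = 0.0):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(a))
        self.b = nn.Parameter(torch.tensor(b))
        self.c = nn.Parameter(torch.tensor(c))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The quadratic term is unbounded, so even small values of `a`
        # can push activations out of range when this function is used
        # in several consecutive layers.
        return self.a * x * x + self.b * x + self.c
```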
Empowering User Exploration
This method doesn’t aim for precise, guided image generation of specific features. Instead, it promotes “incidental exploration” through a process of trial-and-error. By continuously experimenting with different parametric functions and their settings, users can gradually build an intuitive understanding of how the network’s internal state influences its creative output. This hands-on learning experience is a significant step towards demystifying complex AI models and fostering greater AI literacy.
While the current implementation offers unguided control, which can be imprecise, the potential for deeper understanding and creative exploration is substantial. The research suggests that future applications, perhaps combined with text-to-image models, could further enhance the user experience. You can explore the full details of this fascinating research paper here: Controlling the image generation process with parametric activation functions.


