TLDR: A research paper introduces a ComfyUI plugin that allows artists to directly manipulate the internal components of large text-to-image diffusion models. This “model bending” approach, inspired by craft-based practices, aims to provide artists with a deeper, intuitive understanding of how AI models generate images, fostering greater agency and creative control beyond simple prompting.
The paper introduces a novel approach to Explainable AI (XAI) within creative fields, moving beyond traditional transparency to support artistic engagement and sustained practice. It argues that even large generative models, like text-to-image diffusion systems, can be treated as “creative materials” if their internal structures are exposed and made manipulable. This concept is demonstrated through a model-bending and inspection plugin integrated into ComfyUI, a node-based interface.
Traditionally, XAI focuses on demystifying machine learning for auditing or safety. However, in art, explainability can make models modifiable and debuggable, fostering meaningful artistic engagement. The authors suggest that while working with smaller datasets and human-scale models can give artists more control, large-scale models often limit this agency. Their solution is to expose the internal components of these large models, allowing artists to interact with them directly.
The core idea is to foster a “craft-based” relationship between artists and generative systems, similar to Donald Schön’s “reflection-in-action.” Through hands-on manipulation of a model’s components, artists can develop an intuitive, tacit understanding of how each part influences the output. This contrasts with the commodification of AI models, where users are incentivized to sample from vast catalogs without developing deep familiarity.
The paper highlights ComfyUI as an ideal platform for this approach. Unlike other interfaces that abstract away internal workings, ComfyUI’s modular, node-based design decomposes the diffusion process into discrete, accessible nodes. This allows users to explore, customize, and understand each component.
The researchers implemented a model bending system as a suite of custom nodes within ComfyUI. This system aims to introduce variations and diversity into the text-to-image diffusion process while simultaneously helping artists understand the generative process. Bending operations can be applied to various components of the latent diffusion pipeline, including the Variational Autoencoder (VAE), CLIP text embeddings, and the UNet (noise prediction model).
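The general idea of a "bending operation" can be sketched with a PyTorch forward hook that perturbs a module's activations in place. This is a minimal illustration of the technique, not the plugin's actual API; the model, operator, and function names here are hypothetical.

```python
# Minimal sketch of "model bending": intercepting a layer's activations
# with a forward hook and applying a bending operator to them.
# All names here are illustrative, not the plugin's real node API.
import torch
import torch.nn as nn

def make_bend_hook(operator):
    """Wrap a tensor -> tensor bending operator as a forward hook."""
    def hook(module, inputs, output):
        return operator(output)  # replace the layer's output
    return hook

def scale_bend(x, factor=1.5):
    """A toy bending operator: amplify activations."""
    return x * factor

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))

# Bend the first layer, run the model, then remove the bend.
handle = model[0].register_forward_hook(make_bend_hook(scale_bend))
x = torch.ones(1, 8)
bent = model(x)
handle.remove()      # the bend is non-destructive: removing it restores the model
clean = model(x)
```

Because the hook is attached at runtime, the underlying weights are untouched; the same checkpoint can be bent, inspected, and restored freely, which is what makes this style of manipulation safe to experiment with.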
Key Features of the Plugin:
UNet Model Bending: The UNet is central to image generation in diffusion models. The plugin lets users specify a bending operator together with a path to a specific layer within the UNet. Once inserted, the bending module alters activations during denoising, letting artists shape how noise is progressively removed and how the image takes form.
Model Inspection: For more precise control, a Model Inspector node is provided. This tool displays the model’s architecture as an expandable tree, allowing artists to visually navigate and select any layer for manipulation. This interactive process helps artists build intuition about how different components affect the final output.
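The expandable-tree view can be approximated by walking `named_modules()` and indenting by path depth. This is a rough sketch of what such an inspector exposes, not the node's actual rendering code.

```python
# Sketch: building an indented architecture tree from named_modules(),
# roughly the information a Model Inspector tree view would display.
import torch.nn as nn

def architecture_tree(model):
    lines = []
    for name, module in model.named_modules():
        depth = 0 if name == "" else name.count(".") + 1
        label = type(model).__name__ if name == "" else name.split(".")[-1]
        lines.append("  " * depth + f"{label}: {type(module).__name__}")
    return "\n".join(lines)

demo = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.Sequential(nn.ReLU(), nn.Conv2d(8, 8, 3)),
)
tree = architecture_tree(demo)
print(tree)
```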
Feature Maps Visualization: The Visualize Feature Map node allows users to see intermediate feature maps at any layer. This helps artists understand what the model is “attending to” at various stages of denoising and how their modifications impact these internal representations.
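Capturing an intermediate feature map is again a forward-hook pattern: record the layer's output, then reduce it to a single-channel image for display. The reduction used here (channel-averaged magnitude) is one common convention, assumed for illustration.

```python
# Sketch: capturing an intermediate feature map with a forward hook and
# collapsing it to a single-channel image, in the spirit of a
# "Visualize Feature Map" node. Names are illustrative.
import torch
import torch.nn as nn

captured = {}

def capture_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()  # store, don't modify
    return hook

layer = nn.Conv2d(3, 16, 3, padding=1)
layer.register_forward_hook(capture_hook("conv"))

x = torch.rand(1, 3, 32, 32)
layer(x)

fmap = captured["conv"]                    # shape (1, 16, 32, 32)
image = fmap.abs().mean(dim=1).squeeze(0)  # channel-averaged magnitude, (32, 32)
```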
CLIP Text Encoding (Conditionings) Bending: The plugin also enables fine-grained adjustments within the text embedding space generated by the CLIP model. Small movements in this space can provide artists with subtle controls that complement traditional prompt-based interactions.
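A small movement in embedding space can be sketched as adding a scaled unit direction to the conditioning tensor. The random stand-in below assumes Stable Diffusion's typical (77, 768) CLIP embedding shape; the step-size convention is an assumption, not the plugin's exact formula.

```python
# Sketch: bending a text conditioning by a small directional offset in
# embedding space. The embedding here is a random stand-in for a real
# CLIP text encoding of shape (tokens, dim).
import torch

torch.manual_seed(0)
cond = torch.randn(77, 768)               # stand-in for a CLIP text embedding

direction = torch.randn_like(cond)
direction = direction / direction.norm()  # unit direction in embedding space

strength = 0.05                           # small step -> subtle output change
bent_cond = cond + strength * cond.norm() * direction
```

Scaling the step by the embedding's own norm keeps the perturbation proportional, so `strength` reads as a relative nudge rather than an absolute one.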
The authors conclude that by reframing explainability as an artistic practice rooted in making and manipulation, their tool fosters a deeper, more personal relationship between artists and AI models. They hope this approach will help address challenges like authorship and agency in AI-assisted art. For more details, you can read the full research paper here: Explainability-in-Action: Enabling Expressive Manipulation and Tacit Understanding by Bending Diffusion Models in ComfyUI.