TLDR: ‘Studies for’ is a collaborative sound artwork by artist Evala and Sony researchers, utilizing a real-time multi-channel AI sound generation model called SpecMaskGIT. Trained on over 200 hours of Evala’s past works, the installation aims to create a “new form of archive” by continuously generating novel sounds in his artistic style. It addresses the challenges of preserving ephemeral sound art by integrating artist feedback, using artist-specific datasets, and ensuring unexpected, creative outputs, thereby extending an artist’s work beyond their physical existence.
‘Studies for’ is a new sound artwork that pairs human creativity with artificial intelligence and offers a novel approach to archiving artistic legacies. Developed by researchers from Sony Group Corporation in partnership with sound artist Evala, the installation uses a real-time multi-channel sound generation model to create an immersive auditory experience.
The artwork, ‘Studies for’, was exhibited at the NTT InterCommunication Center [ICC] in Tokyo from December 14, 2024, to March 9, 2025. It represents a significant step in integrating AI technologies into the artistic workflow, particularly in the challenging domain of sound art.
A New Form of Archive
At the heart of ‘Studies for’ is the concept of a “new form of archive.” Traditional archiving methods often struggle with the ephemeral and site-specific nature of sound and media art. Evala, known for his spatial sound works, expressed concerns that much of his art might not be reproducible after his lifetime. This project directly addresses that challenge by proposing that an AI model, trained on an artist’s past works, can preserve their artistic style while continuously generating new sound elements, effectively extending their creative output beyond their physical existence.
The AI model, SpecMaskGIT, a lightweight yet high-quality sound generation model, was trained on an extensive dataset of over 200 hours of Evala’s past sound artworks. This focused training allows the model to internalize and reproduce the distinctive characteristics of Evala’s style, ensuring that the generated output remains faithful to his artistic vision.
Human-AI Co-Creation Framework
The research paper highlights three crucial aspects for successful human-AI co-creation in art:
1. Artist Feedback Integration: The model was designed to allow for quick iterations of trial and error, enabling Evala to evaluate the generated sounds and provide feedback. This iterative process was essential for refining the model to accurately reflect his artistic identity.
2. Artist-Derived Datasets: Training the AI exclusively on Evala’s own body of work was key to preserving his unique style and ensuring the outputs were genuinely reflective of his artistic sensibility.
3. Inclusion of Unexpected Outputs: Artists often seek novel and surprising results from AI. To achieve this, ‘Studies for’ combined text prompts (titles of Evala’s past works) with audio inputs (his signature opening sound). This dual conditioning prevented the AI from merely creating collages of existing works, instead generating new, previously unheard sounds that still resonated with Evala’s style.
The Immersive Experience
The installation space itself was an integral part of the artwork. Enveloped in a white, curved fabric structure, it symbolized both the beginning of life and the continuation of sound beyond. Eight-channel speakers were placed behind the fabric, allowing the audience to walk freely and experience the generative sound from various spatial perspectives. The SpecMaskGIT model generated sound across these eight channels in real time, continuously for the entire three-month exhibition period, creating a unique and ever-evolving auditory environment.
To achieve real-time, high-resolution (48 kHz) eight-channel generation, the researchers made significant technical optimizations to the SpecMaskGIT model, including reducing its complexity and replacing its vocoder with a faster alternative. The system was deployed on Linux workstations equipped with NVIDIA RTX 4080 GPUs, routing audio through professional interfaces to a spatial speaker array.
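To give a rough sense of what "real-time" means at this resolution, the short Python sketch below works out the raw sample throughput of a 48 kHz, eight-channel stream and the real-time factor a generator would need to stay below. Only the sample rate and channel count come from the article; the chunk length and generation time in the usage lines are hypothetical placeholders, not measurements from the installation, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSpec:
    """Output format of the audio stream (values taken from the article)."""
    sample_rate_hz: int = 48_000
    channels: int = 8

    def samples_per_second(self) -> int:
        # Total samples across all channels for one second of audio.
        return self.sample_rate_hz * self.channels

    def samples_in(self, seconds: float) -> int:
        # Total samples across all channels for a chunk of the given length.
        return round(self.samples_per_second() * seconds)


def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by audio duration; values below 1.0 keep up."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def keeps_up(generation_seconds: float, audio_seconds: float) -> bool:
    # The generator keeps pace only if it finishes a chunk before
    # the previous chunk has finished playing.
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


if __name__ == "__main__":
    spec = StreamSpec()
    print(spec.samples_per_second())  # 384000 samples per second in total

    # Hypothetical numbers: a 10 s chunk that takes 4 s to generate.
    chunk_s, gen_s = 10.0, 4.0
    print(spec.samples_in(chunk_s))
    print(real_time_factor(gen_s, chunk_s), keeps_up(gen_s, chunk_s))
```

A generator whose real-time factor stays below 1.0 finishes each chunk before it is needed, which is the property that reducing model complexity and using a faster vocoder would help secure.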
‘Studies for’ not only captivated over 20,000 visitors but also effectively conveyed its core message. Audiences, encountering the work at the end of the exhibition route, understood that the sounds were generated by an AI model that had learned from Evala’s previous creations, highlighting the profound relationship between the artist and the AI.
Looking Ahead
This project offers a compelling vision for the future of art creation and preservation. By demonstrating how AI can be trained to embody an artist’s style and continuously generate new works, ‘Studies for’ proposes a meaningful framework for a posthumanist approach to art, where an artist’s creative output can expand and evolve indefinitely. For more details, see the full research paper.


