TLDR: ActiveMark is a novel watermarking method for Visual Foundation Models (VFMs) that embeds digital watermarks into a model’s internal representations by leveraging ‘massive activations’ in specific layers. The technique lets owners verify their intellectual property even after the model has been fine-tuned or pruned, achieving high detection rates for watermarked models and low false-positive rates for independent ones, while remaining computationally efficient.
Visual Foundation Models (VFMs) are powerful AI systems trained on vast datasets, capable of adapting to many computer vision tasks like image classification and segmentation. Their development requires significant investment in data collection and training, making them valuable assets for their owners. However, the ease with which these models can be copied and redistributed illegally poses a significant challenge to protecting intellectual property rights.
To address this, researchers are developing methods to verify ownership. One prominent approach is watermarking, where specific information is embedded into a model by modifying its internal parameters. This embedded information can then be checked to confirm ownership. Another method, fingerprinting, generates a unique identifier for a model without altering it, and ownership is verified by comparing fingerprints.
A new method called ActiveMark has been introduced, specifically designed for watermarking visual foundation models. ActiveMark embeds digital watermarks into the hidden representations of a select set of input images. This approach leverages a concept known as “massive activations,” which are unusually high response values observed in specific layers or tokens within a VFM. These massive activations often dominate subsequent layers and are found to be ideal locations for embedding watermarks due to their significant impact on the model’s internal representations.
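The idea of “massive activations” can be made concrete with a small sketch: scan one block’s output for values whose magnitude dwarfs the typical scale. The `ratio` threshold and the detection rule below are illustrative assumptions, not the paper’s exact criterion.

```python
import torch

def find_massive_activations(hidden_states, ratio=100.0):
    """Flag activation values far above the typical magnitude.

    hidden_states: (batch, tokens, channels) output of one transformer block.
    A value counts as 'massive' when |x| exceeds `ratio` times the median |x|
    (an illustrative rule; the paper's precise criterion may differ).
    """
    mags = hidden_states.abs()
    mask = mags > ratio * mags.median()
    # (sample, token, channel) coordinates of massive activations
    return mask.nonzero(as_tuple=False)

# Toy example: a small block output with one injected spike
torch.manual_seed(0)
h = torch.randn(1, 4, 8) * 0.1
h[0, 2, 5] = 50.0  # simulate a massive activation
coords = find_massive_activations(h)
```

Here the single spiked entry is the only coordinate returned; in a real VFM, such outliers recur at consistent layers and tokens, which is what makes them stable anchors for a watermark.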
The process involves fine-tuning a small number of the VFM’s later layers, along with training lightweight encoder and decoder networks. The encoder injects a user-specific binary signature (watermark) into a chosen channel of the internal activation of a preselected transformer block. This modified representation then passes through the rest of the VFM. A decoder network at the final block extracts the binary message, allowing for ownership verification.
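The embed/extract pipeline described above can be sketched with minimal stand-in networks. All names, shapes, and the choice of linear layers here are assumptions for illustration; the paper’s actual encoder and decoder architectures are not reproduced.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: signature length, token count, feature width,
# and the preselected channel to perturb (all hypothetical values).
NUM_BITS, TOKENS, DIM, CHANNEL = 32, 16, 64, 7

encoder = nn.Linear(NUM_BITS, TOKENS)  # signature -> per-token perturbation
decoder = nn.Linear(DIM, NUM_BITS)     # final features -> recovered bits

def embed(hidden, signature):
    """Inject the binary signature into one channel of a block's activation.

    hidden: (batch, TOKENS, DIM) activation of the preselected block.
    signature: (NUM_BITS,) binary watermark in {0, 1}.
    """
    perturb = encoder(signature.float())          # (TOKENS,)
    hidden = hidden.clone()
    hidden[:, :, CHANNEL] = hidden[:, :, CHANNEL] + perturb
    return hidden

def extract(final_features):
    """Decode the binary message from the final block's features."""
    logits = decoder(final_features.mean(dim=1))  # pool tokens -> (batch, NUM_BITS)
    return (logits > 0).int()

sig = torch.randint(0, 2, (NUM_BITS,))
h = torch.randn(2, TOKENS, DIM)
h_marked = embed(h, sig)          # modified representation continues through the VFM
bits = extract(h_marked)          # untrained nets: bits won't match sig until training
```

In the full method, `h_marked` would pass through the remaining VFM blocks before decoding, and both small networks are trained jointly with the later VFM layers.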
The training objective for ActiveMark is twofold: it ensures that the watermarked model’s internal representations remain very similar to the original model’s, and it forces the extracted watermark to be nearly identical to the embedded one. This balance ensures that the watermark is successfully embedded and extractable with minimal impact on the model’s functional performance.
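The two-term objective can be written as a fidelity loss plus an extraction loss. The MSE/BCE pairing and the weighting `lam` are assumptions for this sketch; the paper’s exact loss terms and balance may differ.

```python
import torch
import torch.nn.functional as F

def activemark_loss(h_wm, h_orig, logits, signature, lam=1.0):
    """Illustrative two-term training objective.

    fidelity:   keep watermarked representations close to the original model's.
    extraction: push the decoded message toward the embedded signature.
    `lam` balances the two terms (a hypothetical hyperparameter).
    """
    fidelity = F.mse_loss(h_wm, h_orig)
    extraction = F.binary_cross_entropy_with_logits(logits, signature.float())
    return fidelity + lam * extraction

# Toy tensors standing in for real activations and decoder outputs
h_orig = torch.randn(2, 16, 64)
h_wm = h_orig + 0.01 * torch.randn_like(h_orig)   # nearly identical representations
logits = torch.randn(2, 32)                       # decoder outputs, pre-sigmoid
sig = torch.randint(0, 2, (2, 32))
loss = activemark_loss(h_wm, h_orig, logits, sig)
```

Minimizing the first term preserves the model’s functional behavior; minimizing the second makes the watermark reliably extractable, which is exactly the balance the paragraph above describes.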
To evaluate its effectiveness, ActiveMark measures the “watermark detection rate,” which indicates how reliably the embedded watermark can be extracted. A good watermarking method should have a high detection rate for copies of the watermarked model and a very low detection rate for independent, non-watermarked models. The researchers also developed a statistical method to set a threshold for detection, minimizing the chances of falsely identifying a non-watermarked model or failing to detect a watermarked one.
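One standard way to set such a threshold, which may or may not match the paper’s exact statistical test, is a binomial tail bound: an independent model should recover each watermark bit with probability 1/2, so we pick the smallest matched-bit count whose chance under that null falls below a target false-positive rate.

```python
from math import comb

def detection_threshold(num_bits, alpha=1e-6):
    """Smallest matched-bit count k with P(X >= k) <= alpha under the null
    that each bit matches with probability 1/2 (a non-watermarked model).
    A standard binomial bound; the paper's exact test may differ.
    """
    total = 2 ** num_bits
    tail = 0
    for k in range(num_bits, -1, -1):
        tail += comb(num_bits, k)
        if tail / total > alpha:
            return k + 1
    return 0

thr = detection_threshold(32)  # bits that must match out of 32 to claim ownership
```

Declaring a model watermarked only when at least `thr` of the extracted bits match keeps the false-positive probability below `alpha`, while a genuinely watermarked copy should match nearly all bits.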
Experiments showed that early transformer blocks are not suitable for embedding, yielding low detection rates and high extraction errors. In contrast, specific later blocks, particularly block 12 in models like CLIP, demonstrated high detection rates and low errors. This block also happens to be one of the first layers where massive activations emerge, supporting the hypothesis that these regions are effective for watermark embedding.
ActiveMark was tested for robustness against common model modifications, such as fine-tuning for downstream tasks (like image classification and segmentation) and pruning (reducing model size). The results indicate that the watermarks remain detectable even after these significant alterations. For example, fine-tuning a watermarked CLIP model for semantic segmentation still yielded high detection rates.
When compared to other general-purpose watermarking techniques like ADV-TRA and IPGuard, ActiveMark demonstrated superior watermark detection rates for both positive (functional copies) and negative (independent) suspect models. Furthermore, ActiveMark significantly reduced the computational time required for watermark embedding, taking only 34.63 minutes compared to 1663.70 minutes for ADV-TRA and 1868.54 minutes for IPGuard on a single GPU.
In conclusion, ActiveMark offers a novel, robust, and efficient solution for watermarking visual foundation models. It is designed to be model-agnostic, meaning the owner only needs to perform the embedding procedure once. The watermark remains detectable even after fine-tuning for various tasks, and the method effectively distinguishes between legitimate copies and independent models, making it highly applicable in practical scenarios. For more technical details, refer to the full research paper.


