
Verifying AI Content: A Robust Watermark for Open-Source LLMs

TLDR: The research paper introduces PRO, a novel text watermarking method for open-source Large Language Models (LLMs). It tackles the challenge of embedding watermarks directly into model weights without compromising detection accuracy or robustness against user modifications such as fine-tuning or model merging. PRO achieves this through a Co-Adaptive Watermark Policy (CAWP), which generates learnable watermark patterns, and Forgotten Perturbation-aware Learning (FPL), which makes the watermark resilient to model alterations. Evaluations on mainstream open-source LLMs demonstrate PRO’s superior detectability and robustness while preserving text generation quality.

As large language models (LLMs) become more powerful and widely used, verifying the origin of AI-generated text is increasingly important. This helps model owners protect their intellectual property and combat the misuse of AI. While watermarking methods for closed-source LLMs are quite advanced, applying these techniques to open-source LLMs presents unique challenges.

Closed-source models typically embed watermarks during the text generation process itself. However, this approach doesn’t work for open-source models because developers don’t control how users decode or generate text. This leaves owners of open-source LLMs without a reliable way to confirm if a piece of AI-generated text came from their models.

The main hurdle is embedding watermarks directly into the model’s internal workings (its weights) without degrading detection accuracy. Previous attempts to transfer watermarks from closed-source settings to open-source models faced two critical issues. First, detectability suffered because the watermark patterns were hard for models to learn effectively, creating inconsistencies between what the model actually learned and what the detector expected. Second, these watermarks were vulnerable to user modifications, such as fine-tuning the model or merging it with other models, which could weaken or completely remove the embedded watermark.

Introducing PRO: A Solution for Open-Source LLM Watermarking

To tackle these problems, researchers have introduced a new method called PRO, designed to be a precise and robust text watermarking solution for open-source LLMs. PRO addresses the core challenges through two main innovations: the Co-Adaptive Watermark Policy (CAWP) and Forgotten Perturbation-aware Learning (FPL).

Co-Adaptive Watermark Policy (CAWP)

Unlike older methods that used rigid, predefined watermark patterns, PRO introduces a trainable watermark policy model. This policy is optimized alongside the LLM during its training. This co-optimization ensures that the watermark patterns generated are easier for the LLM to learn. By adapting the watermark pattern to the model’s learning dynamics, PRO significantly reduces the inconsistencies between the patterns the model generates and the criteria used for detection. Crucially, during detection, PRO uses this optimized policy, ensuring alignment with the watermark signals the LLM has actually internalized.
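The paper’s actual training objective isn’t reproduced in this article, but the co-optimization idea can be illustrated with a toy numerical sketch. Here a learnable “policy” vector assigns each vocabulary token a greenness score, and both the policy and the model’s output logits are updated jointly so that probability mass shifts onto green tokens, while a penalty keeps the logits close to the original model as a stand-in for text quality. All names, sizes, and the loss are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
V = 16                       # toy vocabulary size (hypothetical)
base = rng.normal(size=V)    # frozen "original" LM logits for one context
logits = base.copy()         # watermarked LM logits (trainable)
policy = np.zeros(V)         # trainable watermark policy (greenness scores)
lam, lr = 0.5, 0.5

def green_mass(logits, policy):
    # expected greenness of a sampled token: the watermark signal strength
    p, g = softmax(logits), 1.0 / (1.0 + np.exp(-policy))
    return float(p @ g)

before = green_mass(logits, policy)
for _ in range(200):
    p = softmax(logits)
    g = 1.0 / (1.0 + np.exp(-policy))
    f = p @ g
    # ascend watermark strength f while penalizing drift from the base
    # logits (a crude proxy for preserving generation quality)
    grad_logits = p * (g - f) - 2.0 * lam * (logits - base)
    # the policy adapts toward patterns the model can express easily;
    # a small decay keeps the greenness scores bounded
    grad_policy = p * g * (1.0 - g) - 0.1 * policy
    logits += lr * grad_logits
    policy += lr * grad_policy
after = green_mass(logits, policy)
```

After joint training, the expected greenness of generated tokens rises above the neutral 0.5 starting point, and the same learned policy would be reused at detection time, mirroring PRO’s alignment between embedding and detection.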

Forgotten Perturbation-aware Learning (FPL)

To enhance robustness against user modifications, PRO incorporates Forgotten Perturbation-aware Learning. User modifications, like fine-tuning or model merging, can inadvertently or intentionally erase learned watermarks. FPL addresses this by simulating various perturbations (changes to the model’s weights) that would maximally degrade watermark detectability. During training, the LLM learns to withstand these ‘forgotten perturbations’ by minimizing their disruptive effects while maintaining the watermark’s detectability. This approach ensures that the embedded watermark remains resilient even after downstream model alterations.
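FPL’s exact formulation is not given in this summary, but the adversarial idea it describes can be sketched in miniature: at each training step, find the weight perturbation within a small ball that most reduces a detectability score, then update the weights using the gradient at that worst-case point, so detectability survives the perturbation. The `detect` function and `w_star` below are hypothetical stand-ins, not the paper’s detector.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, eps, lr = 8, 0.3, 0.3
w_star = np.zeros(dim)                    # weights where detectability peaks (hypothetical)
w = w_star + 0.5 * rng.normal(size=dim)   # toy watermarked weights, slightly off-peak

def detect(w):
    # stand-in detectability score: 1.0 at w_star, decaying with distance
    return 1.0 / (1.0 + np.sum((w - w_star) ** 2))

def grad_detect(w):
    # gradient of the score above with respect to the weights
    return -2.0 * (w - w_star) * detect(w) ** 2

def worst_perturbation(w):
    # direction inside an eps-ball that most reduces detectability:
    # step directly against the detectability gradient
    g = grad_detect(w)
    return -eps * g / (np.linalg.norm(g) + 1e-12)

for _ in range(300):
    delta = worst_perturbation(w)           # 1) simulate the damaging perturbation
    w = w + lr * grad_detect(w + delta)     # 2) keep detectability high at that point

plain = detect(w)
attacked = detect(w + worst_perturbation(w))
```

The training loop converges to weights whose detectability stays high even at the worst point inside the perturbation ball, which is the resilience property FPL targets against fine-tuning or merging.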


Demonstrated Effectiveness

PRO has been evaluated on popular open-source LLMs, including LLaMA-3.2, LLaMA-3, and Phi-2. The results show that PRO significantly outperforms previous methods in both watermark detectability and its ability to withstand model modifications. It achieves high detectability with minimal impact on the quality of the generated text. For instance, PRO maintains high detectability (an AUC score of 0.80 or higher) even under aggressive modifications like high-ratio model merging and extensive fine-tuning. This marks a significant step forward in providing a practical and reliable solution for watermarking open-source LLM text.
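The AUC figure quoted above has a concrete meaning: it is the probability that a randomly chosen watermarked text receives a higher detector score than a randomly chosen unwatermarked one (ties counted half). A minimal sketch with made-up detector scores, not values from the paper:

```python
import numpy as np

def auc(pos, neg):
    # fraction of (watermarked, unwatermarked) pairs ranked correctly,
    # with ties counted half -- equivalent to the area under the ROC curve
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# hypothetical detector scores for illustration only
watermarked = [3.1, 2.4, 4.0, 1.9, 2.8]
unwatermarked = [0.4, 1.2, -0.3, 2.1, 0.8]
score = auc(watermarked, unwatermarked)   # 24 of 25 pairs ranked correctly -> 0.96
```

An AUC of 0.5 means the detector is no better than chance, so PRO’s reported 0.80-or-higher under aggressive modifications indicates the watermark signal remains clearly separable.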

The computational cost of PRO is also comparable to prior methods. The watermark policy model is lightweight, and while FPL adds some processing, it also leads to faster convergence during training, balancing out the overall time. For more technical details, you can refer to the full research paper: PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs.

In conclusion, PRO offers a robust and precise framework that addresses the critical challenges of watermarking open-source LLMs. By making watermark patterns learnable and resilient to modifications, it provides a practical way for owners to verify the origin of AI-generated content while maintaining text quality.

Nikhil Patel
https://blogs.edgentiq.com
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
