Obfuscation Undermines Code Watermarking Efforts

TLDR: This research paper demonstrates that N-gram-based code watermarking schemes, used to identify AI-generated code, are not robust against code obfuscation. Through theoretical modeling and extensive experiments, the authors show that obfuscation, which alters code while preserving its functionality, can nullify the detection capabilities of these watermarks, leaving watermarked code indistinguishable from unwatermarked code.

Large language models (LLMs) are increasingly being used to generate code, making it crucial to distinguish AI-generated code from human-written code. This distinction is important for tasks like identifying authorship, tracking content, and detecting misuse. To address this, N-gram-based watermarking schemes have emerged, which embed secret signals into the code during its generation for later detection.
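To make the idea of an N-gram-based watermark concrete, here is a minimal sketch of a "green list" detector. It is a simplified illustration only, not the actual SWEET, WLLM, or SynthID implementation: the hashing scheme, the `demo-key` secret, the `GAMMA` value, and all function names are assumptions chosen for this example.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of tokens marked "green" in each context


def is_green(prev_token, token, key="demo-key"):
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def z_score(tokens, key="demo-key"):
    """z-score of the green-pair count against the no-watermark null hypothesis."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    green = sum(is_green(p, t, key) for p, t in pairs)
    n = len(pairs)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def toy_watermarked_tokens(vocab, length, key="demo-key", seed=0):
    """Toy generator that prefers green tokens, standing in for a watermarked LLM."""
    rng = random.Random(seed)
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        greens = [t for t in vocab if is_green(tokens[-1], t, key)]
        # Fall back to the full vocabulary in the unlikely case nothing is green.
        tokens.append(rng.choice(greens or vocab))
    return tokens
```

A detector of this kind flags a sequence when its z-score exceeds a threshold. Because the score depends on the exact token pairs, renaming tokens (as an obfuscator would rename identifiers) changes which pairs are hashed, which is the weakness the paper examines.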

However, the robustness of these watermarking schemes when applied to code has not been sufficiently evaluated. Many claims of robustness rely on defenses against simple code transformations or optimizations, which do not accurately simulate real-world attacks. In contrast, the impact of more sophisticated techniques like code obfuscation, which significantly alters code while preserving its functionality, has been largely unexplored.

This research focuses on the robustness of N-gram-based watermarking approaches for code. The authors formally model code obfuscation as a Markov random walk process to simulate an attack on watermarking schemes. They prove that N-gram-based watermarking cannot remain robust under a single, intuitive, and experimentally verified assumption: distribution consistency. This assumption suggests that the distribution of detectable N-gram features within an obfuscated code’s equivalent space remains similar to the distribution across all code.
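The random-walk view of obfuscation can be illustrated with a toy example. The sketch below is not the paper's formal model: it assumes a single kind of semantics-preserving step (renaming one local identifier), and the class and function names are hypothetical. Each step depends only on the current program, which is what makes the process Markovian.

```python
import ast
import random


class RenameOne(ast.NodeTransformer):
    """Rename every occurrence of one identifier (one semantics-preserving step)."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return node


def local_names(tree):
    """Collect argument names and assigned variable names, sorted for determinism."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return sorted(names)


def obfuscation_walk(source, steps, seed=0):
    """Apply `steps` random renames; the next state depends only on the current tree."""
    rng = random.Random(seed)
    tree = ast.parse(source)
    for i in range(steps):
        candidates = local_names(tree)
        if not candidates:
            break
        old = rng.choice(candidates)
        new = f"v{i}_{rng.randrange(10**6)}"  # fresh name, assumed not to collide
        tree = RenameOne(old, new).visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

This toy ignores cases a real obfuscator must handle (keyword arguments, attribute access, `global` declarations), but it shows how a sequence of small equivalence-preserving moves walks through the space of functionally identical programs while changing the token N-grams a detector relies on.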

The theoretical findings indicate that if the original false positive rate of watermarking detection is ε_pos, the fraction of watermarked code that the detector fails to identify (its false negative rate) rises to 1 − ε_pos after obfuscation. In other words, the detector flags obfuscated watermarked code no more often than it flags benign code, so it loses its ability to distinguish the two.
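The arithmetic behind this result can be sketched numerically. The example below assumes detection scores follow a standard normal distribution for arbitrary code and a normal distribution with mean `wm_mean` for watermarked code; both distributions and the function names are illustrative assumptions, not figures from the paper.

```python
from statistics import NormalDist

NULL = NormalDist(0.0, 1.0)  # assumed score distribution of arbitrary (benign) code


def threshold_for(eps_pos):
    """Score threshold that yields a false positive rate of eps_pos on benign code."""
    return NULL.inv_cdf(1.0 - eps_pos)


def miss_rate_before(eps_pos, wm_mean=4.0):
    """Miss rate on watermarked code whose scores are assumed ~ N(wm_mean, 1)."""
    return NormalDist(wm_mean, 1.0).cdf(threshold_for(eps_pos))


def miss_rate_after(eps_pos):
    """Miss rate once obfuscated scores follow the benign distribution
    (the distribution consistency assumption)."""
    detected = 1.0 - NULL.cdf(threshold_for(eps_pos))  # equals eps_pos
    return 1.0 - detected
```

Under these assumptions, a detector tuned to a 1% false positive rate misses only a small share of watermarked code before obfuscation, but about 99% of it afterwards, matching the 1 − ε_pos figure.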

To validate their theory, the authors ran experiments on three state-of-the-art watermarking schemes (SWEET, WLLM, and SynthID), two large language models (LLaMA-3.1-8B-Instruct and DeepSeek-Coder-33B-Base), two programming languages (Python and JavaScript), four code benchmarks, and four different obfuscators (Python-Minifier, PyMinifier, JS Obfuscator, and UglifyJS). The results consistently showed that all watermarking detectors performed at near coin-flip level on obfuscated code, with AUROC (Area Under the Receiver Operating Characteristic Curve) scores clustering tightly around 0.5. This means detection performance degraded to random guessing after obfuscation, regardless of the model, watermarking scheme, or dataset used.
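To show why an AUROC of 0.5 corresponds to random guessing, the sketch below computes AUROC directly from its pairwise definition on synthetic scores. The score distributions are assumptions made for illustration (benign and obfuscated scores drawn from the same distribution, watermarked scores shifted upward); they are not the paper's data.

```python
import random


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def demo(seed=0, samples=500):
    """Return (AUROC before obfuscation, AUROC after obfuscation) on synthetic scores."""
    rng = random.Random(seed)
    benign = [rng.gauss(0.0, 1.0) for _ in range(samples)]
    watermarked = [rng.gauss(4.0, 1.0) for _ in range(samples)]  # assumed shift
    # Assumption: obfuscation pulls scores back to the benign distribution.
    obfuscated = [rng.gauss(0.0, 1.0) for _ in range(samples)]
    return auroc(watermarked, benign), auroc(obfuscated, benign)
```

In this synthetic setup the first value is close to 1 and the second is close to 0.5: when the two score distributions coincide, no threshold separates them better than chance.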

The study also confirmed the ‘distribution consistency’ assumption, finding it satisfied in 98.10% of cases during experiments. This strong empirical support reinforces the theoretical impossibility result. Even an ‘ideal’ watermarking scheme, designed with unrealistically high detection and quality-preserving capabilities, was shown to be vulnerable to obfuscation, with its AUROC dropping to near 0.5 after attack.

The paper acknowledges limitations, primarily that its scope is limited to N-gram-based watermarking schemes. However, given the widespread adoption and industrial deployment of these methods, the findings are highly relevant and timely. The authors suggest that future research should explore more semantically aware and transformation-resilient approaches to code watermarking to overcome these fundamental limitations.
