TLDR: Cloudflare, a leading internet security provider, has publicly accused AI search engine Perplexity of “stealth crawling” websites and bypassing established web security protocols like robots.txt. Cloudflare alleges that Perplexity’s bots disguise their identity and rotate IP addresses to circumvent blocks, accessing content on tens of thousands of domains. In response, Perplexity dismisses the claims as a “publicity stunt,” asserting its AI assistants operate as user-initiated tools that fetch information on demand, not as indiscriminate scrapers, and do not retain content for model training. This dispute highlights ongoing tensions between AI innovators and web infrastructure guardians regarding data access and web etiquette.
Cloudflare Alleges Perplexity Bypassed Web Protocols with Stealth Crawling, Igniting Debate on AI Web Etiquette
Internet infrastructure giant Cloudflare has leveled serious accusations against AI search engine Perplexity, alleging that the billion-dollar startup has engaged in “stealth crawling” to bypass web security protocols and access content on restricted websites. The dispute, which came to light on August 5, 2025, through a Cloudflare blog post, has reignited the broader debate surrounding AI’s interaction with web data and established internet norms.
According to Cloudflare, which protects an estimated 24 million websites, Perplexity’s crawlers exhibit deceptive behavior. Cloudflare says its investigation found that while Perplexity’s bots initially identify themselves, they subsequently “obscure their crawling identity in an attempt to circumvent the website’s preferences” when faced with a network block. This alleged circumvention involves modifying user agents, rotating IP addresses, and changing Autonomous System Numbers (ASNs) to evade detection.
Cloudflare further claimed that Perplexity’s bots were “ignoring, or sometimes failing to even fetch robots.txt files.” Robots.txt is a voluntary web etiquette convention, in use since 1994, through which site owners tell bots which parts of a site they may access; it relies on bots identifying themselves honestly and complying. This activity, Cloudflare stated, was observed “across tens of thousands of domains and millions of requests per day.” To substantiate its claims, Cloudflare conducted tests using newly created domains with AI bot restrictions, finding that Perplexity still managed to provide detailed information from these explicitly blocked sites.
As a result of these findings, Cloudflare has de-listed Perplexity’s crawler as a verified bot and announced it would actively block Perplexity and its “stealth bots” from crawling websites. Cloudflare emphasized that “The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust.”
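To illustrate why a robots.txt rule depends on honest self-identification, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the bot name ExampleAIBot, and the example.com URL are hypothetical and are not taken from Cloudflare's tests. The sketch only shows that a rule keyed on a declared user agent no longer matches once a client presents a generic browser string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # hypothetical page
    # A bot that declares itself honestly is matched by the Disallow rule.
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot/1.0", url))
    # The same request with a browser-like user agent falls through to "*".
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", url))
```

Because the file itself enforces nothing, operators who want a block to hold typically pair it with network-level controls, which is the layer where Cloudflare says it detected the changed user agents and IP addresses.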
Perplexity, an AI-powered answer engine, has vehemently denied Cloudflare’s accusations. Jesse Dwyer, a spokesperson for Perplexity, dismissed Cloudflare’s report as a “sales pitch” and a “publicity stunt,” stating there were “a lot of misunderstandings in the blog post.” Perplexity argues that its AI assistants operate fundamentally differently from traditional web crawlers. The company maintains that its technology retrieves information only when users make specific requests, rather than indiscriminately scraping or indexing content for model training. Perplexity likens its on-demand fetching process to Google’s user-triggered functions, asserting that its system acts as an extension of the user, not an independent crawler subject to robots.txt restrictions in the same way. Perplexity also criticized Cloudflare for allegedly overblocking legitimate AI assistants and failing to differentiate between malicious scraping and genuine user-initiated traffic. However, Perplexity did not directly address Cloudflare’s specific technical findings regarding the use of disguised user agents and rotating IP infrastructure.
This is not the first time Perplexity has faced accusations of unfair content scraping. In June 2025, the BBC reportedly threatened legal action against the startup for allegedly scraping its content to train AI models. Dow Jones and The New York Times have lodged similar complaints. The dispute between Cloudflare and Perplexity underscores the ethical and practical tensions among artificial intelligence, web infrastructure, and content ownership, and its outcome could shape future rules for data access on the internet.


