When Cloudflare accused AI search engine Perplexity on Monday of stealthily scraping websites while ignoring a site’s specific methods to block it, this wasn’t a clear-cut case of an AI web crawler gone wild.
Many people came to Perplexity’s defense. They argued that Perplexity accessing sites in defiance of the website owner’s wishes, while controversial, is acceptable. And this is a controversy that will certainly grow as AI agents flood the internet: Should an agent accessing a website on behalf of its user be treated like a bot? Or like a human making the same request?
Cloudflare is known for providing anti-bot crawling and other web security services to millions of websites. Essentially, Cloudflare’s test case involved setting up a new website with a new domain that had never been crawled by any bot, setting up a robots.txt file that specifically blocked Perplexity’s known AI crawling bots, and then asking Perplexity about the website’s content. And Perplexity answered the question.
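To make that test setup concrete, here is a minimal sketch of how a robots.txt rule of this kind is evaluated, using Python's standard `urllib.robotparser`. The user-agent tokens `PerplexityBot` and `Perplexity-User`, the example URL, and the Chrome user-agent string are assumptions for illustration; the exact rules Cloudflare used are not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two assumed Perplexity crawler tokens,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative generic browser string (assumed, not taken from the report).
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/new-page"
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_UA):
        print(agent[:20], is_allowed(agent, page))
```

The sketch shows why the dispute is not purely technical: robots.txt matches on the name a client declares for itself, and compliance is voluntary, so a request that presents a generic browser string is not covered by rules written for a named crawler.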
Cloudflare researchers found the AI search engine used “a generic browser intended to impersonate Google Chrome on macOS” when its web crawler itself was blocked. Cloudflare CEO Matthew Prince posted the research on X, writing, “Some supposedly ‘reputable’ AI companies act more like North Korean hackers. Time to name, shame, and hard block them.”
But many people disagreed with Prince’s assessment that this was actual bad behavior. Those defending Perplexity on sites like X and Hacker News pointed out that what Cloudflare seemed to document was the AI accessing a specific public website when its user asked about that specific website.
“If I as a human request a website, then I should be shown the content,” one person on Hacker News wrote, adding, “why would the LLM accessing the website on my behalf be in a different legal category as my Firefox web browser?”
A Perplexity spokesperson previously denied to TechCrunch that the bots were the company’s and called Cloudflare’s blog post a sales pitch for Cloudflare. Then on Tuesday, Perplexity published a blog post in its defense (and generally attacking Cloudflare), claiming the behavior was from a third-party service it uses occasionally.
But the crux of Perplexity’s post made much the same appeal as its online defenders did.
“The difference between automated crawling and user-driven fetching isn’t just technical — it’s about who gets to access information on the open web,” the post said. “This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.”
Perplexity’s accusations aren’t exactly fair, either. One argument that Prince and Cloudflare used for calling out Perplexity’s methods was that OpenAI doesn’t behave in the same way.
“OpenAI is an example of a leading AI company that follows these best practices. They respect robots.txt and do not try to evade either a robots.txt directive or a network level block. And ChatGPT Agent is signing http requests using the newly proposed open standard Web Bot Auth,” Prince wrote in his post.
Web Bot Auth is a Cloudflare-supported standard being developed by the Internet Engineering Task Force that hopes to create a cryptographic method for identifying AI agent web requests.
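The general idea of cryptographically identifying a request can be sketched in a few lines. This is not the Web Bot Auth wire format: it swaps in a shared-secret HMAC from Python's standard library as a stand-in for the key-based signatures a real scheme would use, and the key, the field names, and the helper functions are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret for the demo only; a real scheme would rely on
# an asymmetric key pair so that sites never hold the agent's signing key.
AGENT_KEY = b"demo-key-not-for-production"


def signature_base(method: str, authority: str, path: str, agent: str) -> bytes:
    """Build a canonical string from the request parts the signature covers."""
    parts = [method.upper(), authority.lower(), path, agent]
    return "\n".join(parts).encode("utf-8")


def sign_request(method: str, authority: str, path: str, agent: str,
                 key: bytes = AGENT_KEY) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical request string."""
    base = signature_base(method, authority, path, agent)
    return hmac.new(key, base, hashlib.sha256).hexdigest()


def verify_request(method: str, authority: str, path: str, agent: str,
                   signature: str, key: bytes = AGENT_KEY) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(method, authority, path, agent, key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    sig = sign_request("GET", "example.com", "/page", "demo-agent")
    print(verify_request("GET", "example.com", "/page", "demo-agent", sig))
```

The point of the sketch is that the site verifies a signature rather than trusting a self-declared user-agent string, so a request claiming to come from a different agent fails verification even if it copies the original signature.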
The debate comes as bot activity reshapes the internet. As TechCrunch has previously reported, bots seeking to scrape massive amounts of content to train AI models have become a menace, especially to smaller sites.
For the first time in the internet’s history, bot activity is currently outstripping human activity online, with AI traffic accounting for over 50%, according to Imperva’s Bad Bot report released last month. Most of that activity is coming from LLMs. But the report also found that malicious bots now make up 37% of all internet traffic. That’s activity that includes everything from persistent scraping to unauthorized login attempts.
Until LLMs, the internet generally accepted that websites could and should block most bot activity, given how often it was malicious, by using CAPTCHAs and other services (such as Cloudflare). Websites also had a clear incentive to work with specific good actors, such as Googlebot, guiding it on what not to index through robots.txt. Google indexed the internet, which sent traffic to sites.
Now, LLMs are eating an increasing amount of that traffic. Gartner predicts that search engine volume will drop by 25% by 2026. Right now, humans tend to click website links from LLMs at the point they are most valuable to the website, which is when they’re ready to conduct a transaction.
But if humans adopt agents as the tech industry predicts they will (to arrange our travel, book our dinner reservations, and shop for us), would websites hurt their business interests by blocking them? The debate on X captured the dilemma perfectly:
“I WANT perplexity to visit any public content on my behalf when I give it a request/task!” wrote one person in response to Cloudflare calling Perplexity out.
“What if the site owners don’t want it? they just want you [to] directly visit the home, see their stuff” argued another, pointing out that the site owner who created the content wants the traffic and potential ad revenue, not to let Perplexity take it.
“This is why I can’t see ‘agentic browsing’ really working — much harder problem than people think. Most website owners will just block,” a third predicted.