→ Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/

“We observed that #Perplexity [an AI-powered answer engine] uses not only their declared #user_agent, but also a generic browser intended to #impersonate Google Chrome on macOS when their declared crawler was blocked.”

“This activity was observed across tens of thousands of domains and millions of requests per day.”

#AI #evade #stealth #website #browser #crawler #blocked

Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.

The Cloudflare Blog
CyberChef

The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

@timtilberg ooooh I now see the issue with `this` and multiple header/cookie sets.

Currently I have `#user_agent=` write its value directly into `#headers`. However, this pattern doesn't work neatly with cookies if you override `#headers=` after setting `#cookie=`.
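
One possible way around this, as a rough sketch rather than the actual implementation: keep the user agent and cookies in their own instance variables and only merge them into the header hash when `#headers` is read. The `RequestBuilder` class, its instance variables, and the rule that dedicated setters win over the raw hash are all assumptions made for illustration.

```ruby
# Hypothetical request builder; the class and its internals are illustrative only.
class RequestBuilder
  # Kept apart from the raw header hash so #headers= cannot clobber it.
  attr_writer :user_agent

  def initialize
    @base_headers = {}  # whatever was last assigned through #headers=
    @user_agent = nil
    @cookies = {}       # accumulated through #cookie=
  end

  # Replaces only the raw header hash, not the user agent or cookies.
  def headers=(hash)
    @base_headers = hash.dup
  end

  # Adds cookie name/value pairs to the ones already set.
  def cookie=(pairs)
    @cookies = @cookies.merge(pairs)
  end

  # Merge at read time: dedicated setters win over the raw header hash.
  def headers
    merged = @base_headers.dup
    merged["User-Agent"] = @user_agent if @user_agent
    unless @cookies.empty?
      merged["Cookie"] = @cookies.map { |name, value| "#{name}=#{value}" }.join("; ")
    end
    merged
  end
end

req = RequestBuilder.new
req.cookie = { "session" => "abc123" }
req.user_agent = "ExampleBot/1.0"
req.headers = { "Accept" => "text/html" }  # assigned last, but the cookie survives
p req.headers
```

With this layout the order of the setter calls no longer matters. The trade-off is that a `Cookie` or `User-Agent` key passed directly to `#headers=` is silently overridden whenever the dedicated setter has also been used.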