https://hacktivis.me/articles/cloudflare-turnstile-webgl-fingerprinting #Cloudflare #useragent #HackerNews #ngated
So, with Google announcing "Search is going full-AI, we won't be sending traffic to the original sites any more", someone else pointed out that this eradication of the traditional search-engine compact - we let you crawl our sites to create your index, and you send visitors to our sites when relevant - means that we can, and should, block all of Google's crawlers now. If they're going to just take, take, take and give nothing back, why let them access your content at all?
But this is cute. Besides the fact that Google documents that some of their crawlers ignore robots.txt, there's this bit of fun. On this page (https://developers.google.com/crawling/docs/robots-txt/create-robots-txt), they link to "the Google list of user agents" (https://developers.google.com/crawling/docs/crawlers-fetchers/overview-google-crawlers).
However, that links to 3 separate pages of them, and *each of those pages explicitly states that it is not comprehensive and only lists the ones they commonly get questions about*. And of course, none of the "User-triggered fetchers" obey robots.txt, along with some others.
So Google isn't even reporting the full list of user-agents that can be used to stop their crawling.
That is some bullshit.
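To make the gap concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a robots.txt that blocks two documented Google tokens (`Googlebot` and `Google-Extended`); the third agent name and the example.org URL are hypothetical, made up purely for illustration. It only models a crawler that actually honors robots.txt, so it says nothing about the fetchers that ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block the tokens we know about, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(agent: str, url: str = "https://example.org/article") -> bool:
    """Return True if a robots.txt-compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "Hypothetical-Unlisted-Fetcher" is an invented name standing in for
    # any crawler token that is missing from the published lists.
    for agent in ("Googlebot", "Google-Extended", "Hypothetical-Unlisted-Fetcher"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

A token you cannot name falls through to the `*` group, so the only way to catch it is to disallow everyone and then allow specific agents.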
#Google #crawler #RobotsTxt #UserAgent #bullshit #antisocial #web #search #WebSearch #LLM #AI
Ah, so that explains everything. I was affected this week too.
Is this a bug in the software, or is Deutsche Bahn trying to force people into its privacy-unfriendly app?
#deutschebahn #linux #webentwicklung #useragent #digitaleausgrenzung
Deutsche Bahn and punctuality? Complicated. Deutsche Bahn and Linux? Even more complicated. 🙃
Read the article: https://heise.de/-11300742?wt_mc=sm.red.ho.mastodon.mastodon.md_beitraege.md_beitraege&utm_source=mastodon
#deutschebahn #linux #webentwicklung #useragent #digitaleausgrenzung
When you have to fake your User-Agent because Zeit.de blocks Firefox with "CrawlerDetected" via some dumb script they found on GitHub.
Of course, this is all just their desperate attempt at reducing traffic from slop-bots, and it ends up blocking real users too.
Hope someone reads the logs sometimes, but I doubt it.
Even in the era where all major desktop environments are going Wayland-only, web browsers will ensure we never get rid of X11's traumatic memory, huh? 🫠 https://bugzilla.mozilla.org/show_bug.cgi?id=2027556
Par for the course for the ecosystem where everybody pretends to be everybody because the web is a never-ending collection of hacks…