Volker Stolz

@fm_volker
318 Followers
426 Following
8.7K Posts
Unprofessional takes on CompSci and other things. Card-carrying member of @informatik. Sub-tooter at https://fediscience.org/@selabhvl. Part of the control-group.
https://λ.foldr.org/~vs/. Relapses into #FreeBSD.
#bergen - nordisk by nature. #FormalMethods #RuntimeVerification #Refactoring #HomeAssistant
Private: https://lambda.foldr.org/~vs/
Work: https://selabhvl.github.io/vsto/
GitHub: https://github.com/VolkerStolz
DBLP: https://dblp.org/pid/24/2502.html

"Lonely young people are likely better off texting a random stranger than talking to a chatbot."

Rosie Thomas for @404mediaco:

https://www.404media.co/chatgpt-loneliness-study-college-students-random-strangers-texting/

#AI #psychology

Texting a Random Stranger Better for Loneliness Than Talking to a Chatbot, Study Shows

A newly published study of how college students interact with chatbots and human strangers showed talking to a random person offers more connection than an LLM.

404 Media

Oh no, nobody could possibly have known any of this 2.5 years ago, when the participation process began.
[…]

I'm not going to dig out my tweets from back then. Neither the entanglement between several SPRIND employees and other wallet organizations, nor the fact that the Funke process is a completely opaque exercise in busywork, was ever unclear.

But sure, great idea to legitimize it by participating first.

https://23.social/@linuzifer/116437508985885138

Congratulations to the winner of the 8th International Competition on Software Testing (Test-Comp 2026) Kaled Alshmrany (FuSeBMC) in the categories C.Overall, C.Cover-Branches and C.Cover-Error!

The 37th European Summer School in Logic, Language and Information (ESSLLI 2026) will take place on 3-14 August in Prague.
https://2026.esslli.eu

I'm excited that I'll be teaching an introductory course on univalent foundations / homotopy type theory!

@stringdiagram and @jaklt will also be running an interesting workshop titled "Semantics and compositionality for expressiveness and complexity".

Early registration closes on 31st May.

Welcome to ESSLLI 2026

Web scraping tarpits are catching legitimate data teams, not just AI crawlers

This is an oldie (submitted 7 days ago), but a goldie. I initially just rolled my eyes at the title, but last night I ended up landing on the article itself, and let me tell you, that was a blast!

Let me just quote the full second paragraph (emphasis mine):

Nepenthes isn't alone. Projects like Locaine and a growing list of open-source "tarpits" have popped up on GitHub, each with the same pitch: if AI companies won't respect robots.txt, site owners will fight back with poison.

As far as I can tell, there is no tarpit named "Locaine". Certainly not on GitHub. But neither Nepenthes nor Iocaine (with a capital "i", not an "l") is on GitHub either. At least I assume the author meant iocaine, because the link to Nepenthes leads to an Ars Technica article from January 2025. Said article mentions iocaine, correctly spelled.

I can't speak for other crawler defenses, but iocaine's pitch has always been "if you are a hostile crawler, I'll fuck you up". Poison is just one delivery mechanism, ignoring robots.txt is just one of the reasons.

The problem is that tarpits can't tell the difference between OpenAI's crawler and your price monitoring script.

For a good reason! We don't fucking want to tell the difference, motherfucker!

Now, there are legit reasons for price monitoring. When you suspect big chains of fraud, of price fixing, and other shady stuff: go ahead, scrape their webshops.

Otherwise? You have no business scraping us, and should fuck right off with your bullshit.

Same applies for any other "data collection" "script". Just because it's on the internet, and you can technically scrape it, doesn't mean you should. Like, even though I present as a male, with beard and all, and I sometimes appear in public - that does not entitle anyone to try and measure the size of my penis "for research purposes"[1].

Tarpits detect automated request patterns.

Eh, no. A lot of the tarpits don't detect anything. That's why they're called tarpits: anything that enters, gets tarred.

If your scraper follows links systematically, hits pages at consistent intervals, or skips JavaScript execution (the way most AI training crawlers operate), it looks like a target.

No, not necessarily.

Research from Rutgers and Wharton found that sites blocking AI crawlers saw a 23.1% decline in total traffic and a 13.9% drop in human traffic.

I'd question the validity of that research. It certainly does not reflect what I see. If sites blocking AI crawlers saw only a 23.1% decline in total traffic, they weren't blocking hard enough. I'm seeing a ~90% drop myself, and human traffic increased significantly since I started blocking crawlers.

All that while very explicitly opting out of major search engines at the same time, on top of the crawler defenses.

And tarpits go further: they actively waste a crawler's compute, storage, and bandwidth while feeding it data that degrades whatever model or database it's building.

Thing is, most of the garbage served by the tarpits does not end up in training sets. It's fairly easy to recognize and filter out before training. We do not necessarily serve garbage to poison the model - we serve poison because the fuckers don't respect a 404, and serving poison lets us fill their crawler queue with poisoned URLs. Any effect on the models and training is a happy accident at best.
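To make the queue-filling point concrete, here is a minimal Python sketch of the general idea. It is not iocaine's or Nepenthes' actual code; the function name `maze_page`, the tiny `WORDS` list, and the URL layout are all hypothetical, picked only for illustration. Every requested path yields a junk page whose links lead only to more junk pages, so a crawler that follows them keeps enqueueing generated URLs.

```python
import hashlib
import random

# Hypothetical word list; a real tarpit would draw on a far larger corpus.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]


def maze_page(path: str, links: int = 5, words: int = 40) -> str:
    """Return a deterministic junk HTML page for `path` that links only to more junk."""
    # Seed the generator from the path, so the same URL always yields the same page.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Filler text: random words, no meaning.
    text = " ".join(rng.choice(WORDS) for _ in range(words))

    # Every link points one level deeper into the same generated maze.
    base = path.rstrip("/")
    hrefs = [f"{base}/{rng.getrandbits(32):08x}" for _ in range(links)]
    anchors = "\n".join(f'<a href="{h}">{h}</a>' for h in hrefs)

    return f"<html><body><p>{text}</p>\n{anchors}</body></html>"
```

Note that nothing in this sketch detects anything: whatever requests a path under the maze gets a page, which is the "anything that enters, gets tarred" behaviour described earlier.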

What Data Teams Should Do Now

Get consent first.

Scraping entire websites for whatever purpose (perhaps other than archiving - but in many cases, that should be opt-in too, in this humble author's opinion) is never okay. Don't do that shit.

We think the web is heading toward a clear split. On one side: sites that monetize data through paid access agreements, API partnerships, and licensed crawling.

Good. You want my data? Pay for it.

#algernonReviewsHackerNews

[1] If you're curious, and followed this footnote for some lewd algernon facts: sorry to disappoint you! You have no business knowing the size of my penis, either.

  • Paying without #Google: New consortium in #Europe wants to remove #customROM hurdles
    Using #banking and payment apps on #Android smartphones with custom ROMs is a problem: A European industry consortium now wants to change that. It is an #opensource alternative to #GooglePlayIntegrity. This proprietary interface decides on Android smartphones with #GooglePlay services whether banking, government, or wallet apps are allowed to run on a smartphone.
    https://www.heise.de/en/news/Paying-without-Google-New-consortium-wants-to-remove-custom-ROM-hurdles-11204037.html

    PDF "encryption" with date of birth: security theater in medicine.

    In medicine, PDFs containing sensitive findings are routinely sent by email. Encrypted with the patient's date of birth. This is perceived as secure. It is even considered a feature.

    I just tested it. 🧵
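A rough Python sketch of why a date of birth makes a weak password: the whole candidate space can simply be enumerated. The post does not say which date format is used, so the three formats below, the year range, and the function name `birthdate_candidates` are assumptions for illustration only.

```python
from datetime import date, timedelta

# Assumed password formats; the actual format used in practice is not stated.
ASSUMED_FORMATS = ("%d.%m.%Y", "%d%m%Y", "%Y-%m-%d")


def birthdate_candidates(start: date, end: date, formats=ASSUMED_FORMATS):
    """Yield every date from start to end (inclusive) in each assumed format."""
    day = start
    while day <= end:
        for fmt in formats:
            yield day.strftime(fmt)
        day += timedelta(days=1)


# Assumed range: birth dates from 1920 through 2025.
candidates = list(birthdate_candidates(date(1920, 1, 1), date(2025, 12, 31)))
print(len(candidates))
```

Under these assumptions that is fewer than 40,000 dates per format, a candidate list small enough to try exhaustively.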

    RE: https://chaos.social/@radiologe/116419103802035620

    Sunday morning, data-leak worries.

    Transmitting sensitive information in medicine, how does that work?

    Well, let's start with explainer-bear hour… 🧸

    One fewer instrument running on Voyager 1.

    Because plutonium will keep decaying.

    https://science.nasa.gov/blogs/voyager/2026/04/17/nasa-shuts-off-instrument-on-voyager-1-to-keep-spacecraft-operating/

    NASA Shuts Off Instrument on Voyager 1 to Keep Spacecraft Operating

    On April 17, engineers at NASA’s Jet Propulsion Laboratory (JPL) in Southern California sent commands to shut down an instrument aboard Voyager 1 called the

    NASA Science
    State of the world update. White line is today. Data excludes servers. Let me know if you see significant omissions.