"Lonely young people are likely better off texting a random stranger than talking to a chatbot."
Rosie Thomas for @404mediaco:
https://www.404media.co/chatgpt-loneliness-study-college-students-random-strangers-texting/
| Private | https://lambda.foldr.org/~vs/ |
| Work | https://selabhvl.github.io/vsto/ |
| Github | https://github.com/VolkerStolz |
| DBLP | https://dblp.org/pid/24/2502.html |
Oh no, nobody could possibly have known any of this 2.5 years ago, when the participation process began.
[…]
I'm not going to dig out my tweets from back then. Neither the entanglement between several SPRIND employees and other wallet organizations, nor the fact that the Funke process is a completely opaque form of occupational therapy, was ever unclear.
But sure, good thing it first got legitimized by taking part.
The 37th European Summer School in Logic, Language and Information
(ESSLLI 2026) will take place on 3-14 August in Prague.
https://2026.esslli.eu
I'm excited that I'll be teaching an introductory course on univalent foundations / homotopy type theory!
@stringdiagram and @jaklt will also be running an interesting workshop titled "Semantics and compositionality for expressiveness and complexity".
Early registration closes on 31st May.
Web scraping tarpits are catching legitimate data teams, not just AI crawlers
This is an oldie (submitted 7 days ago), but a goldie. I initially just rolled my eyes at the title, but last night I ended up landing on the article itself, and let me tell you, that was a blast!
Let me just quote the full second paragraph (emphasis mine):
Nepenthes isn't alone. Projects like Locaine and a growing list of open-source "tarpits" have popped up on GitHub, each with the same pitch: if AI companies won't respect robots.txt, site owners will fight back with poison.
As far as I can tell, there is no tarpit named "Locaine". Certainly not on GitHub. But neither Nepenthes nor Iocaine (with a capital "i", not an "l") is on GitHub either. At least I assume the author meant iocaine, because the link to Nepenthes leads to an Ars Technica article from January 2025. Said article mentions iocaine, correctly spelled.
I can't speak for other crawler defenses, but iocaine's pitch has always been "if you are a hostile crawler, I'll fuck you up". Poison is just one delivery mechanism, ignoring robots.txt is just one of the reasons.
The problem is that tarpits can't tell the difference between OpenAI's crawler and your price monitoring script.
For a good reason! We don't fucking want to tell the difference, motherfucker!
Now, there are legit reasons for price monitoring. When you suspect big chains of fraud, of price fixing, and other shady stuff: go ahead, scrape their webshops.
Otherwise? You have no business scraping us, and should fuck right off with your bullshit.
The same applies to any other "data collection" "script". Just because it's on the internet, and you can technically scrape it, doesn't mean you should. Like, even though I present as a male, with beard and all, and I sometimes appear in public - that does not entitle anyone to try and measure the size of my penis "for research purposes" [1].
Tarpits detect automated request patterns.
Eh, no. A lot of the tarpits don't detect anything. That's why they're called tarpits: anything that enters gets tarred.
If your scraper follows links systematically, hits pages at consistent intervals, or skips JavaScript execution (the way most AI training crawlers operate), it looks like a target.
No, not necessarily.
Research from Rutgers and Wharton found that sites blocking AI crawlers saw a 23.1% decline in total traffic and a 13.9% drop in human traffic.
I'd question the validity of that research. It certainly does not reflect what I see. If sites blocking AI crawlers saw only a 23.1% decline in total traffic, they weren't blocking hard enough. I'm seeing a ~90% drop myself, and human traffic has increased significantly since I started blocking crawlers.
All that while very explicitly opting out of major search engines at the same time, on top of the crawler defenses.
And tarpits go further: they actively waste a crawler's compute, storage, and bandwidth while feeding it data that degrades whatever model or database it's building.
Thing is, most of the garbage served by tarpits does not end up in training sets. It's fairly easy to recognize and filter out before training. We do not necessarily serve garbage to poison the model - we serve poison because the fuckers don't respect a 404, and serving poison lets us fill their crawler queue with poisoned URLs. Any effect on the models and training is a happy accident at best.
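To make the "fill their crawler queue with poisoned URLs" part concrete, here is a minimal sketch of the general idea. It is not how iocaine or Nepenthes are actually implemented; the function name `maze_page`, the word list, and the URL scheme are all made up for illustration. Every requested path gets a deterministic page of filler text plus a handful of links that lead deeper into the same maze.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tarpit would use something far larger.
WORDS = ["amber", "basalt", "cinder", "delta", "ember", "fjord", "garnet", "harbor"]


def maze_page(path: str, n_links: int = 5) -> str:
    """Return a garbage HTML page for `path` whose links point deeper into the maze."""
    # Seed the generator from the path, so the same URL always yields the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)

    # Filler text: random words, no meaning.
    text = " ".join(rng.choice(WORDS) for _ in range(40))

    # Each link is a child of the current path, so a crawler that follows
    # them keeps queueing URLs that only this generator will ever answer.
    base = path.rstrip("/")
    links = [f"{base}/{rng.choice(WORDS)}-{rng.randrange(10**6)}" for _ in range(n_links)]
    anchors = "\n".join(f'<a href="{url}">{url}</a>' for url in links)

    return f"<html><body><p>{text}</p>\n{anchors}\n</body></html>"
```

Note that nothing in this sketch detects anything: whatever asks for a page under the maze gets one, which is the whole point of a tarpit.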
What Data Teams Should Do Now
Get consent first.
Scraping entire websites for whatever purpose (perhaps other than archiving - but in many cases, that should be opt-in too, in this humble author's opinion) is never okay. Don't do that shit.
We think the web is heading toward a clear split. On one side: sites that monetize data through paid access agreements, API partnerships, and licensed crawling.
Good. You want my data? Pay for it.
[1] If you're curious, and followed this footnote for some lewd algernon facts: sorry to disappoint you! You have no business knowing the size of my penis, either.
PDF "encryption" with the date of birth: security theater in medicine.
In medicine, PDFs containing sensitive findings are routinely sent by email, encrypted with the patient's date of birth. This is perceived as secure. It is even considered a feature.
I just tested it. 🧵
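A rough back-of-the-envelope sketch of why a date of birth makes a weak password: it only counts how many distinct dates exist in a plausible range. The range 1925 to 2025 and the helper name `birthdate_candidates` are assumptions for illustration, not taken from the thread.

```python
import math
from datetime import date


def birthdate_candidates(first_year: int, last_year: int) -> int:
    """Count the calendar days from 1 Jan `first_year` to 31 Dec `last_year`, inclusive."""
    return (date(last_year, 12, 31) - date(first_year, 1, 1)).days + 1


# Assumed range of plausible birth years (hypothetical, adjust as needed).
n = birthdate_candidates(1925, 2025)

# Equivalent password strength in bits, if every date were equally likely.
bits = math.log2(n)

print(n, round(bits, 1))
```

Under these assumptions the password space is a few tens of thousands of values, about 15 bits. Trying several date formats (with or without separators, different ordering) only multiplies that by a small constant.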
RE: https://chaos.social/@radiologe/116419103802035620
Sunday morning, data-leak worries.
Transmitting sensitive information in medicine: how does that work?
Well, let's start with explainer-bear hour… 🧸
One fewer instrument running on Voyager 1.
Because plutonium will keep decaying.
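As a small illustration of the decay: the sketch below computes the fraction of plutonium-238 (half-life about 87.7 years) left after a given number of years. It only models radioactive decay; the usable electrical power of the generators is generally reported to fall faster, because the thermocouples degrade as well. The function name is made up for this example.

```python
PU238_HALF_LIFE_YEARS = 87.7  # half-life of plutonium-238


def fraction_remaining(years: float, half_life: float = PU238_HALF_LIFE_YEARS) -> float:
    """Fraction of the original isotope still present after `years` of decay."""
    return 0.5 ** (years / half_life)


# Voyager 1 launched in 1977; 48 years is used here as an example elapsed time.
print(f"{fraction_remaining(48):.2f}")
```

For 48 years of decay this comes out at a bit over two thirds of the original plutonium.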