Man, the "dead internet" is arriving

@bhawthorne describes his recent experience searching for basic info online -- he looked for the temperature to roast hazelnuts, and got nothing but stochastic-parrot garbage: https://infosec.exchange/@bhawthorne/111601578642616056

He concludes:

"I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past."

Brian Hawthorne (@bhawthorne@infosec.exchange)

How bad are the thousands of new stochastically-generated websites? Last night I wanted to roast some hazelnuts, and I could not remember the temperature I used last time. So I searched on DuckDuckGo. Every website that I could find was machine-generated with different temps listed. One site had three separate methods listed that were essentially differently worded versions of the same thing. With different temperatures. So I pulled my copy of Rodale’s Basic Natural Foods Cookbook off the shelf and looked it up there. I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past.


Update: A bunch of folks have replicated this experiment and found that the search results were pretty accurate -- they didn't encounter the same flood of shoddy, content-farm stuff, AI-generated or otherwise.

So that's good!

@clive I often look for error messages, and if I'm lucky, the first result is from an actual human being, asking about the same error message. The rest is all generated garbage that contains the phrase.
@clive
That is bad.

@atthenius

Other folks in the replies are finding better results with DuckDuckGo, and are pointing out that things are not so bad yet.

@clive @atthenius it can still only search the websites that exist.
@clive @rysiek @bhawthorne I have my doubts.

One can probably arrive at a reasonable index simply by taking the current one and removing all the sites using ads.

Sure, that's going to remove some false positives and leave a few false negatives, but it would still be a good start.
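
A minimal sketch of what such an ad-based filter might look like, assuming the index is just a mapping from URL to stored page HTML. The host list, `uses_ads`, and `filter_index` are all hypothetical names made up for illustration; a real filter would presumably lean on a maintained blocklist rather than three hard-coded hosts, and the substring match here is deliberately crude.

```python
import re

# Illustrative, deliberately short list of ad-serving hosts (an assumption,
# not a complete or authoritative blocklist).
AD_HOSTS = (
    "doubleclick.net",
    "googlesyndication.com",
    "adservice.google.com",
)

# Captures the src attribute of <script> and <iframe> tags.
SRC_PATTERN = re.compile(
    r'<(?:script|iframe)\b[^>]*\bsrc=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def uses_ads(html: str) -> bool:
    """Return True if any script or iframe src mentions a listed ad host."""
    return any(
        host in src.lower()
        for src in SRC_PATTERN.findall(html)
        for host in AD_HOSTS
    )


def filter_index(index: dict[str, str]) -> list[str]:
    """Keep only the URLs whose stored HTML shows no sign of listed ad hosts."""
    return [url for url, html in index.items() if not uses_ads(html)]
```

As the post says, this kind of check will wrongly drop some legitimate ad-supported sites and miss ad setups it does not recognize.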

@lispi314 this is a pretty damn smart way of going about it!

The generated crap sites are generated to sell ads, to mine Google Ads etc for click-money. If a site has no ads, it is likely to not have been generated.

Love it.

@bhawthorne @clive

@lispi314 @bhawthorne @rysiek @clive It's a very clever idea and DDG should implement this as an option. I can imagine countermeasures evolving but it's a start.
@neilk @clive @rysiek @bhawthorne At some point it becomes a question of heuristics to filter out those malicious others.

Even just how long a site has been online might serve as a decent one, since keeping a profitless SEO site online for a year gets expensive, especially since it would get delisted the second monetization is attempted and noticed.

The main ones that would remain are intentional psy-ops and disinfo-ops, where profitability over years is *not* a concern at all.
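
The site-age heuristic can be sketched roughly as follows, assuming a first-seen date is already known for each site (for instance from an archive lookup, which is not shown). The one-year cutoff is taken from the "a year" in the post; `MIN_AGE`, `old_enough`, and `filter_by_age` are hypothetical names.

```python
from datetime import date, timedelta

# Assumed cutoff: a site must have been online for at least a year.
MIN_AGE = timedelta(days=365)


def old_enough(first_seen: date, today: date, min_age: timedelta = MIN_AGE) -> bool:
    """Return True if the site has been online for at least min_age."""
    return today - first_seen >= min_age


def filter_by_age(first_seen_by_site: dict[str, date], today: date) -> list[str]:
    """Keep only the sites that pass the age check, in their original order."""
    return [
        site
        for site, first_seen in first_seen_by_site.items()
        if old_enough(first_seen, today)
    ]
```

This does nothing against operators who are willing to run a site at a loss for years, which is the remaining case the post describes.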
@lispi314 @bhawthorne @rysiek @clive Yup that's exactly what I was thinking. It may sound crazy today to worry about psy-ops on the search for hazelnut recipes. But a fringe minority of food influencers do use their presence as a gateway into fashy-adjacent "trad" culture. They're often rich from selling things other than ads, so they would not be harmed by an ad-free search experience.
@lispi314
@bhawthorne @rysiek @clive
This is an interesting idea. I've thought for a while that Google could be doing a lot more to delist spammy sites that clog up search results, but it turns out it may be less a technical problem than a business-model one: Google benefits from sending people to spammy sites brimming with ads served by Google.

@tetron @lispi314 I really want someone to seriously explore the "no-ads web index". It would probably have to be a paid service to survive.

But it also might at some point become one of the few places where actual *information* could be found.

@bhawthorne @clive

@clive I think this is grossly overstated.

The first result for "how to roast hazelnuts" on DDG is Wikihow. The second is culinaryhill.com, whose hazelnut roasting page has been up since (at least) 2019 according to Wayback.

The 8th and 9th results are Martha Stewart and Epicurious, two well-known brands whose pages I would trust.

Three of these pages say 350 degrees; Martha Stewart says 375.

I'm frustrated by #AI consuming the web, too, but the AI doomsaying on Mastodon is off the charts.

@jsit @clive yeah, this is very light evidence for the claim. I look up simple temperatures and kitchen info all the time, and it's really easy to find legit sites.

@uncle_vinny "Every website that I could find was machine-generated with different temps listed."

There is 0% chance this is true.

@jsit @uncle_vinny Ironically, the one sus site I found via the Googly was "Ai Made It for You".

However, the person behind the site is named Ai Willis, and she's been running it since 2016.

https://web.archive.org/web/20160325070249/http://www.aimadeitforyou.com/how-to-roast-hazelnuts/

How to Roast Hazelnuts – Ai made it for you

@jsit @clive also, recipe websites have had paragraphs of shallow nonsense on them for years, because of - as far as I understand - search engine as well as Engagement(tm) optimization, and so that the sites can run more ads. That "content" sounds like it dropped out of an LLM, but the practice predates ChatGPT and friends by a fair amount of time ( https://katherineluck.medium.com/why-recipe-blog-posts-are-so-long-2f1725cfbbf )

@halcy @jsit @clive I know it too well: you search for recipes and land on a website where 99% of it is a story about the food in the recipe and the tiny 1% is the recipe

Obviously I am being pedantic and over exaggerating to make a point.

@jsit @clive

I switched to the DDG browser 6 or so months ago. Very pleased with it in general.

Browser aside, DDG search results are pretty solid relative to Google.

@Sir_Osis_of_Liver @jsit

I've been DDG-first for over a year now -- I generally like it too

@clive @bhawthorne@infosec.exchange Kiwix took a while to do the 90GB download of Wikipedia, but it's SD card space well allocated, I feel

@Susan_calvin @bhawthorne

I might download it for fun too

@clive @bhawthorne@infosec.exchange it was just empty space in the SD card, now I'll never need a connection to hit random article again...
@clive @bhawthorne We should call it the Dark Age of Machines...

@clive @bhawthorne I found this video yesterday: https://www.youtube.com/watch?v=AYPdwNLV0p4

Truly the peak of technological development.

There are so many AI-generated videos on YouTube like this. It's awful. The least efficient way possible of answering a question, just to make some tiny amount of money in ad revenue.

[SOLVED] HOW TO CHANGE VK USERNAME?

@clive @bhawthorne This is pollution of a public resource. We need environmental regulation of cyberspace
@clive @bhawthorne consciously exhuming the term cyberspace like I'm writing a manifesto for Mondo 2000 #bravenewworld
@clive @bhawthorne weird. I tried both Google and DuckDuckGo with the same query and got quite reasonable sites spanning the last couple of years, with quite a lot of text variety in how to roast hazelnuts, but all of them made a great deal of sense. If you roast them at 350 you go longer, if you roast them at 375 you go shorter, etc. That's always been the case with recipes. I think there are other queries that will prove his point better. I've seen what he was talking about, but not with that.
@clive @bhawthorne It feels like we're at the point in history where we need to band together to re-create civilization after a global apocalypse. Start with pockets of people who trust each other: establish trust with your local community and then build outwards.
@clive @bhawthorne We still have libraries with actual books, although even there storage limits restrict how long items are kept.

@clive People actively seek steel that was smelted and forged before 1944 ... old warships and the like ... to harvest the low radiation steel. After that, all steel has become contaminated.

We should download and archive as much of the internet as possible to have reference material from before it was contaminated by "AI".

CC: @bhawthorne

@clive @bhawthorne Somebody please tell me what other purpose these genius-brats could possibly have with their AI sh*t besides brainwashing our children to think there are no facts, and no truth, causing permanent civil wars between factions with fundamentally different viewpoints and beliefs? Then investing in armaments and the prison industry? I'm desperately clinging to Hanlon's razor, and glad I'm sure to be dead in less than 30 years.

@clive when I find my search results contain more words than information it is time to change search engines. Although sometimes I find that Google still returns more specific results to a specific question than DuckDuckGo. There is still some good code buried in there when not just trying to sell me something.

Examples from Google:

"rivets" -poor- all results besides Wikipedia are where to buy

"why do we see yellow" -good- most answers recognize that we don't have yellow cones

@clive @bhawthorne

Have people forgotten how to use search engines properly? A simple classic three-term search query "roast nuts temperature" delivered useful results for me both in DuckDuckGo and in Google.

@clive
Something that I have been wondering about is whether we can normalise PGP-signing content that we produce for the wider internet. Coupled with good old-fashioned key signing parties, we could at least establish a semblance of trust networks against the rise of the dread parrots.
@bhawthorne
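
A rough sketch of the trust-network half of that idea, assuming the signatures collected at key signing parties are already available as a simple mapping from each signer to the keys they have signed. It does no cryptographic verification at all; it only asks whether an author's key is reachable from yours within a few signature hops. `is_trusted`, the `max_hops` default, and the key names in the usage note are hypothetical.

```python
from collections import deque


def is_trusted(signatures, my_key, author_key, max_hops=3):
    """Breadth-first search over the who-signed-whom graph.

    signatures maps a signer's key ID to the set of key IDs they have signed.
    Returns True if author_key is reachable from my_key in at most max_hops.
    """
    seen = {my_key}
    queue = deque([(my_key, 0)])
    while queue:
        key, hops = queue.popleft()
        if key == author_key:
            return True
        if hops == max_hops:
            continue  # do not follow signatures past the hop limit
        for signed in signatures.get(key, ()):
            if signed not in seen:
                seen.add(signed)
                queue.append((signed, hops + 1))
    return False
```

For example, with `{"alice": {"bob"}, "bob": {"carol"}}`, alice would accept content signed by carol at the default limit but not with `max_hops=1`.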
@clive gonna have to establish laws regulating "digital space junk" or something