Man, the "dead internet" is arriving

@bhawthorne describes his recent experience searching for basic info online -- he looked for the temperature to roast hazelnuts, and got nothing but stochastic-parrot garbage: https://infosec.exchange/@bhawthorne/111601578642616056

He concludes:

"I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past."

Brian Hawthorne (@bhawthorne@infosec.exchange)

How bad are the thousands of new stochastically-generated websites? Last night I wanted to roast some hazelnuts, and I could not remember the temperature I used last time. So I searched on DuckDuckGo. Every website that I could find was machine-generated with different temps listed. One site had three separate methods listed that were essentially differently worded versions of the same thing. With different temperatures. So I pulled my copy of Rodale’s Basic Natural Foods Cookbook off the shelf and looked it up there. I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past.

@clive @rysiek @bhawthorne I have my doubts.

One can probably arrive at a reasonable index simply by taking the current one and removing all the sites using ads.

Sure, that's going to remove some false positives and leave a few false negatives, but it would still be a good start.

@lispi314 this is a pretty damn smart way of going about it!

The generated crap sites are generated to sell ads, to mine Google Ads etc for click-money. If a site has no ads, it is likely to not have been generated.

Love it.

@bhawthorne @clive
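
For anyone who wants to play with the "drop everything that serves ads" idea from this thread, here is a rough Python sketch. It assumes the index is already available as a mapping of URL to fetched HTML, and it treats a page as ad-supported if it loads a script, iframe, or image from a known ad-serving host. The host list, the `uses_ads` and `filter_index` names, and the index shape are all hypothetical choices for illustration, not anything specified in the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative, non-exhaustive list of ad-serving hosts (an assumption
# made for this sketch, not a list given in the thread).
AD_HOSTS = ("googlesyndication.com", "doubleclick.net", "amazon-adsystem.com")


class _SrcCollector(HTMLParser):
    """Collects the src attributes of tags that commonly load ads."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe", "img"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def uses_ads(html):
    """Return True if the page loads a resource from a listed ad host."""
    collector = _SrcCollector()
    collector.feed(html)
    for src in collector.sources:
        # Relative URLs have no hostname and are treated as first-party.
        host = urlparse(src).hostname or ""
        if any(host == h or host.endswith("." + h) for h in AD_HOSTS):
            return True
    return False


def filter_index(index):
    """Given a dict of url -> html, keep the URLs with no detected ads."""
    return [url for url, html in index.items() if not uses_ads(html)]
```

As already noted above, a filter like this will throw out legitimate ad-supported sites and will miss generated sites that use an ad network not on the list, so it is a starting point rather than a finished classifier.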