Man, the "dead internet" is arriving

@bhawthorne describes his recent experience searching for basic info online -- he looked for the temperature to roast hazelnuts, and got nothing but stochastic-parrot garbage: https://infosec.exchange/@bhawthorne/111601578642616056

He concludes:

"I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past."

Brian Hawthorne (@bhawthorne@infosec.exchange)

How bad are the thousands of new stochastically-generated websites? Last night I wanted to roast some hazelnuts, and I could not remember the temperature I used last time. So I searched on DuckDuckGo. Every website that I could find was machine-generated with different temps listed. One site had three separate methods listed that were essentially differently worded versions of the same thing. With different temperatures. So I pulled my copy of Rodale’s Basic Natural Foods Cookbook off the shelf and looked it up there. I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past.

Infosec Exchange
@clive @rysiek @bhawthorne I have my doubts.

One can probably arrive at a reasonable index simply by taking the current one and removing all the sites using ads.

Sure, that will wrongly remove some legitimate sites (false positives) and leave a few junk sites in (false negatives), but it would still be a good start.
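
A minimal sketch of that filter, assuming the index is available as a mapping from URL to fetched HTML. The `AD_HOSTS` list and the names `has_ads` and `filter_index` are illustrative assumptions, not anything proposed in the thread, and a real filter would need a far larger blocklist:

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urlparse

# Illustrative, incomplete list of ad-serving domains (an assumption,
# not a vetted blocklist).
AD_HOSTS = ("googlesyndication.com", "doubleclick.net")


class _ResourceCollector(HTMLParser):
    """Collects the hostnames of embedded scripts, iframes and images."""

    def __init__(self) -> None:
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe", "img"):
            return
        for name, value in attrs:
            if name == "src" and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host:
                    self.hosts.add(host.lower())


def has_ads(html: str) -> bool:
    """True if the page embeds a resource from a known ad host."""
    collector = _ResourceCollector()
    collector.feed(html)
    return any(
        host == ad or host.endswith("." + ad)
        for host in collector.hosts
        for ad in AD_HOSTS
    )


def filter_index(pages: Dict[str, str]) -> List[str]:
    """Keep only the URLs whose HTML shows no sign of ads."""
    return [url for url, html in pages.items() if not has_ads(html)]
```

This only looks at static `src` attributes, so ads injected by first-party scripts would slip through. Those are the false negatives mentioned above.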

@lispi314 this is a pretty damn smart way of going about it!

The generated crap sites are generated to sell ads, to mine Google Ads etc for click-money. If a site has no ads, it is likely to not have been generated.

Love it.

@bhawthorne @clive

@lispi314 @bhawthorne @rysiek @clive It's a very clever idea and DDG should implement this as an option. I can imagine countermeasures evolving but it's a start.
@neilk @clive @rysiek @bhawthorne At some point it becomes a question of heuristics to filter out those malicious others.

Even just a site's age might serve as a decent one, since keeping a profitless SEO site online for a year gets expensive, especially since the moment monetization is attempted and noticed, the site would get delisted.

The main ones that would remain are intentional psy-ops and disinfo-ops, where profitability over years is *not* a concern at all.
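
The site-age heuristic could be sketched roughly like this, assuming the indexer keeps its own record of when it first saw each site. The names `old_enough` and `filter_by_age` and the first-seen mapping are hypothetical. The one-year threshold is taken from the post above and would need tuning:

```python
from datetime import date, timedelta
from typing import Dict, List

# "A year" as suggested in the thread; an assumption to tune, not a tested value.
MIN_AGE = timedelta(days=365)


def old_enough(first_seen: date, today: date, min_age: timedelta = MIN_AGE) -> bool:
    """True if the site has been observed online for at least min_age."""
    return today - first_seen >= min_age


def filter_by_age(first_seen_by_site: Dict[str, date], today: date) -> List[str]:
    """Keep only sites whose first-seen date is at least MIN_AGE in the past."""
    return [
        site
        for site, first_seen in first_seen_by_site.items()
        if old_enough(first_seen, today)
    ]
```

As the post notes, this does nothing against operators who are happy to run a site at a loss for years.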
@lispi314 @bhawthorne @rysiek @clive Yup that's exactly what I was thinking. It may sound crazy today to worry about psy-ops on the search for hazelnut recipes. But a fringe minority of food influencers do use their presence as a gateway into fashy-adjacent "trad" culture. They're often rich from selling things other than ads, so they would not be harmed by an ad-free search experience.
@lispi314
@bhawthorne @rysiek @clive
This is an interesting idea. I've thought for a while that Google could be doing a lot more to delist spammy sites that clog up search results, but it turns out this may be less a technical problem than a business-model one: Google benefits from sending people to spammy sites brimming with ads served by Google.

@tetron @lispi314 I really want someone to seriously explore the "no-ads web index". It would probably have to be a paid service to survive.

But it also might at some point become one of the few places where actual *information* could be found.

@bhawthorne @clive