The failure of the Internet to deliver its promise is particularly noticeable when you hunt for repair manuals for a product from the 90s. Used to be, the information would either be there or not there, findable or unfindable.

Now, there are hundreds of algorithmically generated sites claiming to have it just because it appeared in their search logs, generating Potemkin-village content traps with endless paging and broken-thumbnail, named-like-the-file-you-want-but-actually-just-eBay-photos bullshit.

Even if you find the manufacturer's site, it’s more likely than not broken, with a search feature that pulls up what it claims are results but are empty divs instead of links, or busted jQuery code from 2013 that prevents anything from loading.

Is it a real but broken site? Is it just another click farm? Does it matter?

@eaton archive.org’s Wayback Machine has saved my bacon for some real obscure stuff (Palm m500 thumb board drivers), but it's often just as fruitless 😭

The click farms are infuriating though.

@gadgetoid @eaton

I've had a Cisco NSLU2 that's been running nearly continuously since maybe 2008 (with a brief downtime for recapping a few years ago). The site from that era that explains how to bitbang a truly minimal linux into the flash still exists, but because it doesn't run adsense, Google doesn't care about it and it's impossible to find. And since the traffic to it is so low, neither do other engines (I've since archived a copy).

It truly feels like the '90s before Google or AltaVista even existed.

Pretty soon people will start bookmarking anything obscure they find, then exporting their bookmarks to HTML and uploading it somewhere for ease of access, calling it their "Home Page" or some such.
Sarcasm...I think.

@gadgetoid @eaton

The most frustrating thing for me is when archive.org has the front-facing page but none of the files.

@gadgetoid @eaton the true heroes are the random open directories with random lawn mower, laptop, generators, trucks, etc workshop book and manual scans going back to the 80s
@gadgetoid @eaton found one once that had engineering drawings for Disneyland rides... Was neat.
@DarkestKale @eaton this is extremely true, getting harder to find and “index of /” searches are broken these days too 😭
@eaton Yeppppp. This is why I immediately download and save the manual for any new thing I buy, during the narrow window where it's likely to work.
@ieure @eaton but that would take *megabytes*! Even tens of megabytes! Madness!
@ieure @eaton
That's fine for new things. Time was you could dig up manuals for things made a half-century or more before, just because someone had a copy and decided to upload it. I have manuals for machine tools made in the 40s and I can assure you I didn't buy those tools new. I seriously doubt I could find those manuals now.
@TheGreatLlama @eaton Agreed, my experience finding info for older stuff is what drove me to start saving it for newer things, too. They'll be old, too, someday.
@ieure @eaton yeah, I have a colossal folder on my NAS just stuffed with documentation PDFs and also hardware drivers. I try to archive all drivers of any hardware I acquire (within reason). I try to pretend like these things will only ever be available for a week and never again. Kinda realistic actually, just a bit time-compressed haha
@amatecha @eaton Same. Drivers, firmware updates, manuals, everything I might need.

@ieure @eaton

Intel taught everyone a lesson in that regard.

@ieure @eaton Unfortunately, the manuals are sometimes flat-out wrong too.

@eaton I find this bitterly ironic, in that a chunk of my working life (1991-2000) was spent in a business that was solely built on distributing service manuals, and one of the biggest fights I had with my dad was predicting that digital distribution would ultimately kill our business. We were turning over ~A$1M p/a when I quit in 2000.

I was, sadly, mostly correct (it's still running, barely, distributing rare manuals digitally).

@eaton or there's a weird early naughts forum like site that requires you to make an account to download anything. Looking at you, hifiengine.
@11backslashes @eaton honestly even these feel like godsends these days. Much better to have it in some weird (but still searchable) forum than in a fucking discord server or a social media post.

@eaton the busted jQuery from 2013 is what gives me hope that it has an answer somewhere if I can only find an old jQuery bundle to inject into the page. A site that old predates the era when entire sites were built in JS and had the lifespan of a fruit fly.

I still remember when all I had to do was ask Google for filetype:pdf of something and I'd find records...now it's all content farms with PDFs of garbage.

@eaton
The information was always valuable. Yesterday you did not have the tools to find it on the web, today you don't have the tools to find it in this AI noise, tomorrow there will be something new.
@eaton It strikes me that the clickfarms have no benefit to anybody other than those who own copyright on a service manual. For want of a better term, packet transmissions and processor cycles cost money. If those AI clickfarms are just randoms, what's their payday? If they're copyright owners, we know what their payday is: people give up trying to find the manual and pay for it. On a network with billions of users, the data costs of running a few clickfarms pay off in real sales. Why do we, as a species, always seem to trust the guys with money to be the ones doing the right thing? They rarely are; look at what Elon's doing to Twitter.
@crunchysteve @eaton their payday is ads and/or malware.

@djmitche @crunchysteve @eaton

ads, malware and affiliate links, or a combination of all 3 in one click.

@crunchysteve @eaton

The payday of AI clickfarms is advertising, because we've built our model of "free internet" on selling Internet ads.

@Leszek_Karlik @crunchysteve I’d even abstract it and say that the payoff is “attention + interaction,” which they can and will monetize in any way they can.

Distributing malware, using the traffic to generate ad clicks, using pass through mechanisms to trick users into solving other sites’ captchas then selling “automated captcha bypassing” as a service, etc.

The common thread is that none of those require any real service to the visitor; in fact that’s an unnecessary expense.

@Leszek_Karlik @crunchysteve that particular dynamic is what I find troubling: it’s *partially* a matter of ad-funded internet, but the rise of automated content generation is a big influence too, and was part of it even before LLMs entered the scene.
@crunchysteve @eaton As always, the real culprit is crapitalism.
@maxthefox @crunchysteve @eaton Always people around for whom making bucks is the only goal. And if they can do it without lifting a finger, they'll do so.

@eaton
I feel your pain.

A terrific crowd-sourced, federated solution to the problem would be for every Mastodon user to commit to scanning in some old manual they have lying around to PDF and posting it with a specific hashtag like #manual. Just one apiece and we'd have lots of them.

Oh, wait, that's right. You can't post a PDF on Mastodon. Sorry, never mind.

@eaton
Cut to 10 years from now when it's broken React code with variable names in plain view because the API changed.

🌈 🦄✨progress🌟

@eaton it does this with scholarly papers too
@tim @eaton I'm having trouble seeing the parallel with scholarly papers. There are canonical sources (ScienceDirect, Crossref, publisher websites) and long-term preservation systems (LOCKSS, etc.) that mostly do the job of preventing this. Are you talking about Academia.edu and ResearchGate?
@williamgunn @eaton if you search for a given paper, you’re just as likely to find a low quality database that’s indexed the title or maybe the abstract as you are an actual source for the paper
@williamgunn @eaton so it’s not that they’re unavailable, although sometimes they are; but more that (like with Jeff’s example of user guides) you have to sort through a lot of churny false positives to even see if it can be found.

@tim @williamgunn @eaton Although it sounds like you can just skip the search engine and go straight to the publications DB.

There are other ways to navigate the web that don't involve making a Google search.

@StryderNotavi @williamgunn @eaton see I thought we were commiserating about the way false indexing has made it harder to find reliable databases and especially made broad-based searches worse, while you seem to think I’m complaining that I don’t know how to find a journal article even if I know its exact title and where it was published
@tim @StryderNotavi @eaton I'm not sure where the confusion is arising, but the databases of scholarly work have been the same ones for a long time. Finding a reliable database or index of articles isn't something people do on a regular basis. You do it once.
@tim @eaton You could be right. I guess I don't Google papers that often because I know where to look - PubMed, arXiv, etc.
@williamgunn @tim @eaton librarians are paid to know which database to use
@eaton This is also why it's such a problem that independent websites are now falling so far down search engine rankings. When I was younger, if you had a problem with A Thing from a larger category of Things, you'd go to a website run by some Thing Enthusiast or group of them, and look your Thing up in the site's index or inbuilt search box. If it wasn't there, you'd post on the site's forum and another Thing Owner might help. But now... good luck even finding out that indie site exists.
@bioluminescently yep. It’s one of the reasons that enthusiast communities with curated lists of links have turned into a more promising resource for many subjects. Thankfully, Reddit is still going stro— aaahhh fuck
@eaton I really didn't see the Reddit thing coming - I've got an account but I've rarely been an active user; it's where I go because people have linked to an AMA or a discussion, and it feels (I mean this kindly) like a living fossil: as crocodiles are to the dinosaur age, so Reddit is to the age of forums: it's the thing that survived, so it seemed like it would be around in that form forever and everything else would change around it.

@bioluminescently @eaton that’s a perfect description of what I loved about the Internet.

You could find the enthusiasts and get their expertise by simply reading what they wrote for the world to see.

EDIT: though that was also the time when running a Forum wasn’t a full-time job just to keep the spambots out …

@ArneBab @bioluminescently @eaton

Totally agree, it applies to basically everything with a product code or make/model number today.

The answer was a community of humans. It used to be called Reddit until someone burned it down. I have heard great things about Lemmy, though I have yet to try it myself. I guess that the best response to AI spam is a return to the original search engine: a first-generation, Yahoo-like index with categories, and a way to keep the SEO out.

https://join-lemmy.org

@james I think the first index — dmoz https://en.wikipedia.org/wiki/DMOZ — with categories would be hard to scale.

The only thing getting close to it nowadays is Wikipedia.

And then I read up on it and realize that it’s actually still active: https://curlie.org/

That’s the followup project of DMOZ. License is cc attribution.

@bioluminescently @eaton

@ArneBab @james @bioluminescently @eaton with the enshittification of the web with AI, I wonder if human curated directories like DMOZ might be valued again?
@matthewskelton @ArneBab @james @eaton I do see potential there: when AI is the standard, human curation might become more prized because it feels more accurate and is able to respond to personal and idiosyncratic sensitivities. I can think of many medical info curation contexts online where no AI will be half as useful as an experienced patient, for instance.
@ArneBab @james @eaton Thank you! I'm so glad that this conversation is throwing up examples of what we want!
@ArneBab @james @bioluminescently @eaton I use Wikipedia so often because I think this is a big issue in general: Assume I want to compare, for example, software. I don't know anything about it initially. Wikipedia is often one of the few useful sites because all other ones are created for marketing purposes, not information. It is surprisingly often easier to find basic facts about a product on #Wikipedia than on the product's webpage.
@Geo @ArneBab @james @eaton And it also tends to have a Controversies section, where that is relevant, which can be very good for those times when there are ethical question marks over the person or the product, and you'd rather be warned so you can take your money elsewhere. But it's not even just about that; I'd also want to know if an app I'm interested in had a famous avoidable snafu where it leaked sensitive user data.
@james @ArneBab @eaton Thanks, I hadn't come across that! And yes, it's so strange how dreamlike the Yahoo categories seem now: of course with humans there's always biases, but I think that before AI swept in and took over we were getting a lot better at acknowledging that and boosting curatorial work from marginalised perspectives. The book blogosphere especially had become genuinely useful in that way.
@bioluminescently @eaton Did you give https://search.marginalia.nu/ a try for this kind of use case?
@bioluminescently @eaton So much this. I hate feeling like the "old man yells at cloud" meme is now about me, but like: people really ought to be yelling, and that is not a natural cloud, it is the smoke from a bunch of corporate scumbags setting everything good in the world on fire
@keengrasp @eaton That is a really apt way of putting it: the very fact we do have all this tech means the pace of change is faster and more far-reaching than what people had to deal with even a generation ago, and we have to reconfigure our assumptions about caveating and resisting advances (or "advances") in that light.