Periodic self-repetition: As a data librarian I can say that "AI" is not a matter of personal preference -- whether you like it or not, or whether you have found some use that you think is useful. It actively destroys organized knowledge, and therefore it actively destroys civilization.

Whenever someone looks for a human written text and can't find it because statistical near variants have been created and indexed, whenever "AI" "hallucinates" a reference, knowledge has been destroyed.

@richpuchalsky bookmarked 🫡

@julesbl

@urlyman

"Periodic self-repetition" means that I'm an old person who goes on and on about the same things a lot, so if you wait I'll write something like this again.

I think that the last time I wrote more than one post about it was this thread:

https://mastodon.social/@richpuchalsky/112010213240446748

@richpuchalsky I think about this more and more. On the one hand, we have vastly more information now than has ever existed before. On the other hand, it's increasingly corrupted, and therefore useless. History may end up being vague on just what happened over the last 30 or 40 years.
The Library of Babel - Wikipedia

@Steve @richpuchalsky we already had the problem of having moved to electronic recordkeeping without having figured out good archiving practices for electronic records; now we’ve added the problem that most of the electronic records we have are slop. Yeah, historians are going to have a hard time.
@richpuchalsky Shouldn’t this bring about a trend towards vintage, or pre-2022, books?
@kawentzmann @richpuchalsky Hopefully, but we'll probably also see something similar to the forum backdating stuff trying to take advantage of it...

@KronoGarrett @kawentzmann

It's really easy to have a statistically generated writing procedure create a fake publication date for a book, whether it's in a text that gets printed out as a physical book or in a supposed catalog or index of books.

You also can't just go on with pre-2022 material forever.

@richpuchalsky @KronoGarrett Well, I don't read that much. But I see what you're getting at.
@kawentzmann @richpuchalsky @KronoGarrett *sobs in chronically ill physics grad who is interested in neuroscience and queer medicine*
@kawentzmann @richpuchalsky calibre-web is awesome, also kiwix. Recommended. I do a little amateur librarian archiving this way.
@richpuchalsky the library of Alexandria is burning

@richpuchalsky
AIs write bad books (and texts, etc.) -- incoherent mash-ups of existing human works, and eventually worse mash-ups of human and AI works. But how does the existence of a bad book destroy knowledge? The good books still exist.

The problem in my view is less bad AI books and more bad indexes and catalogs (i.e., search pages made by foolish search engine corporations), and the ignorant public's very much misplaced trust in those bad indexes.

@oof @richpuchalsky if the good information is drowning in a sea of slop, how is one supposed to gain knowledge?

@tedmielczarek @richpuchalsky

Bad books have always been a majority. Remedies include:

Bibliographies from reputable sources.

Critics who sift through new and old works and offer considered opinions.

A cautious respect for bad ideas, which when applied instruct by negative example.

Recommendations from trusted sources, whether friends, rivals, authors modern or ancient, and teachers.

Stop using indexes that direct readers to slop.

@oof @richpuchalsky AI books and articles become less and less incoherent. That's what they are made for, after all, to sound reasonably similar to human-written text. They don't become more truthful, though, because that's what they're not made for. Bottom line, it becomes harder and harder to distinguish AI slop with a good amount of bullshit from human-written, good research. And even the latter over time is infested with more and more bullshit because when your sources are citing sources that are citing sources that contain made-up AI BS, at some point you won't know what's true and what is not.

@DerPumu @richpuchalsky

To uncritically cite a faulty source directly or by proxy is bovine scholarship. Give it less attention.

What's needed are scholars who critically cite faulty sources to corral them.

(The opening chapters of Gilbert's 'On the Lodestone' offer an unrelenting rundown of centuries of careless pseudo-scholarship on magnets, whose authors comprise a kind of pre-Enlightenment artificial intelligence, endlessly repeating and embroidering their own output.)

@oof @richpuchalsky how do you determine that they are faulty? You check their sources. You check the sources of those. It all looks legit. Do you go for a fourth layer of sources? A fifth? When do you stop? When the original slop article is buried 5 layers or more in, it becomes impossible to fact check.
@DerPumu @oof @richpuchalsky Plus the sheer volume of content that can be generated in a short amount of time. How do people keep up?

@DerPumu @richpuchalsky

A scholar checks sources recursively when seeking the reason for some inconsistency or contradiction.  First test the conclusion; if it is false or imperfect, only then test the most likely sources of error.

Rooting out entrenched errors may require a serious amount of labor.  (William Gilbert needed decades.)  Some medical misunderstandings can take lifetimes to resolve.   But scholars are in this sense never working alone.

@richpuchalsky @oof Can you help point us to any good indexes?

@richpuchalsky So, in effect, it leads to a situation like in a totalitarian system, on which Hannah Arendt writes:

“A people that can no longer distinguish between truth and lies cannot distinguish between right and wrong. And such a people, deprived of the power to think and judge, is, without knowing and willing it, completely subjected to the rule of lies.”

@richpuchalsky we feel this SO HARD this week. it is no longer possible to find answers to trivial technical questions with a web search.

we're very grateful we have a curated personal library of authoritative reference material, but a lot of key Unix ideas are socially-defined and not formally documented anywhere.... (for example, just now we were trying to remember the semantics of when chown is allowed, and the rationale for it)
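
For readers wondering what the chown question looks like in practice, here is a minimal sketch, assuming a system with restricted chown (the common configuration on Linux, not something this thread establishes): the owner of a file may change its group to a group the process belongs to, but an unprivileged process may not give the file away to another user. `chown_allowed` is a hypothetical helper name; it only wraps Python's `os.chown`.

```python
import os
import tempfile


def chown_allowed(path, uid=-1, gid=-1):
    """Try os.chown(path, uid, gid); -1 leaves that ID unchanged.

    Returns True if the call succeeded and False if the system refused it.
    Hypothetical helper for illustration only.
    """
    try:
        os.chown(path, uid, gid)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        # Leaving both IDs unchanged is always acceptable for the owner.
        print("no-op:", chown_allowed(tmp.name))
        # The owner may set the group to the process's own effective group.
        print("own group:", chown_allowed(tmp.name, gid=os.getegid()))
        # Assumption: restricted chown. Only meaningful when not running as root.
        if os.geteuid() != 0:
            print("give away to uid 0:", chown_allowed(tmp.name, uid=0))
```

This shows the behaviour only; the rationale is the part that is hard to find written down.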

@ireneista @richpuchalsky

[off on a tangent...]

I wonder if, for some technical systems, we could assume that we got incorrect information if the answer is "too neat".

(For example, my pet peeve: The number of dashes for different Unix commands.
find / --name '*.sh'
"Wait. Two dashes? I now have docs for 10 programs that have consistent syntax. That is highly unlikely.")
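
The dash example can be checked directly rather than taken on trust. A small sketch, assuming a `find` binary is on the PATH; `run_find` is a hypothetical helper, and the temporary files exist only for the demonstration. The single-dash `-name` form is the real one; the double-dash form shown above is the "too neat" version and is expected to be rejected.

```python
import pathlib
import subprocess
import tempfile


def run_find(root, flag):
    """Run `find ROOT FLAG '*.sh'` and return (exit_code, matched_paths).

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        ["find", str(root), flag, "*.sh"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "a.sh").touch()
        (root / "b.txt").touch()
        # Real syntax: a single dash.
        print(run_find(root, "-name"))
        # The "too neat" syntax: find should refuse it with a non-zero exit code.
        print(run_find(root, "--name"))
```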

@wakame @richpuchalsky thus far, we've had no trouble noticing it's wrong because it's clickbait promising to be either an article or a forum post about our exact topic, which only needs a one-line answer, but the actual content is 2,000 words of relevant background information we already knew, and no actual answer to the original question
@wakame @richpuchalsky for older Unix topics that haven't changed much, we would legit have better luck grepping 30-year-old Usenet archives... sigh
@ireneista @wakame @richpuchalsky I wonder how long it takes until we start seeing trust chains designed to validate a human wrote or vetted a thing? (at least, I hope it's only a matter of time. I figure librarians would be the central authority)
@TheIdOfAlan @wakame @richpuchalsky oh our personal theory is there should be no central authority because it would be a natural focus for hate activity aimed at co-opting it. we should work to ground everything out in our own social connections
@TheIdOfAlan @wakame @richpuchalsky we do already routinely describe trust status of information we share with friends, when it's on important topics, including our rationale so people can check our reasoning. it sucks to have to do that but it's where we are
@TheIdOfAlan @wakame @richpuchalsky (you don't see us doing that a lot on here because we try hard not to share things we aren't highly confident of, to avoid aiding disinformation)
@TheIdOfAlan @wakame @richpuchalsky (we see spam and disinformation as deeply entwined problems; it's because spam has gotten so strong that disinformation is able to get so much traction)

@ireneista @wakame @richpuchalsky I totally used the wrong term. I meant Certificate Authority instead of Central Authority.  

I wrote a reply, but it got long. And, while I don't think I'm going Reply Guy, I'm punting it over to my site to keep it off of main just in case. It's here if you're interested:

Paladins Librarians of The New Dark Age

https://www.alanwsmith.com/en/33/zG/je/Dd/?paladins-librarians-of-the-new-dark-age


@TheIdOfAlan @wakame @richpuchalsky thanks! we love it when people put this level of thought into stuff, we'll give it a look in a few

Apparently, you read or at least skimmed 2,000 words and it got you nothing.

You summarize the overall experience

"we've had no trouble".

Interesting. I'd call that "life time waste".

@ireneista @wakame @richpuchalsky

@dj3ei @wakame @richpuchalsky yes, that was our point as well

@ireneista One place to look for this sort of thing is the 'rationale' sections of POSIX/Single Unix Specification, which is actually online (for now), eg https://pubs.opengroup.org/onlinepubs/9799919799/functions/chown.html

This is unhelpful for chown() because it points to V7 and V7 has what could be called 'a vague justification' in its manual page as I discovered in 2020¹ and then forgot until part-way through writing an earlier version of this toot.

¹ https://utcc.utoronto.ca/~cks/space/blog/unix/ChownDivideAndQuotas


@cks it's online!!!! thanks! last time we dug into that, it wasn't. we'll pull a local copy for posterity

thank you! we did eventually satisfy ourselves as to this particular question

@ireneista They sort of try to make you at least register to get access to the online version, but people keep digging up the direct access URLs (and then search engines index them so you can usually manage to find at least some version, although perhaps an outdated one).

(Or maybe they've given up the registration attempts.)

@cks @ireneista Only the PDF rendering requires registration, while tarballs+zip of the HTML rendering are on https://pubs.opengroup.org/onlinepubs/9799919799/download/index.html (linked on the home so no digging needed)

The PDF rendering is sometimes slightly better but I got extremely used to navigating within the HTML one.

@cks @ireneista i keep direct links to various POSIX editions on my ancient unmaintained bookmarks page https://dotat.at/bookmarks.html

the links have worked for decades (modulo a recent problem with redirects)


@richpuchalsky
Yet another reason to abhor #AI in most of its forms.
@richpuchalsky The universe (which others call the Library)...