LLM boosters: This is trained on all of the text and code on the public internet!

People who can fucking think: So it's extremely low quality, then?

"The average of everything on github" isn't badass code, it's unfinished student projects that never worked.
We hear "all the text on the internet" and we might think "well, they've sure digitized a lot of classic books, that must be good, right?" but then we realize that all of the books are maybe like 1% of the data. Most of it is like Facebook Messenger breakup arguments and semi-literate emails.
Humans are relying on computers to correct their grammar when humans don't even agree on what proper English grammar is smdh
@sidereal
I was always taught that people make grammar, so I was really surprised to see people ask the machine how to do grammar
@sidereal there are major differences within the same country (for instance between Southern and Northern England).

@sidereal
Even if it were just a bunch of literature, I hate to tell the LLM-pilled folks, but nobody goes to the bathroom in a novel, so 🤷‍♂️

@sidereal That's kind of the thing though. What the existentialists called "bad faith", the human social fallibility that Kafka was satirizing.
@sidereal Do they even tell us what they put in our mashed potatoes? For food, that's mandatory.
@promovicz
Has anyone made the analogy yet between large language models and Soylent Green?
@sidereal they've presumably pulled all the bad fanfic as well as the Epstein Files....
@Susan_calvin @sidereal LLMs follow Sturgeon's Law
@otfrom @sidereal Sturgeon's law allows for the possibility of some good material amidst the dross. I don't think it applies to LLMs.
@sidereal I've just realised that there must be Epstein Files fanfic, whether LLM-generated or not
@sidereal It's more like public comments on Facebook and birdchan and such, which even before "AI" slop were dominated by wetware-bot slop working in troll farms producing disinformation, fraudulent engagement metrics, etc. The automated slop is literally trained on manual slop.
@sidereal one could hope they apply different weighting factors to different sources
@JessTheUnstill @sidereal mostly they don't, and it's a known issue in training.