Honestly, the thing that will probably kill LLMs the hardest is someone writing a small language model that fits in JavaScript in a browser and hits comparable benchmarks.

Why bother with all those GPUs and energy usage if your Raspberry Pi could get comparable results?

Is this possible? I dunno. I'm not specialized in this.

But if I wanted to fuck the GenAI bubble over and had the relevant background experience? This is what I'd explore.

@soatok If you want it just to be able to use language, sure. But they want a vastly overfitted model that lossily compresses the volume of human writing and can spit back out obfuscated plagiarism of arbitrary parts.

@dalias One model per language.

Want it to generate C? Download the C model.

Want it to write bad poetry? Download the Vogon I mean English model.

@soatok Right but that's not all they want. They want it to generate obfuscated plagiarism of poetry. They want it to generate "copyright-free" copies of arbitrary FOSS programs, songs, etc. This inherently requires the largeness of the model because the plagiarism is buried in the overfitting.
@soatok If you had to give it the things you wanted copied as explicit input, the plagiarism and copyright infringement would be obvious to users and courts. Making it gigantic ambient state obfuscated in the model is how they get away with it.

@dalias @soatok we agree that this is a thing these companies want, in the present day, now that they've seen the potential for theft-at-scale

we don't think it's the line of reasoning that brought us here

@dalias @soatok we think the original motivation was the usual large-company thing of starting from the conclusion they want to be true, then pretending like it is.

it would have gone like this: for large companies to dominate this market, there has to be something they can do that small companies can't. what is that? spend more money on training it.

@dalias @soatok our main reason for thinking about this is that our friends at DAIR who were part of Google's ML Fairness team have spoken publicly about the company's (lack of) reasoning for increasing model sizes
@ireneista @soatok I think it's correct that they didn't originally set out to make plagiarism machines, but they did set out to fake intelligence. Ability to use language doesn't do that. Their approach depended on assimilating a gigantic corpus of ambient "knowledge" in a form where it's not actually usable as knowledge but can be convincingly regurgitated in different permutations to look like the machine understands it.
@dalias @soatok yes, we agree with that. there was a hope at first that there might be other pieces of the architecture, but nobody has made that work, yet.
@lritter @soatok "They" being the AI enthusiasts or people who feel like they're getting something of value from "AI". They almost surely don't frame what they want to themselves or others in terms of the plagiarism, but it'd be useless to them without that outcome.

@dalias @soatok i don't mean to be the bearer of bad news but if these systems would only plagiarize it would certainly be easier to dismiss em. they also inter- and extrapolate, which, if a human had done it, would be counted as original work. of course that's not all that creative work is, and so it's quite limited.

overfitting is undesirable, because it turns a fuzzy database into a regular database - then it's not plagiarism, it's a straight up copyright violation.

@lritter @soatok Um, no, you're not being the "bearer of bad news". AI propagandism isn't news. An interpolation of existing works to cover up that it's essentially the same as someone else's work is plagiarism if a human does it too. This is why the early architects of free software always insisted on clean-room reimplementations based on a specification someone else had worked out, rather than reading reverse-engineered or leaked proprietary code and then pretending they could forget it and write something equivalent.
@dalias @soatok i see you have your mind made up. but what i said isn't AI propagandism - just facts. i find the implication insulting, tbh. but you're in war mode, i get it. happy hunting.
@dalias @soatok what i want is definitely not that. what i want is something like siri, except not total ass.
@ariadne @soatok I mean "they" as in the AI enthusiasts.