I see a lot of blanket, outright rejection of #AI, of LLMs in general, and of coding LLMs like #ClaudeCode in particular here on the Fediverse.
Often, the actual impact of the AI / #LLM in use is not even understood by those criticizing it, at times leading to tantrums about AI where there is... no AI involved.

The technology (LLMs et al.) is not likely to go away for years to come. The smaller #ML variants that aren't being yapped about as much are going to stick around, as they have for the past decades.
I assume that what will actually happen is a move from centralized cloud models to on-prem hardware as the hardware becomes more powerful and the models more efficient. Think of the migration from large mainframes to desktop PCs. We're seeing the start of this with devices such as the ASUS Ascent #GX10 / #Nvidia #GB10.

Imagine having the power of #Claude under your desk, powered for free by #solar cells on your roof, with some nice solar-powered AC to go with it.

Would it not be wise to accept the reality of this technology's existence and find out how it can be used in a good way that improves lives? And how smart, small regulation can be built and enforced that balances innovation and risk, to get closer to #startrek(tm)?

Low-key reminds me of the Maschinenstürmer (machine wreckers, i.e. the Luddites) of times past...

@jti42 This is really a straw man - people realize that LLMs are built using largely stolen data. When the conversation starts with theft, it is hard to be constructive about the technology.

@yoasif Many of the current frontier LLM models have indeed been built in dubious ways from an intellectual property perspective. This is nevertheless not an inherent property of LLM technology.
Image generation models built from fully licensed training sets exist. At different price points, but they do exist.

Given their popularity, the publicly perceived utility of the frontier models created in questionable ways is hard to deny, open-weight and commercially sold ones alike.
Damage is done, perceived utility exists either way.

We're not going to resolve this issue here, but consider this hypothetical idea:

  • Assume a list of IP that a commercial frontier model of the dubious class has consumed could be created.
  • Now, let's say that, based on the size/complexity/some other measure of the IP consumed, every holder of the rights to said IP is awarded a monthly royalty payment based on the profits made with said model, until provable retirement of said commercial model (toy sketch below).

Would this change your stance?
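A toy Python sketch of that pro-rata math; the works list, the weights, and the royalty share are pure assumptions, since no such consumed-IP list exists today:

    # Toy sketch of the hypothetical royalty scheme above.
    # Works list, weights and royalty share are all assumptions.
    def monthly_royalties(works, monthly_profit, royalty_share=0.10):
        # Split a fixed share of monthly profit across rights holders,
        # pro rata by each work's size/complexity weight.
        total_weight = sum(w["weight"] for w in works)
        pool = monthly_profit * royalty_share
        payouts = {}
        for w in works:
            share = pool * w["weight"] / total_weight
            payouts[w["rights_holder"]] = payouts.get(w["rights_holder"], 0.0) + share
        return payouts

    consumed = [
        {"rights_holder": "author_a", "weight": 5.0},  # e.g. a novel
        {"rights_holder": "author_b", "weight": 0.5},  # e.g. a blog post
    ]
    print(monthly_royalties(consumed, monthly_profit=1_000_000))

The arithmetic is trivial; the entire fight would of course be over the weight function and the profit number.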

Or consider a future generation of such models where IP holders can submit their IP for inclusion under similar terms, creating an opt-in model.

How would the sentiment change if such a model were fully open, i.e. completely reproducible (within the boundaries of the nondeterministic nature of LLM training), with training data, harness, etc., and licensed in an OSI-approved way?

What if such a model were capable of attributing its output to what it referred to, in a properly license-compliant way? (Likely not possible with current tech, but we're playing hypothetical games anyway...)
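Purely to illustrate that last hypothetical: attribution could, in the crudest possible form, be a similarity lookup against the (hypothetical) licensed training corpus. A deliberately naive sketch, with invented corpus, licenses, and holders; real attribution would need far more than word overlap:

    # Naive attribution sketch: match model output against a
    # hypothetical corpus of licensed snippets by word-overlap
    # cosine similarity. All data here is invented.
    from collections import Counter
    import math

    def cosine(a, b):
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(ca[w] * cb[w] for w in ca)
        na = math.sqrt(sum(v * v for v in ca.values()))
        nb = math.sqrt(sum(v * v for v in cb.values()))
        return dot / (na * nb) if na and nb else 0.0

    corpus = [
        {"text": "the quick brown fox jumps over the lazy dog",
         "license": "CC-BY-4.0", "holder": "author_a"},
        {"text": "solar cells on the roof can power a small workstation",
         "license": "CC-BY-SA-4.0", "holder": "author_b"},
    ]

    def attribute(output):
        # Return the closest-matching snippet's holder and license.
        best = max(corpus, key=lambda s: cosine(output, s["text"]))
        return f"{best['holder']} ({best['license']})"

    print(attribute("a small workstation powered by solar cells on the roof"))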

@jti42 I had meant to respond earlier, apologies.

The more I think about it, the less sense I think it makes for people to give up their outputs to LLMs for anything beyond a search engine.

The creative part of the work isn't the output, and performing the output of creativity isn't creative.

Why are we so quick to give up our thoughts? But at the end of it, the LLM doesn't have our thoughts, only the result of them.

@jti42 @yoasif

Assume a list of IP that a commercial frontier model of the dubious class has consumed could be created.

A good start.

Now, let's say that, based on the size/complexity/some other measure of the IP consumed, every holder of the rights to said IP is awarded a monthly royalty payment based on the profits made with said model, until provable retirement of said commercial model. Would this change your stance?

I have two problems here.

First: consent. Given the rights holder was never asked, and never (in an informed way) agreed to have their work treated like this, we should not normalize, encourage, or allow the misuse to just continue.

Second: the power of valuation is entirely in the wrong hands. The value should be set by the rights holders.

This plan turns theft into a valid way of doing business, and the companies that did these things should be forced to take painful losses to prevent the penalties from becoming "the cost of doing business".

===

No problems with an opt-in solution though. That much is perfectly reasonable and how things should have been from the beginning.

@jti42 @yoasif

Given their popularity, the publicly perceived utility of the frontier models created in questionable ways is hard to deny, open-weight and commercially sold ones alike.

Correct.

Damage is done, perceived utility exists either way.

This I deeply disagree on.

Yes, open weight models make it difficult to fully go back. No, the harm cannot be fully undone. BUT.

If today, right now, this moment, every AI company just stopped scraping the web, small sites would be able to relax again. News sources would not need to have such tight paywalls. Anti-bot measures could be relaxed.
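For a sense of what "not relaxed" looks like: many small sites now carry a robots.txt along these lines just to opt out. GPTBot, CCBot, and ClaudeBot are the crawlers' documented user-agent names, and honoring the file is entirely voluntary on the crawler's part:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /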

If we made companies responsible for their support chatbot's promises, they would bring back support staff and customers would be happier.

If they stopped developing deepfake technology, its advance would slow and detection could get ahead of it.

The damage is not over. And a lot of it can still be undone.

But first there MUST be accountability.

@jti42 I am gonna skip what you said and get right to the logical fallacy at play here.

You are flipping the burden of proof.

It's the pro-AI people who need to "find out how this can be used in a good way that would improve lives" and promote that evidence. Not the anti-AI people.

@jti42

Regarding:

"We're seeing a start of this with devices such as the ASUS Ascent GX10 / Nvidia GB10."

I think the AMD Ryzen AI Max+ 395 is also attractive to those who want to run some of the open-source / open-weight AI models at home.

@reiver Yeah, I think I saw an article the other day presenting a UMA-architecture mini box with that chip and 128G of RAM. It talked about ~2xxW of full-load power draw for about $2-3K for the box.
Would be interesting to see someone do a comparison against the frontier/central models: amortization time, given their pricing and the box's price.
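Back-of-the-envelope, with everything beyond the box price and power draw above being pure assumptions:

    # Amortization sketch; API spend, power price and duty cycle are
    # assumptions, box price and ~250 W draw taken from the figures above.
    box_price = 2500.0      # USD, middle of the $2-3K range
    draw_kw = 0.25          # ~250 W full load ("2xx W")
    kwh_price = 0.30        # USD per kWh, assumed
    hours_per_day = 8.0     # assumed duty cycle
    api_spend = 200.0       # USD/month on a frontier API, assumed

    power_cost = draw_kw * kwh_price * hours_per_day * 30   # USD/month
    payback = box_price / (api_spend - power_cost)          # months
    print(f"power: ${power_cost:.2f}/mo, payback: {payback:.1f} months")

With those (debatable) inputs the box pays for itself in a bit over a year; the whole question is what you'd actually be spending on the API.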

I also find the return of the UMA architecture funny; it used to be dismissed as "meh, slow" and "for cheap consumer notebooks only".

@jti42 Gemma3:latest (Ollama) runs kind of OK in CPU mode on an 8 GB RAM, 4-core mini PC. Decent models are already running on potatoes!
@km6ecc Hehe, what LLM use cases did you successfully cover with that amount of hardware?
I did see success on 64G, and have had success on 128G reported to me, CPU-based, for some summarization and classification tasks on larger corpora of text.
@jti42 some coding tasks (don't worry, I don't use said code), some vision (honestly, even gemma3:270m is surprisingly good at this!), some summarization, and some general Q&A. It is not fast, but it's impressively passable, especially for a potato. Point is, quantization is still scaling nicely while training larger models has hit a ceiling. And I have to tell you all, tinyllama is a hilarious clown that runs on a 4 GB Raspberry Pi. You know, if laughs are your jam, #meta got your back.
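For anyone who wants to poke at this on their own potato, a minimal starting point, assuming Ollama is installed, the model has been pulled (ollama pull gemma3), and the Python client is available (pip install ollama):

    # Minimal smoke test of a small local model via the Ollama
    # Python client; model tag and prompt are just examples.
    import ollama

    reply = ollama.chat(
        model="gemma3",
        messages=[{"role": "user",
                   "content": "Summarize in one line: potatoes can run LLMs."}],
    )
    print(reply["message"]["content"])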