"You should really try Claude/WhateverLLM before criticizing"

is the new

"But it contains electrolytes"

@ploum To me, it is even worse than that. In their mind, there is no criticism that holds, and if you still criticize LLMs, it is because you haven't seen the light yet.
"I am never using LLMs because of ethical / philosophical / moral / environmental arguments" -> "you cannot have an opinion without trying at least try once"
"I asked ChatGPT something and it gave me a wrong answer" -> "you should use it more, to learn about good prompting"
"I asked a code question and its answer was riddled with bugs" -> "you should try an agent"
etc., ad nauseam. If you have criticism, it is only because you are not a believer yet. To me, it is extremely religion-like.
@Armavica @ploum the creator of Claude wrote a paper on Reinforcement Learning through Human Feedback. You guys doing “fix this, solve this, solve that” are just paying to make THEIR product better, adjusting the model so it fits more use cases.

@ploum

Good old Idiocracy!
You never get tired of it!!!
By the same creator, also worth watching: the series "Silicon Valley".
From memory: 6 seasons!

@ploum "you read documentation on paper? like the paper in toilets?"

@ploum

Better question:

How many neurons does it take to be "slop"?

I introduce a 3-neuron example: a PID loop, from control systems theory.

It has a training phase, in which it 'learns' the control response from known inputs, and an execution phase, in which it applies what it learned.

Even my 3d printers use PID for the nozzle and bed. My oven in the kitchen does so as well.

Is all learning software "evil"? If no, where's the cutoff?
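For concreteness, the PID loop described above fits in a few lines. This is a minimal sketch; the gains and the toy heating model are made-up illustration values, not any real printer firmware:

```python
# Minimal discrete PID controller, as used for e.g. a 3D printer hotend.
# Gains kp/ki/kd are hypothetical; in practice they come from a tuning
# ("learning") phase before the execution phase.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        # The "3 neurons": proportional, integral, and derivative terms.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy execution phase: drive a nozzle toward 200 °C with a crude heat model.
pid = PID(kp=0.5, ki=0.05, kd=0.1)
temp = 25.0
for _ in range(200):
    power = max(0.0, min(1.0, pid.step(200.0, temp, dt=0.1)))  # clamp to [0, 1]
    temp += power * 2.0 - (temp - 25.0) * 0.01  # heating minus ambient loss
```

Three weighted terms summed into one output: that's the entire "network".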

@crankylinuxuser @ploum PID and gradient-descent optimized learning system are different in nature though. Putting PID on the same spectrum as LLM seems wrong. Or the definition of your spectrum is so broad that you could put any self-regulating system on it (like a water flush), making this spectrum near useless to describe/compare anything.

@whiteshoulders @ploum

That's kind of the point.

There's intermediate learning software like K-Nearest-Neighbors that is also trained on classified (properly annotated) data, and can then provide percentage-confidence responses. We see this with tools like Merlin birdsong identification.

I even made a 10-position classifier with the MYO myoelectric armband back in 2016. No GPU needed; modest CPU and RAM were enough, something an RPi 2 could easily handle.
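That kind of classifier fits in a page of code. Here's a minimal K-Nearest-Neighbors sketch with percentage-confidence output; the training points and labels are made-up illustrations, not Merlin's actual data:

```python
# Tiny k-nearest-neighbors classifier returning a percentage confidence,
# in the spirit of tools like Merlin birdsong ID. Features and labels
# below are invented for demonstration.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; returns (label, confidence %)."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, 100.0 * count / k

train = [((0.10, 0.20), "sparrow"), ((0.15, 0.25), "sparrow"),
         ((0.20, 0.10), "sparrow"), ((0.90, 0.80), "owl"),
         ((0.85, 0.90), "owl")]
print(knn_predict(train, (0.12, 0.22)))  # → ('sparrow', 100.0)
```

No gradient descent, no GPU: just distances and a vote, which is why it ran fine on hardware of that era.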

Point being, this whole debate is being forced into a binary, with folks ranging from "this is amazing" to "horrific garbage". Maybe LLMs could be made more useful if they accurately output % confidence and citations?

But again, I'm not going to dismiss, nor am I going to trust everything. Both actions are foolish.

@crankylinuxuser @ploum you make this discussion a technical one, where it is political. A tool does not exist in a vacuum: it serves the interests of some group, controlling it means power, etc.

The "slop" debate maybe would not have happened if big tech companies had not massively stolen content to train LLMs, and had not then sold them as "tools to help creativity" to the very same people they stole from.

It is always the same story:
- attempt at domination by big companies (yay capitalism)
- attempts to resist domination by some folks

@whiteshoulders @ploum

Well, it is both technological AND political.

Copyright is also highly political, and has changed many times since its inception. And how it's currently implemented only favors the large entities (e.g. Disney et al.).

I am not opposed to gobbling up as much content as possible and training something on the sum of human knowledge. But this thing then needs to also be free.

And enter US vs. Chinese AI. You can run DeepSeek R2 yourself. Same with Qwen. They are fully open source, including the current state of the art. I'm even running a 30B Qwen at home on my desktop.

FLOSS is how we avoid the hypercapitalist bullshit, including the inevitable rug-pull by the US AI vendors. Anthropic already did that with "unlimited but not really" and didn't suffer for it.

But also, reframing this as a "political problem" versus a "technical problem" is itself just moving the goalposts. The real political problem with all of these things is how structural capitalism further striates worker vs. owner. Ned Ludd discussed this centuries ago, as did Marx.

For the time being, yes, I will run my own AI on my equipment, under my control.

@crankylinuxuser @ploum you were replying to a post that made a political comment, comparing the use of "slop" with a movie that is famously a humorous political commentary (and often used as such). I'm not moving any goalposts here.

You are free to use LLMs if you like. The term "slop" is not technical: it does not describe the output of all LLMs (and what you said about PID kind of demonstrates that). It describes a usage that some groups find negative. If you use the tool ethically and correctly (whatever that means, we are still figuring it out), what you do with it is not slop (I think, but that is a personal opinion).

@crankylinuxuser @whiteshoulders @ploum

A SOTA GPU hour (H200) starts at $3.72 and goes up to $10.60. Let's say $5 for simplicity. If the model spends about 3 minutes thinking (on a single prompt), that's $0.25 of compute spent (assuming no profit margin).

People with Claude Code do around 50 prompts a day (being optimistic), which is $12.50 a day.

Claude Max 20x is $200. If the guy prompts 28 days a month, that's $350 of GPU time: $150 in losses.
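Spelling that arithmetic out (all figures are the thread's own assumptions, not Anthropic's actual costs):

```python
# Back-of-envelope check of the subscriber-side estimate above.
# Assumed: H200 at ~$5/GPU-hour (midpoint of $3.72-$10.60),
# 3 minutes of compute per prompt, 50 prompts/day, 28 days/month,
# against a $200 Claude Max 20x subscription.
gpu_hour = 5.00                          # $/GPU-hour, midpoint guess
cost_per_prompt = gpu_hour * (3 / 60)    # 3 minutes of GPU time
daily = cost_per_prompt * 50             # 50 prompts/day
monthly = daily * 28                     # 28 active days/month
loss = monthly - 200                     # vs. the subscription price
print(cost_per_prompt, daily, monthly, loss)  # 0.25 12.5 350.0 150.0
```

The conclusion only holds under those assumptions, of course: batching, shorter thinking times, or cheaper hardware change every line.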

@Pibert @whiteshoulders @ploum

The math doesn't pan out if you're using your own machines. Then it's just the cost of electricity.

It's what I'm doing. And it works pretty darn well. I can run thinking models, tool-using models, image models. And the data stays local.

@crankylinuxuser @whiteshoulders @ploum

1/2
It's the cost of electricity plus hardware depreciation. I picked the lowest and highest GPU-hour prices for the H200.

But even a poorly optimized 8 x H200 setup can generate 1 million tokens in about 30 seconds (DeepSeek V3).

DeepSeek V3 costs $0.42 per 1M tokens, so that's $50.40 an hour in revenue against $15.92 an hour in GPU cost (Deep Infra). Served through the API, you make a ton of money.
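Spelling out the provider-side numbers (the $1.99/GPU-hour figure is an assumption chosen to match the $15.92 total; none of these are official prices):

```python
# Back-of-envelope check of the API-provider economics above.
# Assumed: 8 x H200 producing 1M tokens per 30 s (DeepSeek V3),
# sold at $0.42 per 1M tokens, GPUs rented at ~$1.99/GPU-hour.
tokens_per_hour = 1_000_000 * (3600 / 30)               # 120M tokens/hour
revenue_per_hour = 0.42 * tokens_per_hour / 1_000_000   # ~$50.40/hour
gpu_cost_per_hour = 8 * 1.99                            # ~$15.92/hour
margin = revenue_per_hour - gpu_cost_per_hour           # ~$34.50/hour
print(round(revenue_per_hour, 2), round(gpu_cost_per_hour, 2), round(margin, 2))
```

So the API business can be profitable per GPU-hour even while flat-rate subscriptions lose money, under these assumed throughput and pricing numbers.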

@ploum well you can test and still criticize 😅

@ploum

And Hulk Hogan can confirm! It contains electrolytes!
🤣

@ploum “but it contains 15g of protein or magnesium or smth”

@ploum nowadays the food startup scene in my region is like:

THIS? Put a gummy on it
THIS? Put protein on it
THIS? Make it energized (Red Bull style, idk the term in English)

@ploum
My brother Frito tried Claude and says it's the best, I trust him because he got a Costco Law Degree.