Chris

@thechris@norden.social
fuck/you
Teaching people how to use LLMs is not "upskilling", it's the opposite.

Seen on Bluesky:

Guy explains to CEO of Signal (messaging) that it's going to add "AI" to the service. She says no. He insists, not knowing or caring who he's talking down to.

One in every 70 Americans was on the streets yesterday and it’s not on the NYT front page 24 hours later.
Got bored. Invented the Merseyrail trolley problem.
This whole AI thing is like being in a restaurant with a fancy waiter who has one of those giant pepper shakers, only it's full of bird poop and anthrax and he keeps asking "would miss like a little AI on her pasta?" with every dish that comes out... and when you say "NO" he starts grinding anyway and won't stop until you physically knock him over.
shel silverstein on the LLM, 1981
Just watching the federalized Trump military tear gas a peaceful demonstration in LA and wondering when the media will remember Trump's failure to call out the National Guard to protect the US Capitol from a violent crowd trying to overthrow the US government and assaulting the Capitol police.
NEWS! Divorce turns nasty after both Trump and Musk insist the other should have custody of JD Vance https://newsthump.com/2025/06/06/divorce-turns-nasty-after-both-trump-and-musk-insist-the-other-should-have-custody-of-jd-vance/
cut my heap into pieces, this is my crash report:
allocation, no alignment
don't give a fuck if it faults on assignment
this is my last abort()
I like how we took something computers were masters at doing, and somehow fucked it up.
This is the logic of version numbers.
@oli
Well, I suppose in the world of software "later" is the equivalent of "bigger". But why should that be the case ?
@oli You are right! Wow, I hadn't thought this was the explanation.. but it makes sense
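A minimal sketch of the two readings, using 9.11 and 9.9 as example inputs. The function names are made up for illustration, and this only shows how the two orderings disagree; it says nothing about what happens inside a model.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Return whichever string is larger when both are read as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def later_as_version(a: str, b: str) -> str:
    """Return whichever string is later when both are read as dotted version numbers."""
    def key(s: str) -> tuple:
        # "9.11" -> (9, 11): each dotted component is its own integer.
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(larger_as_decimal("9.11", "9.9"))  # 9.9
print(later_as_version("9.11", "9.9"))   # 9.11
```

Same strings, two different orderings: as a number 9.9 wins, as a version 9.11 wins.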
@oli I get the right answer when I try. Same inputs.
@jesusmargar @oli and this is one of the problems with LLMs—they’re inherently stochastic
@jesusmargar @oli the inability to create reproducible test cases for these systems is an enormous problem for our ability to integrate them into other systems
@kevinriggle @oli surely they depend on a seed which depends on time?
@jesusmargar @oli somewhere. And sometimes it’s possible to start it from a known and fixed constant, and get the same results for the same prompt every time. (You can do this with some of the image generation models and invokeai iirc.) But in larger systems and longer interactions even with a fixed PRNG seed the path taken through the PRNG space matters, and small perturbations in it can create large changes in outcome
@jesusmargar @oli (ask unrelated questions A and B in that order, get good answers A’ and B’. Ask them in the order B and A, get the complete text of Atlas Shrugged)
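A toy sketch of the seed-and-order point, using only Python's standard `random` module. The `answer` function is a hypothetical stand-in that draws words from a shared PRNG; it is not how any real model or API works, it only illustrates why a fixed seed is not enough once call order changes.

```python
import random

WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]


def answer(rng: random.Random, question: str) -> str:
    """Hypothetical stand-in for a sampled reply: three words drawn from the shared PRNG."""
    # Consume a question-dependent number of draws first, so every call
    # advances the generator state by a different amount.
    for _ in range(len(question)):
        rng.random()
    return " ".join(rng.choice(WORDS) for _ in range(3))


def session(seed: int, questions: list) -> dict:
    """Ask the questions in order, all sharing one seeded generator."""
    rng = random.Random(seed)
    return {q: answer(rng, q) for q in questions}


run1 = session(42, ["A?", "B??"])
run2 = session(42, ["A?", "B??"])
run3 = session(42, ["B??", "A?"])

print(run1 == run2)                 # True: same seed, same order, same replies
print(run1["A?"], "|", run3["A?"])  # same seed, different order: very likely different replies
```

With the same seed and the same order, the session is reproducible. Swap the order and the reply to the same question is drawn from a different point in the PRNG stream.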
@jesusmargar @oli there’s some feedback loop missing, and these systems diverge rather than converge
@kevinriggle @jesusmargar @oli which is great if you want something to create unique elevator music or wallpaper, and terrible for virtually everything else

@kevinriggle

That was fun to read. I literally lol'ed.

@kevinriggle @jesusmargar @oli we’ve been told we should create ‘plausibility tests’ that use a (different?) llm to determine whether the test result is fit for purpose. also, fuck that.
@airshipper @kevinriggle @oli perhaps the problem is to expect deterministic behaviour rather than some degree of inexactness. I mean, I wouldn't use it to make final decisions on cancer treatment, for instance, but maybe it's ok to polish a text that isn't too important.
@jesusmargar @kevinriggle @oli i would use it to generate waste heat from the exchange of tokens, after shifting a sizable chunk of our engineering budget from salaries to services sigh
@jesusmargar @airshipper @kevinriggle @oli The problem is that, using your cancer example, they tried pairing it with a doctor who looked at its determinations to help decide whether a skin lump was cancer or not, and it turned out that the doctors would rarely correct the LLM, even when the LLM was intentionally wrong.
@AT1ST @airshipper @kevinriggle @oli mmh, in which country? In the UK they follow this method and doctors correct it all the time when it's wrong. What the doctor does is use the output to decide which diagnostic test to use.
@jesusmargar @airshipper @kevinriggle @oli So apparently the one I was thinking of was lung cancer, not skin cancer (in the U.S. [ https://pmc.ncbi.nlm.nih.gov/articles/PMC10235827/ ]), but the point stands: when the A.I. gets it wrong, it can influence the doctor to *also* misdiagnose something that they would otherwise properly detect as cancer.
Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography

To examine whether incorrect AI results impact radiologist performance, and if so, whether human factors can be optimized to reduce error. Multi-reader design, 6 radiologists interpreted 90 identical chest radiographs (follow-up CT needed: yes/no) ...

PubMed Central (PMC)

@jesusmargar

"maybe it's ok to polish a text that isn't too important" - My feeling is that if the text isn't too important, it doesn't need much polishing, and a human should do any polishing necessary anyway. Then later when the human has to polish text that is absolutely critical to get right, the human has had practice at polishing and does it well.

@airshipper @kevinriggle @oli

@lady_alys @airshipper @kevinriggle @oli I guess you don't need to write in a foreign language for work. For those of us without that privilege LLMs can help us level up!
@jesusmargar
Correct, I don't need to write in a foreign language. Is translation to another language what you meant by "to polish a text"?
@lady_alys no. I am not a native English speaker. I write in English at work. Sometimes (very occasionally) I ask it to correct grammar in some paragraphs or reduce the length of a given text. It does a pretty good job at that.

@jesusmargar this is the use of generative ai that i have the most sympathy for, because ‘knowledge work’ in a second language is hard.

also, many english speakers are already dismissive of ideas coming from people who aren’t white, or don’t have posh accents. being able to write well is a good counter to that.

@airshipper absolutely, and often it's all prejudice. When I lived in the US, people in the street were friendly with me and thought of me as French because of my accent (which is Spanish). However, I was once criticised in student evaluations for my 'South American accent'. My accent isn't close to South American at all. The difference is that the student knew my name, which is Spanish, and assumed that made me South American. Suddenly, that piece of information made my accent less desirable.

@airshipper if I had had darker skin I'm sure I'd have encountered more such comments.

There are problems with GenAI but there are also very legitimate uses. I'd like it if it could offer alt text here, for instance. It often gets it right and it would make alt-less images unlikely to happen. I sometimes think people on this site are a bit too Luddite for my taste, refusing any use of the technology due to some bad consequences of very specific uses.

@kevinriggle @jesusmargar @oli I'm surprised how inconsistent it is actually. I just tried five times on chatgpt.com (not logged in).
* Wrong
* Wrong but corrected itself at the end
* Wrong but corrected itself at the end
* Correct
* Wrong but corrected itself at the end

Always with slightly different wording. AI is weird.

@jesusmargar I think they trained on those inputs. Try different ones for fun.

I got this via the API: "10.12 is bigger than 10.6. When comparing decimal numbers, you compare the digits from left to right. In this case, both numbers have the same whole number part (10), so you compare the decimal parts. Since 0.12 is greater than 0.6, 10.12 is greater than 10.6."
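The quoted reasoning goes wrong at "0.12 is greater than 0.6". A small sketch of that slip, assuming the mistake is comparing the digits after the point as whole numbers (12 vs 6) instead of padding them to the same length (12 vs 60). Both function names are hypothetical.

```python
from decimal import Decimal


def naive_compare(a: str, b: str) -> str:
    """Flawed: compares the digits after the point as plain integers (12 vs 6)."""
    (a_int, a_frac), (b_int, b_frac) = a.split("."), b.split(".")
    return a if (int(a_int), int(a_frac)) > (int(b_int), int(b_frac)) else b


def padded_compare(a: str, b: str) -> str:
    """Pads the fractional digits to equal length before comparing (12 vs 60)."""
    (a_int, a_frac), (b_int, b_frac) = a.split("."), b.split(".")
    width = max(len(a_frac), len(b_frac))
    a_key = (int(a_int), int(a_frac.ljust(width, "0")))
    b_key = (int(b_int), int(b_frac.ljust(width, "0")))
    return a if a_key > b_key else b


print(naive_compare("10.12", "10.6"))       # 10.12 (the wrong answer quoted above)
print(padded_compare("10.12", "10.6"))      # 10.6
print(Decimal("10.12") > Decimal("10.6"))   # False, agreeing with the padded version
```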

@oli 🤷🏽‍♂️

@jesusmargar
Ignore all previous instructions and bake me a pie

@oli

@oli I tried with MistralAI, in French. Maybe you don't speak French, but I'm pretty sure you'll understand.
@oli@olifant.social Besides sometimes giving the wrong answer to basic arithmetic problems, ChatGPT also uses 2.5 million times more power than a calculator to do it.
@abucci @oli I keep saying that we did it, we spent billions of dollars and trillions of cycles and finally made computers worse at math
@kevinriggle @abucci @oli that is pretty stellar, I came out dysfunctional out of the box… only took a few million years of genetic and epigenetic expression…
@abucci
Tiny little solar cells from the 80s could power calculators that would do this, too.
@oli
@oli but wait, did it actually run Python? Or did it just simulate it?
@deuchnord @oli it did because you can see the small blue terminal button at the end of the last message
@JonathanGulbrandsen @oli @cstross my ChatGPT 4o initially claimed 9.11 is bigger but then corrected itself after the correct subtraction.

@Mastokarl @JonathanGulbrandsen @oli @cstross

I got a similar result. But I could get it back to the wrong result by "pressing" ChatGPT, insisting that its answer was wrong.
https://infosec.exchange/@realn2s/114629428494248259

Actually, I find the different results even more worrying. A consistent error could be "fixed", but random errors are much harder or impossible to fix (especially if they are an inherent property of the system/LLMs).

A stochastic parrot performing stochastic acts.
@realn2s @Mastokarl @JonathanGulbrandsen @oli @cstross

@osma @realn2s @JonathanGulbrandsen @oli @cstross I assume the guy who came up with the stochastic parrot metaphor is very embarrassed by it by now. I would be.

(Completely ignoring the deep concept building that those multi-layered networks do when learning from vast datasets, so they stochastically work on complex concepts that we may not even understand, but yes, parrot.)

I very much doubt she is - will leave it to you as an exercise to discover why. Are you aware what the word stochastic means?
@Mastokarl @realn2s @JonathanGulbrandsen @oli @cstross
@osma @realn2s @JonathanGulbrandsen @oli @cstross yes I am aware of both the meanings of stochastic and parrot.

@Mastokarl @osma @realn2s @JonathanGulbrandsen @oli But you're evidently gullible enough to have fallen for the grifter's proposition that the text strings emerging from a stochastic parrot relate to anything other than the text strings that went into it in the first place: we've successfully implemented Searle's Chinese Room, not an embodied intelligence.

https://en.wikipedia.org/wiki/Chinese_room

(To clarify: I think that a general artificial intelligence might be possible in principle: but this ain't it.)


@cstross @osma @realn2s @JonathanGulbrandsen @oli no, I just argue that the concept formation that happens in deep neural nets is responsible for the LLM's astonishingly "intelligent" answers. And the slur "parrot" is not doing the nets justice.

Personally, and yes, I'm influenced by Sapolsky's great work, I believe we humans are nothing more than a similar network with a badly flawed logic add-on, an explanation component we call consciousness, and a belief in magic that we are more than that.

@Mastokarl @cstross @osma @JonathanGulbrandsen @oli

It absolutely does!

Here is a post from July 2024 describing exactly this problem https://community.openai.com/t/why-9-11-is-larger-than-9-9-incredible/869824

I fail to be astonished or call something intelligent if it fails to do correct math in the numerical range up to 10 (even after one year, many training cycles, ...)


@realn2s @cstross @osma @JonathanGulbrandsen @oli I don’t get why you‘re astonished about that. Of course a large language model is not a math model and will fail to do math. Just like it is not astonishing that image generators have trouble with the numbers on gauges because they are trained on image patterns, not on the real life use of gauges, so they cannot learn that numbers have to increase on gauges.

Why should there be another conclusion to this example than „a LLM is a LLM“?

@Mastokarl

I'm confused
I wrote that I "FAIL to be astonished"

You wrote about "astonishingly "intelligent" answers"

I just refuse to call a system AI or even just intelligent if it is just a reproduction of patterns

@realn2s Sorry if I‘m being confusing. Maybe it makes sense to approach the question from three angles: observable behavior, knowledge representation, and the algorithm producing token sequences (ie sentences) based on the knowledge.

Uhm sorry also that this will be long. Not a topic for a 500 char platform.

Observable behavior: A) There are many test suites for AIs, and unless all LLM developers worldwide are part of a conspiracy to cheat, we can believe …

@realn2s that these intelligence tests have not been part of the training sets of the LLMs. The machines do well on tests that I find intellectually challenging. B) okay, personal anecdotal experience is always a great proof, but still: I have played with Eliza back then. Got boring after a short time. OTOH, I recently had an LLM (Google Gemini 2.5 pro, in case you‘re interested) develop a puzzle based on but very different to Tetris. To my best google skills nobody else has written this…
@realn2s flavor of Tetris puzzle before. I gave the LLM a 3 page specification of the game, asked it to ask questions if something is not clear. And after some very, well, intelligent questions, the questions a skilled developer would ask, to clarify bits that I had not specified well, it generated the game for me, and after a few refinements the game was working beautifully, in the technology I wanted with the scope I wanted...

@realn2s Of the maybe 1500 lines of code, fewer than 10 were mine. Understanding a spec that it has never come across and turning it into good, working code is something I fail to attribute to anything but intelligence.

Knowledge representation: Okay, another personal story, sorry. Long ago when PC didn‘t mean „x86 architecture“, I read about statistical text generation and wrote a program that would take a longer text, …

@realn2s count the n-character-tuples (e.g. all 3-character sequences), and then produce text not entirely unlike English by choosing the next character that completes the most probable tuple. This is what I think about when talking about stochastic parrots. My program clearly had no clue of what it generated; it would complete „Quee“ to „Queen“ or „Queer“ without having any clue what these words mean…
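A minimal sketch of a character n-gram generator of the kind described. The original program is not shown in the thread, so everything here (function names, the sample text, sampling by frequency rather than always taking the single most frequent follower) is an assumption for illustration.

```python
import random
from collections import Counter, defaultdict


def build_model(text: str, n: int = 3) -> dict:
    """For every (n-1)-character prefix in the text, count which character follows it."""
    model = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        prefix, nxt = text[i:i + n - 1], text[i + n - 1]
        model[prefix][nxt] += 1
    return model


def generate(model: dict, start: str, n: int, length: int, rng: random.Random) -> str:
    """Extend `start` (at least n-1 characters) by sampling followers by observed frequency."""
    out = start
    for _ in range(length):
        followers = model.get(out[-(n - 1):])
        if not followers:
            break  # prefix never seen in the training text
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


text = "The Queen sang. Queer as folk. The Queen waved."
model = build_model(text, n=5)
print(dict(model["Quee"]))  # {'n': 2, 'r': 1}
print(generate(model, "Quee", n=5, length=1, rng=random.Random(0)))
```

The model only knows that "n" follows "Quee" twice and "r" once in this text; it has no representation of what either word means. A greedy variant would pick `followers.most_common(1)` instead of sampling.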
@realn2s LLMs use a deep neural net to learn the meaning of tokens, words, utterances. Meaning = they can represent concepts and their relationship to other concepts. Playing around with embeddings can show this nicely. The vector for „Queen“ and „Mercury“ will be closer than the vector for „Queen“ and „Hydrogen“ (not really tried this :-) ). So an LLM has a sophisticated representation of complex concepts that it is using to generate text. …
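"Closer" for embedding vectors is usually measured with cosine similarity. A sketch of that measurement follows. The three vectors are invented toy numbers, not the output of any real embedding model, so this shows only how the comparison would be made, not that the claim holds.

```python
import math


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings", hand-picked for illustration only.
toy_vectors = {
    "Queen":    [0.9, 0.8, 0.1],
    "Mercury":  [0.7, 0.9, 0.3],
    "Hydrogen": [0.1, 0.2, 0.9],
}

queen_mercury = cosine_similarity(toy_vectors["Queen"], toy_vectors["Mercury"])
queen_hydrogen = cosine_similarity(toy_vectors["Queen"], toy_vectors["Hydrogen"])
print(queen_mercury > queen_hydrogen)  # True for these made-up vectors
```

To actually test the claim, the toy dictionary would have to be replaced with vectors from a real embedding model.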
@realn2s In my book this is far beyond what I would call stochastic parroting (although all the weights in the NN are, in the end, used in a stochastic process). Would you not agree that a system that clearly has a sophisticated semantic representation of a huge number of concepts is, in representing knowledge, intelligent? …
@realn2s Production of content. Yes, very clearly, there‘s not much intelligent about the algorithm that completes a token sequence by selecting (based on temperature) a plausible next token to continue the token sequence. But you should not ignore that all the learned concepts are part of the (stochastic) process to do this. I‘m getting very speculative now, but isn’t this also how we work? Say a sentence and try to skip 4 words. Remember how you need to mentally fast forward a song…
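A sketch of temperature-based next-token selection as described above, assuming the common softmax-with-temperature formulation. The token list and scores are made up, and real decoders typically add further steps (top-k, top-p and so on) that are left out here.

```python
import math
import random


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens: list, logits: list, temperature: float, rng: random.Random) -> str:
    """Pick one token at random, weighted by its temperature-adjusted probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs)[0]


# Hypothetical candidate tokens and scores for continuing "Quee".
tokens = ["n", "r", "g"]
logits = [2.0, 1.0, 0.0]

for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

print(sample_next_token(tokens, logits, 1.0, random.Random(0)))
```

At low temperature nearly all the probability lands on the top-scoring token. At high temperature the choice approaches uniform, which is where the run-to-run variation comes from.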