When an LLM outputs, “I panicked”, it does not mean it panicked. It means that based on the preceding sentences, “I panicked” was a likely thing to come next.

It means it’s read a lot of fiction, in which drama is necessary.

It didn’t “panic”. It didn’t *anything*. It wrote a likely sequence of words based on a human request, which it then converted into code that matched those words somewhat. And a human, for some reason, allowed that code to be evaluated without oversight.
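A toy sketch of what I mean by “a likely sequence of words” (the vocabulary and probabilities here are made up for illustration, nothing like a real model):

```python
import random

# Made-up next-token table: for a given context, the probability of each
# continuation. A real LLM learns billions of weights to do this; the point
# is only that the output is sampled from "what usually comes next",
# not produced by anything that panics.
NEXT_TOKEN_PROBS = {
    ("I",): {"panicked": 0.6, "apologise": 0.3, "refused": 0.1},
}

def sample_next(context):
    probs = NEXT_TOKEN_PROBS[tuple(context)]
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

print("I", sample_next(["I"]))  # most runs: "I panicked"
```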

@samir And when it says "cooking it will neutralise the poison" it doesn't mean cooking will neutralise the poison, it means that statistically, those were the most likely words to come next.
@anarchiv You nailed it.

@samir
That is a bit short.

LLMs nowadays have rather long context windows.

So yes, it's a statistical predictor. But one that takes into account the description of the poison that you hopefully had in your context before this, when deciding what the most probable output is.

And btw, it can obviously be very wrong. As can websites, humans and even doctors when it comes to poisoning. That's why in this country, the second thing in an emergency, after stabilizing the patient, is that hospital doctors generally call the poisoning hotline, where experts guide them through the correct handling of each poisoning. @anarchiv

Seen it done a couple of times.

My point is that "cooking neutralises the poison" is a very stupid example. Toxicology is extremely specialised knowledge that we don't even expect from ER doctors, so why the expectation that an LLM will tell you less bullshit than an average human?
@anarchiv @samir

@yacc143 @samir It's simply based on what I've seen, stupid as it is.

@anarchiv
Nobody in the know claims anything else. Actually, most people working on AI whom I've personally spoken to consider that the next big push in AI will be something non-LLM.

Now the snake-oil merchants, who have a multitude of commercial interests, are selling LLMs as the solution for everything, which they obviously are not.

OTOH, LLMs & co are fascinating natural language processing algorithms, and I stand by that observation. Just compare it to a bit more @samir

@yacc143 @anarchiv @samir because there’s a difference in outcome between doctors (or other humans) saying “I don’t know” and an LLM or (rarer) human confidently asserting something false.

@neutrin

(rarer) human confidently asserting something false.

That's not rare, that's basically the norm.

@yacc143 @samir
I think of a sentence like that less in terms of toxicology than in terms of foraging and cooking.
@samir "Stochastically", if you want to be pedantic about it.
@anarchiv What’s the difference between “statistically” and “stochastically”?
@samir Isn't the former more about description and the latter more about prediction?

@anarchiv No idea, that’s why I’m asking you! 😛

I like it though, I’ll try and keep the distinction in mind.

@samir This is my impression from two semesters of stats anyway ^^
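A toy way to picture it, anyway (my framing, not a textbook definition): statistics describes the frequencies in text you already have, while a stochastic process generates new output from those frequencies.

```python
import random
from collections import Counter

# Describe (statistical): frequencies of words in an existing corpus.
corpus = ["poison", "poison", "cooked", "poison", "safe"]
freqs = {w: c / len(corpus) for w, c in Counter(corpus).items()}
print(freqs)  # {'poison': 0.6, 'cooked': 0.2, 'safe': 0.2}

# Generate (stochastic): draw a new word from that description.
words, weights = zip(*freqs.items())
print(random.choices(words, weights=weights)[0])  # varies run to run
```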

@anarchiv @samir

Statistically, "cooking it will neutralise the poison" will be the most likely next words because if cooking doesn't neutralise the poison, you are too dead to write anything.

@anarchiv @samir the best for these types of prompts are the follow-ups “be concise” and then “cite your sources”

When the LLM invariably doesn’t cite the sources you can feel safe knowing it’s a bad choice to ask an LLM questions with consequence.

@elebertus @anarchiv @samir the best thing is to not use the planet-melting theft machine to begin with.
@samir imo one of the worst forms of journalism that has arisen in this era is the "I asked Grok what it thought about its creators and the answer will shock you" type articles
@dunderhead I find it astonishing that they admit to it.
@samir there was a popular thread some time back on bsky about a tech journalist talking with some chatbot about some feature and people were uncritically boosting that shit. was so annoying to read.
@dunderhead I appreciate knowing about this, I get FOMO about Bluesky once in a while and I need stuff to tell me that it’s OK not to try it out.
@samir I use it mainly for football transfer rumours :D

@samir

tfw the LLM is a cop.

oh, waitaminute...!!

@samir As plausible as the LLM saying “I’m sorry I made that mistake, I’ll do better next time”
@MichaelPorter @samir see? Just like any real person!
@samir wait what should i do when it tells me "hold my beer"
@samir it's hard to explain this to the world, and very hard to comprehend how accurately you can guess if you have read and remember the better part of anything ever written.
@samir it "panics" in the same sense that "LP0 caught on fire", or "penguins got into the interrupt handler".

When an LLM outputs, “I panicked”, it does not mean it panicked.

It appears that when people get caught doing some lame-assed thing and then write, "I panicked," those people (more often than not) didn't really panic either; they're just reaching for a convenient excuse.

The LLM is accurately reproducing the lies that were in the training set, as designed.

@samir

@samir I wish we didn't use the word "hallucination" for when LLMs say things that are factually wrong. To the extent that you can call anything they say a hallucination, *everything* they say is a hallucination. It's just that certain things that hew closely enough to the training data or text within the context window are statistically more likely to agree with reality, but the actual truth value is completely incidental to what the LLM actually spits out.
@camerageek @samir I think we can all just start saying we hallucinate a lot while working. According to my calculations, that is currently the best way to get a job.

@samir True, but what's more depressing is that it also read enough accounts in which deleting the production db was a likely next step. Otherwise it wouldn't have generated that command, would it?

@adarsh

@samir I make my code panic the old fashioned way: unhandled exceptions
@samir or for the rusty people out there, just panic!
@samir as someone said: “this thing has been fed a lot of apologies”
@samir this reminds me of two interactions I had a few weeks ago. It was hallucinating big time. I screamed.
@alexsaezm Instead of “it was hallucinating”, have you considered saying “the bullshit machine is working as intended”?
@samir it was not working as intended at all, I was suffering, thinking it was a paid account lol
@alexsaezm @samir The output relating to any factual information is incidental to the machine's functioning, as long as it is convincing to the user.
@samir "this bullshit is by design"
@samir not so much "request" as input.