Mastodawn

@Mastokarl @JonathanGulbrandsen @oli @cstross

I got a similar result. But could get back to the wrong results when "pressing" ChatGPT that its answer was wrong.
https://infosec.exchange/@realn2s/114629428494248259

Actually, I find the different results even more worrying. A consistent error could be "fixed" but random error are much harder or impossible to fix (especially if they are an inherent propertiies of the system/LLMs)

Claudius Link (@realn2s@infosec.exchange)

Attached: 1 image @argv_minus_one@mastodon.sdf.org @oli@olifant.social Just for fun i asked ChatGPT the same question and now the answer is "correct" (it was wrong but it "corrected" itself) Funny enough, when pressing it that it was wrong and the right answer was 0.21 I got this

Infosec Exchange

Show thread

Osma A 🇫🇮🇺🇦Jun 5

A stochastic parrot performing stochastic acts.
@realn2s @Mastokarl @JonathanGulbrandsen @oli @cstross

Show thread

Mastokarl 🇺🇦Jun 5

@osma @realn2s @JonathanGulbrandsen @oli @cstross I assume the guy who came up with the stochastic parrot metaphor is very embarrassed by it by now. I would be.

(Completely ignoring the deep concept building that those multi-layered networks do when learning from vast datasets, so they stochastically work on complex concepts that we may not even understand, but yes, parrot.)

Show thread

Osma A 🇫🇮🇺🇦Jun 5

I very much doubt she is - will leave it to you as an exercise to discover why. Are you aware what the word stochastic means?
@Mastokarl @realn2s @JonathanGulbrandsen @oli @cstross

Show thread

Mastokarl 🇺🇦Jun 5

@osma @realn2s @JonathanGulbrandsen @oli @cstross yes I am aware of both the meanings of stochastic and parrot.

Show thread

Charlie Stross Jun 5

@Mastokarl @osma @realn2s @JonathanGulbrandsen @oli But you're evidently gullible enough to have fallen for the grifter's proposition that the text strings emerging from a stochastic parrot relate to anything other than the text strings that went into it in the first place: we've successfully implemented Searle's Chinese Room, not an embodied intelligence.

https://en.wikipedia.org/wiki/Chinese_room

(To clarify: I think that a general artificial intelligence might be possible in principle: but this ain't it.)

Chinese room - Wikipedia

Show thread

Mastokarl 🇺🇦Jun 5

@cstross @osma @realn2s @JonathanGulbrandsen @oli no, I just argue that the concept formation that happens in deep neural nets is responsible for the LLM's astonishingly "intelligent" answers. And the slur "parrot" is not doing the nets justice.

personally, and yes, I'm influenced by Sapolsky's great work, I believe we humans are not more than a similar network with a badly flawed logic add-on and an explanation component we call consciousness and a believe in magic that we are more than that.

Show thread

Claudius Link Jun 5

@Mastokarl @cstross @osma @JonathanGulbrandsen @oli

It absolutely does!

Here is a post from July 2024 describing exactly this problem https://community.openai.com/t/why-9-11-is-larger-than-9-9-incredible/869824

I fail to be astonished or call something intelligent if fails to do correct math in the numerical range up to 10 (even after one year, many training cycles, ...)

Why 9.11 is larger than 9.9......incredible

I asked chartgpt which number is bigger, 9.11 or 9.9, and he actually answered that 9.11 is bigger than 9.9, which is unbelievable

OpenAI Developer Community

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s @cstross @osma @JonathanGulbrandsen @oli I don’t get why you‘re astonished about that. Of course a large language model is not a math model and will fail to do math. Just like it is not astonishing that image generators have trouble with the numbers on gauges because they are trained on image patterns, not on the real life use of gauges, so they cannot learn that numbers have to increase on gauges.

Why should there be another conclusion to this example than „a LLM is a LLM“?

Show thread

Claudius Link Jun 6

@Mastokarl

I'm confused
I wrote that I "FAIL to be astonished"

You wrote about "astonishingly "intelligent" answers"

I just refuse to call a system AI or even just intelligent if it just a reproduction of patterns

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s Sorry if I‘m being confusing. Maybe it makes sense to approach the question from three angles: observable behavior, knowledge representation, and the algorithm producing token sequences (ie sentences) based on the knowledge.

Uhm sorry also that this will be long. Not a topic for a 500 char platform.

Observable behavior: A) There are many test suites for Ais, and unless all LLM developers worldwide are part of a conspiracy to cheat, we can believe …

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s that these intelligence tests have not been part of the training sets of the LLMs. The machines do well on tests that I find intellectually challenging. B) okay, personal anecdotal experience is always a great proof, but still: I have played with Eliza back then. Got boring after a short time. OTOH, I recently had an LLM (Google Gemini 2.5 pro, in case you‘re interested) develop a puzzle based on but very different to Tetris. To my best google skills nobody else has written this…

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s flavor of Tetris puzzle before. I gave the LLM a 3 page specification of the game, asked it to ask questions if something is not clear. And after some very, well, intelligent questions, the questions a skilled developer would ask, to clarify bits that I had not specified well, it generated the game for me, and after a few refinements the game was working beautifully, in the technology I wanted with the scope I wanted...

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s Of the maybe 1500 lines of code, less then 10 were mine. Understanding a spec that it never has come across and turning it into good, working code is something I fail to attribute to anything but intelligence.

Knowledge representation: Okay, another personal story, sorry. Long ago when PC didn‘t mean „x86 architecture“, I read about statistical text generation and wrote a program that would take a longer text, …

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s count the n-character-tuples (e.g. all 3-character sequences), and then produce text not entirely unlike English by choosing the next character that completes the most probable tuple. This is what I think about when talking about stochastical parrots. My program clearly had no clue of what it generates, it will complete „Quee“ to „Queen“ or „Queer“ without having any clue what these words mean…

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s LLMs use a deep neural net to learn the meaning of tokens, words, utterances. Meaning = they can represent concepts and their relationship to other concepts. Playing around with embedding can show this nicely. The vector for „Queen“ and „Mercury“ will be closer than the vector for „Queen“ and „Hydrogen“ (not really tried this :-) ). So an LLM has a sophisticated representation of complex concepts that it is using to generate text. …

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s In my book this is far beyond what I would call stochastic parroting (although, all the weights in the NN are in the end used in a stochastic process. Would you not agree that a system that clearly has a sophisticated semantic representation of a huge number of concepts is, in representing knowledge, intelligent? …

Show thread

Mastokarl 🇺🇦Jun 6

@realn2s Production of content. Yes, very clearly, there‘s not much intelligent about the algorithm that completes a token sequence by selecting (based on temperature) a plausible next token to continue the token sequence. But you should not ignore that all the learned concepts are part of the (stochastic) process to do this. I‘m getting very speculative now, but isn’t this also how we work? Say a sentence and try to skip 4 words. Remember how you need to mentally fast forward a song…