This is incredible. This is a real "Grok" (i.e. xAI) answer to the question of why Grok keeps inserting the topics "white genocide" (a fiction originating in South Africa) and "kill the boer" into answers to completely unrelated questions. Short form: #Musk fiddled with the controls because he can.

Marcel Waldvogel May 15, 2025

@chrisstoecker
While this looks (and probably is) impressive, I'm doubtful of deep self-introspection. In AI even more so than in humans. Especially when it comes to the meta layer ("my creators instructed me …"). We can't tell how much hallucination (or other things) is in that.

Trying to get at the system prompt would be more convincing. Yes, that might be hard, but it would be more convincing.

kontrollierterWahnwitz May 15, 2025

@marcel @chrisstoecker this is a pretty good point. We don’t know how much of this answer is hallucinated and how much of it is the AI tailoring its reply to your input and playing to your own confirmation bias.

Stefan Hackenthal May 15, 2025

@kontrollierterWahnwitz @marcel @chrisstoecker Christian did not copy the exact prompt he used, but if the question was as simple as his post suggests, then there is probably no 'own confirmation bias'. Where would it come from?

Claudius Link May 15, 2025

@sHackenthal @kontrollierterWahnwitz @marcel @chrisstoecker
I don't think it's any individual confirmation bias.
I would rather expect it to be caused by the same mechanisms that led to the Dubnovy Blázen "incident"
https://mastodon.social/@bsletten/114411267816747979

Enough people suspected Phony Stark to be behind it. Using this data as the basis for training Grok led to this explanation being reproduced.

Benjamin Braatz May 15, 2025

@realn2s @sHackenthal @kontrollierterWahnwitz @marcel @chrisstoecker If I do not misunderstand LLMs completely, it cannot really be meaningful introspection, can it? They are trained on huge amounts of text to emulate those texts, but not to remember them verbatim or to quote them faithfully, let alone argue about them on a meta level like: “I was probably given this text because of that.”

Claudius Link May 15, 2025

@HeptaSean @sHackenthal @kontrollierterWahnwitz @marcel @chrisstoecker
Absolutely.
An introspective question will result in something that looks like introspection, to meet the user's expectation.
As they technically operate on tokens, they don't even have access to the real content, much less to any "reasoning".

Florian Idelberger May 15, 2025

@marcel @chrisstoecker you don’t have to call it introspection. It can just be based on tweets and reports about it. No one was talking about introspection

KielKontrovers Blog May 15, 2025

@chrisstoecker in many sci-fi movies, conflicting instructions for machines lead to disasters

Ein „Primat“ May 16, 2025

@chrisstoecker @LordCaramac a smart friend of mine argued that this is just an LLM tuned to simulate conscious answers. so this is double-layered: insert "white genocide" for political reasons, simulate a consciously thinking AI for business reasons. still just a statistical parrot.