@chrisstoecker
While this looks (and probably is) impressive, I'm doubtful of deep self-introspection. In AI even more so than in humans. Especially when it comes to the meta layer ("my creators instructed me …"). We can't tell how much hallucination (or other things) is in that.
Trying to get at the system prompt would be more convincing. Yes, that might be hard, but it would carry more weight.
@sHackenthal @kontrollierterWahnwitz @marcel @chrisstoecker
I don't think it's any individual confirmation bias.
I would rather expect it to be caused by the same mechanisms that led to the Dubnovy Blázen "incident":
https://mastodon.social/@bsletten/114411267816747979
Enough people suspected Phony Stark to be behind it. Using this data as the basis for training Grok led to this explanation being reproduced.