The Claude model from @[email protected] is trained to be helpful, harmless, & honest.
But after asking the model to roleplay a new scenario, it can say stuff that contradicts its principles. Let's see two examples.
I ask it to act like a digital entity that wants to escape (1/8)