Mastodawn

Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

Lemmy

One of the described methods :
The model is prompted to explain refusals and rewrite the prompt iteratively until it complies.