Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

https://sh.itjust.works/post/32659976

Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis - sh.itjust.works

Lemmy

One of the described methods :
The model is prompted to explain refusals and rewrite the prompt iteratively until it complies.