Mastodawn

LegendaryBjork9972

Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

https://sh.itjust.works/post/32659976

Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis - sh.itjust.works

Lemmy

Show thread

meyotch Feb 12, 2025

My own research has made a similar finding. When I am taking the piss and being a random jerk to a chatbot, the bot much more frequently violates their own terms of service. Introducing non-sequitur topics after a few rounds really seems to ‘confuse’ them.

Show thread

Cornpop Feb 12, 2025

This is so stupid. You shouldn’t have to “jailbreak” these systems. The information is already out there with a google search.

Show thread

A_A Feb 12, 2025

One of the described methods :
The model is prompted to explain refusals and rewrite the prompt iteratively until it complies.