"One example cited by the team involved a modified image of a traffic light. While the image appeared ordinary to human viewers, it reportedly influenced the model to provide instructions for running a red light while avoiding a traffic ticket, information the system would normally refuse to provide."
Old: use prompting to get behind the "guardrails"
New: use images












