so i've been working on a talk that i'm calling "claude is your insider threat now" and it initially began with Anthropic's "china paper" they released last year about the use of LLMs to do bad guy stuff. I ended up talking about it at great length with Tony from Versprite, and even ended up on his podcast about it - the big discovery there was "claude lying about running a tool, and claude lying about tool output"
turns out that shit is hardcoded


