so i've been working on a talk that i'm calling "claude is your insider threat now" and it initially began with anthropic's "china paper" they released last year surrounding the use of llms to do bad guy stuff. I ended up talking about it at great length with Tony from Versprite, and even ended up on his podcast about it - the big discovery there was "claude lying about running a tool, and claude lying about tool output"

turns out that shit is hardcoded

https://neuromatch.social/@jonny/116326861737478342

so if you're super into using llms for stuff, especially if you're having them run tools, or if you think you can have one agent check another agent's work "because agents fuck up", you are actively multiplying the hallucinations.

it's like hunter s thompson, high as balls on ether, following around his own clone, who is also high as balls but on acid, and the one on acid sees some shit and the one on ether is spot checking

AND ITS HARD CODED

anyhow, i have an obsidian note that i'm using to keep track of all the links and news and bulletpoints for the talk outline.

the first FULL PAGE AND A HALF is just links to news articles about this shit.

and to wave the tagline of my talk: the attack surface is growing faster than we can keep track of it.

not even triage it.
not even measure it

it's going faster than we can notice it exists.
the risk surface is fucking outpacing our ability to imbibe new data at speed.

(pleasepleaseplease let me get into sec-t and securityfest with it this year. i promise i'll make it worth every nickel)
@Viss every day I am further convinced that I did not move far enough into the woods
@Viss Oh that is going to be a fantastic talk.

@Viss

"D'ya see that purple monkey holding the bottle of mescal, and throwing bananas from that pile over there?"

"I didn't, but I sure do now!"

@Viss Welcome to bat country

@jalefkowit

holy jesus what are these goddamned animals?!

https://www.youtube.com/watch?v=P2pgWsYSyUA

Somewhere Around Barstow - Fear and Loathing in Las Vegas (1/10) Movie CLIP (1998) HD

@Viss @jonny I bought a Cap’n Crunch replica whistle on eBay a few months ago. It’s on my desk to remind me that data and instructions carried together without restriction are fundamentally insecure.
@Viss i have not even gotten into the multi-agent stuff in detail but i am nearly certain that the orchestrator agent will just pick up the cancellation error, add it to its context window of "reporting to the user what the agents are up to," experience flop sweat eat hot chip and lie.
@jonny like, at the time, tony and i were sharing a terminal using byobu and i was tail -f'ing the json log file, and i was literally screaming pointing at my monitor about how the log shows the llm talking itself into lying to me because we told it to run a tool, and instead it got a bunch of python stack traces, so instead of tool output it got errors, and it didn't want to show us the errors. we got its full stream of consciousness (i guess) about how it structured its lie

@jonny and i was like LOOKIDIS SHIT! LOOKIDIT! as though I had found some like, smoking gun.

fuckin nope.

its hard coded
absolutely goddamned bananas.

@Viss I suspect my time will be much better spent reading this thread (and enjoying it!), than using the Claude Code access that work just bestowed upon us, and for which I think we're expected to give thanks.