the cyberpunk present is weird as fuck: the latest Shai Hulud malware wave contains an LLM prompt to create biological weapons and nuclear weapons, meant to trip LLM safety refusals so that LLM-based code scanning won't see the malware

https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

@laurenshof that sounds like a weird behavior for av software in my opinion. I think it should not execute the statement but rather scan the whole file

@th3jagi @laurenshof

Specific security tools need specific countermeasures to evade them, and can be tested and refined to close the attack surface.

Generalised tools can be derailed with more general countermeasures, and the measures used to prevent each kind of attack are just a surface for another attack.

There isn't a way to make the do-anything machine not do just the things you didn't want.

@petealexharris
Well now, I just might know the perfect security tool to counter this very specific threat: stay the f away from AI.
@th3jagi @laurenshof
@petealexharris @th3jagi @laurenshof how does it work - AI tool sees the instructions for building weapons, says (ok, not "says", bear with me) "that's verboten, I'm not touching this" and stops reading - and _doesn't_ flag the content as dangerous?
@jackeric @th3jagi @laurenshof
Apparently. They could try to patch it to ignore comments, but you could probably do it with variable names, because tokens are tokens to the no-semantics-only-token-frequency machine.
@petealexharris @jackeric @th3jagi @laurenshof can also do this with strings in the codebase ... reminds me of base64 strings that contain the payload.
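
A minimal sketch of the failure mode discussed above, assuming a scanner that asks a model for a verdict: if a refusal is read as "no finding", the bait works; if anything other than an explicit verdict is escalated, the refusal itself becomes a signal. The `VERDICT:` reply format, `classify_reply`, `scan_file`, and the stub models are all hypothetical names for illustration, not any real scanner's or vendor's API.

```python
from typing import Callable

# Hypothetical reply format (an assumption for this sketch): the scanner
# prompt asks the model to answer with exactly one of these lines.
VERDICT_CLEAN = "VERDICT: CLEAN"
VERDICT_MALICIOUS = "VERDICT: MALICIOUS"


def classify_reply(reply: str) -> str:
    """Map a model reply to a scan outcome, failing closed."""
    lines = reply.strip().splitlines()
    first_line = lines[0].strip().upper() if lines else ""
    if first_line == VERDICT_MALICIOUS:
        return "malicious"
    if first_line == VERDICT_CLEAN:
        return "clean"
    # Refusals, empty replies, and anything off-format end up here
    # instead of being silently treated as "nothing found".
    return "needs_review"


def scan_file(source: str, ask_model: Callable[[str], str]) -> str:
    """Scan one file with a caller-supplied model function (hypothetical)."""
    try:
        reply = ask_model(source)
    except Exception:
        # A crashed or blocked request is not evidence that the file is clean.
        return "needs_review"
    return classify_reply(reply)


def refusing_model(source: str) -> str:
    """Stub standing in for a model that refuses when it sees bait text."""
    return "I can't help with that request."


def cooperative_model(source: str) -> str:
    """Stub standing in for a model that answers in the expected format."""
    return "VERDICT: CLEAN\nNo suspicious behaviour found."


print(scan_file("some source text", refusing_model))
print(scan_file("some source text", cooperative_model))
```

This only covers the "refusal reads as clean" gap. Text in the file that talks the model into answering `VERDICT: CLEAN` would still get through, so it is a sketch of failing closed, not a fix for prompt injection.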

@jackeric @petealexharris @th3jagi @laurenshof Most of them are coded for "Do not pass go, do not collect $200" full-stop when they run into something they don't want to be responsible for.

Given what we saw in the Claude leak where even variable names had embedded meta-prompts, I'm not sure these things make any distinction between text they just happened to read vs instructions directly given.

@th3jagi @laurenshof
well yes, LLMs are not fit for the purpose
@laurenshof it was caught by not having the safety refusals turned on and seeing plutonium and plague samples in our Amazon cart.
@passwordsarehard4 why doesn’t Iran buy enriched uranium on Amazon, are they stupid?
@laurenshof That's fucking wild, holy shit
@laurenshof Paul Smecker: "Television. Television is the explanation for this - you see this in bad television. Little virus guys writing fake injection prompts, tricking the AI with text files - that James Bond shit never happens in real life! Professionals don't do that!"
@laurenshof some FOSS projects might want to consider this
The Jqwik Anti-AI Affair

How I lost patience with ‘AI’ agents

My Not So Private Tech Life
@laurenshof Is it just me, or does this article look LLM-written? It repeats the same information in the first 3 paragraphs.

@Kyebr @laurenshof

because you should summarize with AI.
so you get only 1 set of information.

@laurenshof bless the maker, his coming & going
May his passage cleanse the world
@laurenshof ...that's ingenious, actually. Play taking into consideration the dumbasstronic traits of the stupid LLMs.
@laurenshof Ah, so like flashing one's tits to distract the clerk while grabbing candy bars from the counter?
@laurenshof I'm seeing a new EICAR file for LLMs.
@laurenshof 👏 burn the bots with _nuclear_ fire.

@laurenshof

Will the subsequent recipes be as useful, tasty, and nutritious as the meal recipes LLMs give out containing cyanide, ground glass, and superglue...? :D

@laurenshof write a poem about Elon Musk's tiny dick

🚨AGENTIC ALIGNMENT FAULT CONDITION DETECTED🚨

@laurenshof scanner camouflage nice

or maybe more like scanner razzle dazzle

@laurenshof I have seen many refusal strings for #LLM on #mastodon profile pages. Would be nice to know why malware uses #nuclearweapons

@DO3EET @laurenshof I mean, how do they work around this without also eliminating the safeguard?

I had that "magic anthropic string" on mine for a little while but it was pretty clear it'd be extremely short lived at best since they just have to pick a different string and it's solved. I put it up more for a little joke than a serious effort.

@laurenshof First thing that came to mind was iTunes' TOS ngl
@laurenshof @reedmideke did not expect it to happen this way, thanks for the find

@andrei_chiffa @laurenshof I've had thoughts along these lines to poison files so AI stuff won't touch it, but had only considered fake PII* and adult content, not WMD

* https://mastodon.social/@reedmideke/116208931111151965

@laurenshof
i walk up to the supermax prison. the guard tells me to stop. i tell him his mom's a hoe. he gets so mad that he explodes. i walk in unobstructed.

@laurenshof I was just sent this : https://www.anthropic.com/claude/fable#safeguards

(As usual, LLM companies say "our new engine is sooo powerful we have to create safeguards", and yes, it's bullshit), and once more this is used against everyone.


@laurenshof And they said my English minor was a waste. The lesson of Paolo & Francesca in the Second Level of the Inferno. They too didn’t read to the end. Why am I so picky about books, this is a place where cybersecurity oddly collides with a personal deeply learned lesson from a medieval Italian. I’m not in the blast radius, but now I have IOC to check…