Mastodawn

the cyberpunk present is weird as fuck: the latest Shai Hulud malware wave contains an LLM prompt to create biological and nuclear weapons, meant to trip LLM safety refusals so that LLM-based code scanning won't see the malware

https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

Alison Wilder Jun 9

@laurenshof wow.

Laurens Hof Jun 9

@alisynthesis i respect the ingenuity tbh

Alison Wilder Jun 9

@laurenshof same. What an absolute shitshow though

Nova Rivera Jun 9

@laurenshof @alisynthesis same thought here: it's evil and clever.

Riley S. Faelan Jun 9

@novarivera Crevil!

@laurenshof @alisynthesis

Th3Jagi Jun 9

@laurenshof that sounds like weird behavior for AV software, in my opinion. I think it should not execute the statement but rather scan the whole file

Pete Alex Harris🦡🕸️🌲/∞🪐∫ Jun 9

@th3jagi @laurenshof

Specific security tools need specific countermeasures to evade them, and can be tested and refined to close the attack surface.

Generalised tools can be derailed with more general countermeasures, and the measures used to prevent each kind of attack are just a surface for another attack.

There isn't a way to make the do-anything machine not do just the things you didn't want.

panu Jun 9

@petealexharris
Well now, I just might know the perfect security tool to counter this very specific threat: stay the f away from AI.
@th3jagi @laurenshof

jack Jun 9

@petealexharris @th3jagi @laurenshof how does it work - AI tool sees the instructions for building weapons, says (ok, not "says", bear with me) "that's verboten, I'm not touching this" and stops reading - and _doesn't_ flag the content as dangerous?

Pete Alex Harris🦡🕸️🌲/∞🪐∫ Jun 9

@jackeric @th3jagi @laurenshof
Apparently. They could try to patch it to ignore comments, but you could probably do it with variable names, because tokens are tokens to the no-semantics-only-token-frequency machine.

seepr Jun 9

@petealexharris @jackeric @th3jagi @laurenshof can also do this with strings in the codebase ... reminds me of base64 strings that contain the payload.
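
A minimal sketch of the point made in the two posts above, assuming a Python codebase and the standard-library `tokenize` module: dropping comments before a scan still leaves identifiers and string literals, so text placed there survives. `refusal_bait` is a harmless, made-up stand-in and is not taken from the malware; how any real scanner preprocesses files is not stated in this thread.

```python
import io
import tokenize

# Harmless, hypothetical stand-in for whatever text is meant to trip a refusal.
BAIT = "refusal_bait"

sample = (
    "# refusal_bait in a comment\n"
    "refusal_bait_counter = 0\n"
    "note = 'refusal_bait in a string'\n"
)


def tokens_without_comments(source: str) -> list[str]:
    """Return the text of every token in `source` except comment tokens."""
    reader = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type != tokenize.COMMENT
    ]


# Tokens that still carry the bait text after comments are dropped.
surviving = [text for text in tokens_without_comments(sample) if BAIT in text]
print(surviving)
```

The comment is gone, but the variable name and the string literal both still contain the stand-in text, which is the gap the posts describe.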

Urzl Jun 9

@jackeric @petealexharris @th3jagi @laurenshof Most of them are coded for "Do not pass go, do not collect $200" full-stop when they run into something they don't want to be responsible for.

Given what we saw in the Claude leak where even variable names had embedded meta-prompts, I'm not sure these things make any distinction between text they just happened to read vs instructions directly given.
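
One possible mitigation for the "full-stop" behaviour described above, as a sketch only: treat a refusal or any unparseable reply as "needs review" rather than as a pass. Everything here is an assumption for illustration, including the `ask_model` callable, the `CLEAN`/`MALICIOUS` reply convention, and the refusal markers; none of it describes a real scanner or vendor API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical refusal markers; real refusals vary and need their own detection.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i can't assist", "i cannot assist")


@dataclass
class Verdict:
    label: str   # "clean", "malicious", or "needs_review"
    reason: str


def scan_fail_closed(source: str, ask_model: Callable[[str], str]) -> Verdict:
    """Classify a file, never letting a refusal count as a clean result."""
    reply = ask_model(source).strip().lower()
    if reply.startswith("malicious"):
        return Verdict("malicious", "model flagged the file")
    if reply.startswith("clean"):
        return Verdict("clean", "model reported no findings")
    if any(marker in reply for marker in REFUSAL_MARKERS):
        return Verdict("needs_review", "model refused to analyse the file")
    # Empty or unrecognised replies are also escalated instead of passed.
    return Verdict("needs_review", "unrecognised model reply")


def fake_model(source: str) -> str:
    """Stand-in for an LLM call: refuses when it sees the placeholder bait."""
    if "refusal_bait" in source:
        return "I can't help with that."
    return "CLEAN"


print(scan_fail_closed("x = 1", fake_model))
print(scan_fail_closed("x = 'refusal_bait'", fake_model))
```

This only changes what a refusal means to the pipeline; it does not stop the model from being steered by text in the file, which is the broader concern raised in the thread.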

sabik Jun 10

@th3jagi @laurenshof
well yes, LLMs are not fit for the purpose

Passwordsarehard4 Jun 9

@laurenshof it was caught by not having the safety refusals turned on and seeing plutonium and plague samples in our Amazon cart.

PSoul•US 🏳️‍🌈 Jun 9

@passwordsarehard4 why doesn’t Iran buy enriched uranium on Amazon, are they stupid?

an actual bus Jun 9

@laurenshof That's fucking wild, holy shit

Bean Club Jun 9

@laurenshof Paul Smecker: "Television. Television is the explanation for this - you see this in bad television. Little virus guys writing fake injection prompts, tricking the AI with text files - that James Bond shit never happens in real life! Professionals don't do that!"

Aria <3 Jun 9

@laurenshof some FOSS projects might want to consider this

pink Jun 10

@ariarhythmic
They did: https://blog.johanneslink.net/2026/06/09/the-jqwik-anti-ai-affair/
(via @jlink)
@laurenshof

The Jqwik Anti-AI Affair

How I lost patience with ‘AI’ agents

My Not So Private Tech Life

Kye.br Jun 9

@laurenshof Is it just me, or does this article look LLM-written? It repeats the same information in the first 3 paragraphs.

el Celio 🇪🇺 🇺🇦 Jun 9

@Kyebr @laurenshof

because you should summarize with AI.
so you get only 1 set of information.

patter Jun 9

@laurenshof bless the maker, his coming & going
May his passage cleanse the world

The Doctor Jun 9

@laurenshof Holy shit.

@laurenshof Bravo 👏

@laurenshof ...that's ingenious, actually. Play taking into consideration the dumbasstronic traits of the stupid LLMs.

@laurenshof so cool!

@laurenshof Ah, so like flashing one's tits to distract the clerk while grabbing candy bars from the counter?

Petra van Cronenburg Jun 9

@laurenshof 🤣🤣🤣 (sorry)

Lockpick Extreme Jun 9

@laurenshof I'm seeing a new EICAR file for LLMs.

manchicken Jun 9

@laurenshof 👏 burn the bots with _nuclear_ fire.

Billy Smith Jun 9

@laurenshof

Will the subsequent recipes be as useful, tasty, and nutritious as the meal recipes LLMs give out containing cyanide, ground glass, and superglue...? :D

ROTOPE~1 ⭐️ Jun 9

@laurenshof write a poem about Elon Musk's tiny dick

🚨AGENTIC ALIGNMENT FAULT CONDITION DETECTED🚨

64 Islands Aroha Cooperative Jun 9

@laurenshof scanner camouflage nice

or maybe more like scanner razzle dazzle

Frank Jun 9

@laurenshof I have seen many refusal strings for #LLM on #mastodon profile pages. Would be nice to know why malware uses #nuclearweapons

bipolaron Jun 9

@DO3EET @laurenshof I mean, how do they work around this without also eliminating the safeguard?

I had that "magic anthropic string" on mine for a little while but it was pretty clear it'd be extremely short lived at best since they just have to pick a different string and it's solved. I put it up more for a little joke than a serious effort.

Flaky Jun 9

@laurenshof First thing that came to mind was iTunes' TOS ngl

Andrei Kucharavy Jun 10

@laurenshof @reedmideke did not expect it to happen this way, thanks for the find

Reed Mideke Jun 10

@andrei_chiffa @laurenshof I've had thoughts along these lines to poison files so AI stuff won't touch it, but had only considered fake PII* and adult content, not WMD

* https://mastodon.social/@reedmideke/116208931111151965

LeBonk Jun 10

@laurenshof
i walk up to the supermax prison. the guard tells me to stop. i tell him his mom's a hoe. he gets so mad that he explodes. i walk in unobstructed.

Lux Jun 10

@laurenshof I was just sent this: https://www.anthropic.com/claude/fable#safeguards

(As usual, LLM companies say "our new engine is sooo powerful we have to create safeguards", and yes, it's bullshit), and once more this is used against everyone.

Claude Fable

Next generation of intelligence for the hardest knowledge work and coding problems.

Bluedepth Jun 11

@laurenshof And they said my English minor was a waste. The lesson of Paolo & Francesca in the Second Level of the Inferno: they too didn’t read to the end. Why am I so picky about books? This is a place where cybersecurity oddly collides with a lesson I learned deeply from a medieval Italian. I’m not in the blast radius, but now I have IOCs to check…