Mastodawn

Viss Feb 24

nobody confident in their own abilities is panicking

https://www.theregister.com/2026/02/23/claude_code_security_panic/?td=rt-3a

the people who are panicking are signaling.

Infosec community panics as Anthropic rolls out Claude code security checker

ai-pocalypse: Not the first of its kind

The Register

David Zaslavsky Feb 24

@Viss Yeah, as a security-minded devops engineer, this is dope. (Well, y'know, aside from all the general ethical/environmental/etc. concerns about LLM use.) Having more "eyes" out looking for security vulnerabilities is a good thing, and especially so when one set of "eyes" is biased in a different way than typical human reviewers and thus is well placed to notice some subset of problems that humans would probably miss.

Of course, that only applies as long as it's used sensibly. Which means using LLMs to report issues for human review and validation, not letting an agent loose on a code base with the ability to automatically file security reports for anything it finds. (I have little confidence that the tool will actually be used sensibly in most cases.)

Viss Feb 24

@diazona you should be aware that i am actively working on research that intends to measure just how often llms lie about shit, because at the end of the day, no matter what layers you put on top of an llm, it still fucking lies and hallucinates, even when it's told to use skills and mcp servers

so.. your sentiment, while optimistic, makes the assumption "that this shit works"

but .. it doesn't.
at least not with enough precision to be relied upon

David Zaslavsky Feb 24

@Viss Yeah, that was the whole point of my last paragraph

Viss

@diazona but even using llms to report issues for human review will be problematic as humans will end up chasing ghosts

David Zaslavsky Feb 24

@Viss Depends on how frequently the reports are legitimate, and on how much time the reviewers spend chasing ghost reports versus the benefit they gain from the legitimate ones. Different organizations/groups/developers will draw the line in different places. In some cases I could imagine a 1% hit rate from the LLM being good enough, whereas an individual developer or a team working on a low-impact project probably wouldn't bother until the rate gets much higher, if at all.
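
To make that trade-off concrete, here is a rough sketch of the break-even arithmetic. It assumes every report costs a fixed amount of triage time and every legitimate report saves a fixed amount of downstream effort. The function names and all the numbers are hypothetical, picked only for illustration, not measured from any real tool or team.

```python
def break_even_hit_rate(triage_cost_hours: float, benefit_hours: float) -> float:
    """Smallest fraction of legitimate reports at which triage pays for itself.

    Assumes (hypothetically) a fixed triage cost per report and a fixed
    benefit per legitimate report, both expressed in hours.
    """
    if benefit_hours <= 0:
        raise ValueError("benefit_hours must be positive")
    return triage_cost_hours / benefit_hours


def net_value_hours(num_reports: int, hit_rate: float,
                    triage_cost_hours: float, benefit_hours: float) -> float:
    """Expected hours gained (positive) or lost (negative) over a batch of reports."""
    return num_reports * (hit_rate * benefit_hours - triage_cost_hours)


# Illustrative, made-up numbers: half an hour to triage any report.
TRIAGE_COST = 0.5

# High-impact project: a real finding is assumed to save 200 hours of incident work.
high_impact = break_even_hit_rate(TRIAGE_COST, benefit_hours=200)

# Low-impact project: a real finding is assumed to save only 4 hours.
low_impact = break_even_hit_rate(TRIAGE_COST, benefit_hours=4)

print(f"high-impact break-even hit rate: {high_impact:.2%}")
print(f"low-impact break-even hit rate:  {low_impact:.2%}")
```

Under these assumed numbers, the high-impact case breaks even well below a 1% hit rate, while the low-impact case needs more than one report in ten to be legitimate. The point is only that the threshold scales with the ratio of triage cost to benefit, so different groups land in different places.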

Viss Feb 24

@diazona heh, imagine trying to propose a budget to finance by saying "99% of the time our analysts spend is complete bullshit, gimme more money"

Viss Feb 24

@diazona https://github.com/anthropics/claude-code/issues/28144

Feature Request: Claude should know its runtime environment (Desktop App vs CLI vs Web) · Issue #28144 · anthropics/claude-code

Summary Claude Code does not know which interface it is running in. When asked, it cannot distinguish between the Desktop App, the CLI/Terminal, or the Web interface. This leads to incorrect assump...

GitHub