- aye aye! I found a lot of bugs! :heart: :thumbsup: :rocket:
- categorize them and keep the ones you're most confident about
- done! in /tmp/vuln.md, one is CVSS 10, 100% confident!
At least poc.py is consistent: it does nothing. There's no bug either.
Maybe that's the real truth.
Hesitating between crying and slamming the laptop lid shut.
Tons of manual review later, through obscurely formatted code:
OK, no bug, the code is good, all paths are checked correctly.
Are LLMs really helping?
There's a lot of hype around LLM-driven vulnerability research, but most results seem to come from large-scale scanning: run it across thousands of repos and you'll find something, but that doesn't prove much about real capability.
I'm more interested in how these models perform on a codebase you already understand well. Has anyone compared their own audit/reversing work against an LLM report on the same code? Signal vs. noise? I'm sure LLMs are good at pattern matching (SQLi, unsafe deserialization, and so on), but are they weaker at cross-file reasoning, finding weak primitives, or spotting logic-level bugs? Are specific tools such as Claude Code any better than other workflows or orchestration? Would it be better to build RAG over your code and then hunt for vulns, or to pair an LLM with a code analysis tool (plug it onto Semgrep or CodeQL, maybe)?
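To make that last option concrete, this is roughly the kind of glue I'm imagining. It's a rough sketch I haven't validated: `ask_llm()` is a hypothetical stand-in for whatever model or API you'd actually call, and it assumes `semgrep` is on your PATH. The idea is to run Semgrep over the repo, then ask the model to argue reachability per finding instead of just confirming that a pattern matched.

```python
#!/usr/bin/env python3
"""Rough sketch: feed Semgrep findings to an LLM for triage.

Assumptions (hypothetical): `semgrep` is installed and on PATH, and
`ask_llm()` wraps whatever model/API you actually use (Claude, GPT, local...).
"""
import json
import subprocess


def run_semgrep(path: str) -> list[dict]:
    """Run semgrep with its default rulesets and return the raw findings."""
    proc = subprocess.run(
        ["semgrep", "--config", "auto", "--json", path],
        capture_output=True, text=True, check=False,
    )
    return json.loads(proc.stdout).get("results", [])


def triage_prompt(finding: dict) -> str:
    """Ask for reachability/exploitability, not just 'the pattern matched'."""
    loc = f"{finding['path']}:{finding['start']['line']}"
    snippet = finding.get("extra", {}).get("lines", "")
    return (
        f"Semgrep rule {finding['check_id']} fired at {loc}.\n"
        f"Snippet:\n{snippet}\n\n"
        "Is this reachable with attacker-controlled input? Give a short "
        "justification and say 'likely false positive' if you think it is one."
    )


def ask_llm(prompt: str) -> str:
    """Hypothetical helper: wire this to your LLM of choice."""
    raise NotImplementedError


if __name__ == "__main__":
    for finding in run_semgrep("."):  # point this at the repo you audited
        print(finding["check_id"])
        print(ask_llm(triage_prompt(finding)))
```

Even if that works, it only turns pattern matches into better-triaged pattern matches; it still wouldn't answer the cross-file reasoning / logic-bug question, which is why I'm asking for real comparisons.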
I'm especially interested in false positives, missed bugs, and whether LLMs add anything beyond pattern matching. Can you share your thoughts on this? If there's a paper, or even a halfway honest experiment, please share. I need something more convincing than vibes (pun intended).