Do LLMs actually help hackers reverse engineer and understand the software they want to exploit?

We ran the first fine-grained human study of LLMs + reverse engineering.
To appear at NDSS 2026.

Interested? Some quick findings in 🧵👇
Paper: https://www.zionbasque.com/files/papers/dec-synergy-study.pdf

We implemented an LLM plugin and instrumented a decompiler to track everything:
function visits, renames, types, comments, and every LLM interaction.

48 participants (experts + novices) solved 2 CTF binaries
-> 109 hours of recorded reversing, all in-browser via @pwncollege.

First, a lot of reverse engineering is front-loaded, and LLMs do well at that task.

The very first glance at a function (just a few seconds) often largely determined whether a participant ultimately understood it.

LLMs are very good at surfacing more information on that first visit.

Novices using LLMs reached expert-level reversing rates.

Not necessarily because they reasoned better, but because LLMs provide fast semantic orientation “for free,” an orientation that otherwise usually only experts have.

But if people got better at that first glance, did experts really gain anything?

Experts got negligible gains.

They offloaded known algorithms, then spent more time on custom logic.

LLMs summarize. Humans still do the hard reasoning.
We speculate that to shift this dynamic and actually help experts, new LLM collaboration methods are needed (e.g., better MCP integrations).

Even rare hallucinations were dangerous.

A few false vulnerability reports completely derailed participants, sending them chasing bugs that didn’t exist (for a while).

We emphasize that in RE, where you speculate constantly, an untrustworthy speculator can waste a lot of time.

Auto-generated names and comments didn’t improve understanding.

Only artifacts created by humans correlated with comprehension.

Sometimes, the act of naming matters more than the name itself. That benefit is often lost in MCP-based automation in this space.

If you care about RE, hacking, or human-AI teaming, check out the full paper. And if you want to play with the same LLM interface participants had (DAILA), check out the code below:

Paper: https://www.zionbasque.com/files/papers/dec-synergy-study.pdf
Code: https://github.com/mahaloz/DAILA

Finally, I can't emphasize enough how much of a team effort this was! @packm4d @adamdoupe @zardus @losiouk @cl4sm @AnantaSoneji Simone, and Fish made this possible.

We look forward to continuing to deeply understand RE and how LLMs may play a role in it.

@mahaloz Fascinating paper! Are you hoping to repeat this with larger binaries? The small binary size in this paper permits complete SRE, rather than e.g. letting an LLM help (or hinder) a user in getting to the important parts of the binary. Like, not printf again.