Open source is benefiting from the current AI trend: some projects are already improving their security posture and reducing their attack surface.

Proprietary software, for now, seems more out of the loop.

But once LLMs become better at analysing binaries, compiled code, and obfuscation, I wonder how vendors will handle the likely increase in vulnerability disclosures.

#ai #opensource #proprietary #cybersecurity

@adulau GhAIdra
@itisiboller @adulau
He is currently doing things like that for attacking firmware.
https://swecyb.com/@troed/116583786172746225
Troed Sångberg (@troed@swecyb.com)

Current reverse engineering setup*:
1) llama-server built with MTP support (just now merged into main)
2) unsloth qwen 3.6 35b a3b UD_Q4_K_XL MTP
3) Opencode with Superpowers and DCP
4) pyghidra-mcp

This allows for reverse engineering using Ghidra by prompting the LLM to do the heavy lifting. It will rapidly go through Ghidra-analysed functions and rename all the common ones, draw up the whole callpath, identify interfaces between secure and nonsecure areas etc.

It's a complete game changer since while it's not doing anything I can't do myself the speed at which it's done is absolutely staggering compared to the manual labour. I go from not knowing anything about the fw/flash I'm working on to have it completely broken down to where I can start looking for exploits in mere hours.

And yeah, the LLM aids in looking for those as well. So far my input to what exploits to look for is needed, but I can well see there being fine-trained or fully trained exploit-LLMs that would have that completely automated as well.

*) This is all running on a modern workstation where ~20GB of system RAM is needed in addition to the 16GB VRAM 5060Ti.

#EthicalHacking


@gunstick

But this relies on the intermediate disassembled and/or decompiled code, with the LLM applied to that code.

I mean working at the binary level directly. I haven't seen a generative model trained directly on binaries.

@itisiboller

@adulau @itisiboller
I don't think feeding it the raw binary or a hexdump would work.
LLMs are trained on source code.
Training one on binaries to implement some sort of decompiler adds a new level of uncertainty. So it is better to use a decompilation step to get a crude but stable source code representation.
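
As a rough illustration of that point, here is a minimal sketch that uses Python's own bytecode as a stand-in for a native binary. This is an assumption made purely to keep the example self-contained: `check_length` is a hypothetical toy function, and `dis` is a disassembler rather than a decompiler. The raw bytes carry no names at all, while the structured listing exposes the opcode names and symbols that a model trained on text can work with.

```python
import dis


def check_length(password):
    # Hypothetical toy routine standing in for a function found in a binary.
    return len(password) >= 12


# Raw view: the compiled bytecode as a hex string, comparable to a hexdump.
raw_hex = check_length.__code__.co_code.hex()

# Structured view: the disassembly recovers opcode names and symbol references.
listing = dis.Bytecode(check_length).dis()

print("raw bytes:", raw_hex)
print(listing)
```

The exact opcodes vary between Python versions, but the contrast stays the same: identifiers such as `password` and `len` only show up in the structured listing.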