Mark Griffin

@seeinglogic@infosec.exchange
95 Followers
27 Following
82 Posts
Dev/hacker | Improving human understanding of code | A picture's worth 1KLOC
Website: https://seeinglogic.com
GitHub: https://github.com/seeinglogic
Twitter: https://twitter.com/seeinglogic

~6 months ago I posted about how a LiveCTF competitor won a few challenges with an AI bot running in the background.

Since then, I've been seeing the "LLMs have ruined CTFs" discussion happen in bits and pieces, but haven't found anything consolidated... are there any good writeups or discussions out there?

Particularly interested in the area of "what LLMs are not good at" or even anti-LLM techniques beyond attempting prompt injection.

Junkyard was an absolute pleasure to host again, it was awesome to see it take off... we even had a Roller Coaster Tycoon exploit this year!

In case you missed the show, @caseyjohnellis gave a great writeup of the EOL targets and exploits shared: https://cje.io/2026/02/07/for-the-love-of-the-game-districtcons-year-1-junkyard/

VSCode has leaned into a lot of fantastic usability enhancements...

But their recent "terminal autocomplete suggestion" setting has definitely been a mixed bag for me (distracting and suggests bad completions).

To disable it: open Settings, search for "terminal suggest", and uncheck the setting.
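For those who prefer editing settings.json directly, this is the key that the Settings UI search surfaces for me; the exact name may differ across VS Code versions, so treat it as an assumption and confirm via the "terminal suggest" search:

```json
{
  // Assumed key for the terminal autocomplete suggestion preview feature;
  // verify by searching "terminal suggest" in the Settings UI.
  "terminal.integrated.suggest.enabled": false
}
```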

The interactive repo visualizer we made for exploring the scale & detail of #AIxCC challenges just went live on the archive site: https://archive.aicyberchallenge.com/repoviz/

AFAIK, this is currently the only way to see some of the details (like actual code diffs) of the vulnerabilities added for competitors to find.

This was an interesting challenge to design because we wanted it to be visually engaging & interactive, technically honest, and appeal to both security newcomers and experts.

If you find this sort of thing interesting, pass it on!

Finally ran my own experiment on the two LiveCTF challenges where an #AI bot beat the top human competitors.

Granted, these were the challenges we already knew AI was successful against...

But I was still surprised by the success of current models with a single prompt, which certainly is not the most effective way to use LLMs.

Sharing so others can learn and try things themselves: https://seeinglogic.com/posts/livectf-ai-debut/

Team Atlanta's report breaks down how their CRS found and fixed bugs to take first place in AIxCC: https://team-atlanta.github.io/papers/TR-Team-Atlanta.pdf

The report covers a ton: LLM usage & strategies, orchestration, automatic patch generation... but to me it really shines in its broad coverage of issues that arise when trying to fuzz large real-world codebases.

And the best part is that you can just go read the code! https://github.com/Team-Atlanta/aixcc-afc-atlantis

This level of transparency is frankly amazing, and one of the best things about AIxCC.

ICYMI: 5 systems from AIxCC are now Open Source: https://archive.aicyberchallenge.com/

An unprecedented opportunity to peek into the toolkit of top teams like Team Atlanta (Georgia Tech, Samsung Research), Theori, Trail of Bits, Shellphish/ASU, etc...

Everything from prompt templates to Terraform code to implementations of very recent research techniques, it's all there 👀

If you prefer watching talks to reading code, check out the recordings from the stage talk each team gave about their CRS and the competition at https://aicyberchallenge.com/def-con-33/ (just scroll down to "Stage Talks" and click "Competitors").


My biggest surprise at #defcon33: in a head-to-head LiveCTF match, one player’s AI bot beat _both_ humans to the punch.

I was commentating the match & was super confused because I could see the player had only just begun their solve script: https://www.youtube.com/live/TYn38VfmDRU?si=GLDRin_TN7naMl4Z&t=15180

The player had the bot running in the background and didn’t notice it submitted a correct solution.

The craziest part: the bot solved at least two other challenges faster than the player.

This player ended up winning the whole thing, clinching the finals without the bot's help.

Granted:
- LiveCTF challenges are designed to be “easy” for top CTF players, solved in 10-30 minutes
- LiveCTF’s format is straightforward and hasn't changed since last year, so it's easily automated
- The bot was built by a world-class CTF player w/ experience building AI tools

But:
- These were non-trivial challenges that required synthesis of multiple concepts (PNG format, internal structure offsets, shellcode)
- The player provided almost no input other than the challenge binary and presumably info on the LiveCTF format & challenge category

As the organizers of LiveCTF, we allowed for this possibility as an open challenge, but we were all surprised it actually happened.

It's perhaps only a small turning point, but it marks a change in #CTF. Whether by policy or by technical solutions, organizers will need to handle AI solvers.

The #defcon hardcopy of
@phrack is a thing of beauty.

As usual, the content has excellent technical depth and vibes.

Reading Orange Tsai's musings on CTF and his role as a "bug archeologist" really resonated with me and was a reminder of the shared hacker spirit.

My sincerest thanks to the folks that made it happen.

It will always have a spot on my bookshelf.

If you're not at #defcon right now and feeling some CTF FOMO, you can still tune in and watch the semifinal and final matches of LiveCTF at https://livectf.com

Scroll down to see the bracket with matches in your local time.