There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges. I used to write CTF challenges in a past life, so I threw a couple of my hardest ones at it.

We're screwed.

At least with text-file style challenges ("source code provided" etc.), Claude Opus solves them quickly. For the "simpler" of the two, it just very quickly ran through the steps to solve it. For the more "ridiculous" challenge, it took a long while, and in fact as I type this it's still burning tokens "verifying" the flag even though it very obviously found it and knows it (the flag is leetspeak, and it identified that and noted it's plausible). LLMs are, indeed, still completely unintelligent, because no human would waste time verifying a flag and second-guessing themselves when it is so obviously correct. (Also you could just run it...)

But that doesn't matter, because it found it.

The thing is, CTF challenges aren't about inventing the next great invention or having a rare spark of genius. CTF challenges are about learning things by doing. You're supposed to enjoy the process. The whole point of a well-designed CTF challenge is that anyone, given enough time and effort and self-improvement and learning, can solve it. The goal isn't actually to get the flag, otherwise you'd just ask another team for the flag (which is against the rules of course). The goal is to get the flag by yourself. If you ask an LLM to get the flag for you, you aren't doing that.

So it's not surprising that an LLM can solve them, because it automates the process. That just takes all the fun and all the learning out of it, completely defeating the purpose.

I'm sure you could still come up with challenges that LLMs can't solve, but they would necessarily be harder, because LLMs are going to oneshot any of the "baby" starter challenges you could possibly come up with. So you either get rid of the "baby" challenges entirely (which means less experienced teams can't compete at all), or you accept that people will solve them with LLMs. But neither of those actually works.

Since CTF competitions are pretty much by definition timed, speed is an advantage. That means a team that does not use LLMs will not win, so teams must use LLMs. This applies to both new and experienced teams. But a newbie team using LLMs will not learn, because the whole point is learning by doing, and you're not doing anything. And so it will never become experienced.

So this is going to devolve into CTFs being a battle of teams using LLMs to fight for the top spots, where everyone who doesn't want to use an LLM is excluded, and where less experienced teams stop improving, because they're outsourcing the work to LLMs and not learning as a result.

This is, quite frankly, the same problem LLM agents are causing in software engineering and such, just way worse. Because with CTFs, there is no "quality metric". Once you get the flag you get the flag. It doesn't matter if your approach was ridiculous or you completely misunderstood the problem or "winged it" in the worst way possible or the solver is a spaghetti ball of technical debt. It doesn't matter if Claude made a dozen reasoning errors in its chain that no human would (which it did). Every time it gets it wrong it just tries again, and it can try again orders of magnitude faster than a human, so it doesn't matter.

I don't have a solution for this. You can't ban LLMs, people will use them regardless. You could try interviewing teams one on one after the challenge to see if they actually have a coherent story and clearly did the work, but even then you could conceivably cheat using an LLM and then wait it out a bit to make the time spent plausible, study the reasoning chain, and convince someone that you did the work. It's like LLMs in academics, but much worse due to the time constraints and explicitly competitive nature of CTFs.

LLMs broke CTFs.

And honestly, reading the Claude output, it's just ridiculous. It clearly has no idea what it's doing and it's just pattern-matching. Once it found the flag it spent seven pages of reasoning and four more scripts trying to verify it, and failed to actually find what went wrong. After all that wasted time it just concluded that sometimes it gets the right answer and sometimes the wrong one, so the thing that looks like a flag is probably the flag. It can't debug its own code to find out what actually went wrong; it just decided to brute-force trying again a different way.

It's just a pattern-matching machine. But it turns out if you brute force pattern-match enough times in enough steps inside a reasoning loop, you eventually stumble upon the answer, even if you have no idea how.

Humans can "wing it" and pattern-match too, but it's a gamble. If you pattern-match wrong and go down the wrong path, you just wasted a bunch of time and someone else wins. Competitive CTFs are all about walking the line between going as fast as possible and being very careful so you don't have to revisit, debug, and redo a bunch of your work. LLMs completely screw that up by brute forcing the process faster than humans.

This sucks.

@lina

AI is fast eradicating any learning activity.
In my current job, learning anything new is actively discouraged.

As was said to us "they only care about numbers on a dashboard".

I got to the position I am in, at the level I am at, by being curious and very interested, by taking things apart and figuring out how they work.

An LLM, which in the eyes of a CEO means he can get rid of people like me, is the end of the road. We are all doomed.

@Sonic2k @lina you're looking at it the wrong way. Yes, it's killing one type of learning. But it's teaching you how to CTF using AI: what are its strengths and weaknesses, what prompts are effective, what sub-problems should the AI tackle, what should the human focus on. It's no different than a carpenter switching from a hand plane to a powered belt sander. The skill set changes, the results are more or less the same. Someone who only learns to belt sand isn't less of a carpenter. It's gatekeeping to think otherwise. Yes, the "elitist artists" will argue otherwise, but the difference is moot for the vast bulk of us working stiffs.
@Jmj @Sonic2k @lina classic AI-apologist "expertise is unnecessary" fallacy. The results are perhaps similar on the surface, "was the task completed" level, but if a person does it and learns the details an LLM can brute-force past, that person can then recognize the issues showcased without going out of their way to look for them, which is an incredibly important part of security work. Because the real world is far messier and less clear than a CTF, and part of dealing with that is the kind of intuition and almost subconscious understanding which is impossible to achieve by using an LLM. And CTFs used to be decent at finding and rewarding those who are good at that.

@laund @Sonic2k @lina I never said "expertise is unnecessary".

Expertise is always necessary. All that changes is what types of expertise.

I was an expert 6502/Z80 assembly language programmer. Now that expertise is mostly useless, and actually harmful for writing Rust code. The mental models I developed for CPU behavior are completely wrong for understanding Rust code on ARM/x86 multi-core processors.
Because I learned assembly-language-level stuff, it does not make me a better or worse programmer compared to someone who only learned high-level languages. Yes, we will perform differently in narrow cases (say compiler bugs vs. multi-core perf optimization), but for most code in most projects our expertise will be indistinguishable.

I know that when AI coding I spend more time reading code, analyzing test cases, and writing specs, and a lot less time banging out lines of code and reading library/tool docs to yak-shave them into working. I need to know different things, not better things or worse things, just different.

Personally, I'm most interested in what the 12-year-olds are learning to do with these AI tools, in exactly the same way I learned what computers can do with my BBC Micro in the 80s.

The job is EXACTLY the same, press buttons until the pixels blink the way you want them.

@Jmj @Sonic2k @lina This feels strongly like you have no idea how people who aren't already pretty knowledgeable use LLMs.

@laund @Sonic2k @lina

I honestly don't care about them. It doesn't matter, in exactly the same sense that all those folks building terrible GeoCities websites don't matter.

What matters is those folks that learned HTML and design sense on GeoCities. They became experts. They went on to build what we call the internet now. Fantastic websites that we ALL use. They developed the design guidelines and aesthetics that we now love.

Anyone can use any tool to make crap.

What matters is what experts can make that wasn't possible (or too expensive) before.
And I am strongly against gatekeeping on how someone learns to become an expert.

@Jmj @Sonic2k @lina There is a stark difference between deciding to learn a certain way and making a test of skill completely irrelevant (the latter being the topic of this post).
@laund @Sonic2k @lina I don’t like absolutes.
It’s no more or less than how you should feel about dictionaries, spell checkers, calculators, or Mathematica in relation to spelling bees and math olympiads. In some cases we don’t use or allow the aids, and in other cases we do. And those competitions have similar relationships to the real world as CTF competitions do to security work.

@Jmj @laund @lina

My parents didn’t have bosses telling them that calculators, computers, and dictionaries were going to make them redundant, and then continuing to rub it in their faces, did they?

I find myself in a position where I am soon going to be replaced by an LLM because I cost too much on the balance sheet.

That’s why I am planning to quit before May. Before they can make me redundant and fuck me

@Sonic2k @laund @lina
They 100% did...
Calculators replaced armies of bookkeepers.
Desktop computers replaced huge typing pools.

I never said the transition wouldn't be very painful to real people. History is full of people completely screwed over by change. But the change is not going away.

On a personal note: don't quit. Make them fire you. Why would you give up a paycheck unless you don't need it? And are you really sure you cannot work with LLMs? Competent people are always valuable.