Okay, okay. I need to devote some time to catching up on genAI capabilities in a professional sense.

Security Researchers & SecOps - what's your favorite use case so far?

Also, what's a lesson you learned the hard way?

***Also - please save the snark. I'm tired, and this is a genuine, if hesitant, ask.

#infosec

One of my professional networks - filled with actual practitioners - is substantially less negative on AI lately. There's spend and craft involved, so it's not a turnkey solution, but a lot of folks in this trust group are finding substantial productivity benefits, rather than hype.
@neurovagrant Yep, same. Just a pity I had to quit my job because those specific people didn't have a clue.
@neurovagrant they've moved on to 'harness engineering' from prompt engineering. i can show you what I've built if you'd like
@Viss would definitely appreciate any experience you feel like imparting
@neurovagrant just let me know when you have some free time today, if you're game. i think my entire day is earmarked to deal with it all

@Viss the day has escaped me :( but let's find time soon, please.

i am a grumpy fucker this afternoon though, and should not expose you to that. lol

@neurovagrant im down to throw shade if you wanna vent too!

@Viss if i start venting

i may never stop

=)

@neurovagrant
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
ᵒⁿᵉ ᵒᶠ ᵘˢ
@Viss @neurovagrant Wtf is “harness engineering”?
@schrotthaufen @neurovagrant so you know what prompt engineering is, right?

@schrotthaufen @neurovagrant so harness engineering is tuning 'the thing you use to talk to the llm' instead of 'wordsmithing your prompt'. because the harness itself does a lot of the heavy lifting.

think of tools like claude code, crush, opencode, openclaw, nemoclaw - these all talk to the llm on your behalf and handle a bunch of the heavy lifting, so "your harness" can be way more effective than "your prompt"

@Viss @neurovagrant Ah that makes sense. Thank you for the explanation.

@neurovagrant bluntly, these people are delusional.

I have seen LLMs stacked against well established ML systems. Because the decision was made (incorrectly) that LLMs would be 'cheaper.'

They were quite literally multiples more expensive. And the results went from 97% accuracy to <70%. Getting it anywhere near the same level of accuracy would multiply costs again.

@neurovagrant and it wouldn't surprise me if most of them have no existing ML, or their ML was ineffective nonsense.
So they're quite literally incapable of seeing that they're wasting multiples for below acceptable results. Or they've gone all-in on the psychosis thinking LLMs are good at regexps (they are absolutely not.)
@rootwyrm ***Also - please save the snark. I'm tired, and this is a genuine, if hesitant, ask.

@neurovagrant not to toot my own horn too much, but I've had some surprisingly good results using the SAT-skill stuff I put out a couple weeks ago.

Like, follow up an investigation agent conclusion with "validate your conclusions by running a SAT-Skill Devil's Advocate test against them." It still gets over-ambitious at times, but it dials itself back a bunch.

I've found that a layer of "criticize your own conclusions" is a good pattern for a lot of LLM use, both in code and analysis.
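That "criticize your own conclusions" layer is really just a second pass over the first pass's output. A minimal Python sketch of the pattern, where `model` is any callable that takes a prompt and returns text (the function name and prompt wording are mine, not the actual SAT-Skill implementation):

```python
def devils_advocate(model, conclusions: str) -> str:
    """Second-pass self-critique: ask the model to attack its own conclusions."""
    critique_prompt = (
        "You previously concluded the following during an investigation:\n\n"
        f"{conclusions}\n\n"
        "Act as a devil's advocate: list every assumption, weak inference, "
        "or missing piece of evidence in these conclusions."
    )
    return model(critique_prompt)

# Usage with any model callable; here a stub stands in for a real LLM call.
stub = lambda prompt: "Critique of: " + prompt.splitlines()[2]
print(devils_advocate(stub, "Host A was the initial access vector."))
# prints "Critique of: Host A was the initial access vector."
```

The point of keeping `model` injectable is that the same critique step can follow any agent's conclusion, regardless of which harness produced it.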

@neurovagrant I haven't tried it yet, but one of the areas I have actual hope for (and, time permitting, will give a shot after my vacation): triage for the initial wave of stuff like secret detections, when the tools are freshly turned on. With all the love for classical secret scanners: they are pretty fucking noisy. Throwing a language model at the outputs should do a decent job filtering out the password = "notanactualpassword" or key = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' cases, so we can annoy the right people to fix their fucking shit first.
@neurovagrant it's probably less relevant for a steady state, but jfc, we have about 80k detections of this heading our way, and from some test samples we are indeed looking at something like a 30/30/30 split of complete FP / grey area / true positive, soooo...
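A chunk of that first wave can even be pre-filtered deterministically before any model sees it. A rough Python heuristic for the two example patterns above (the keyword list and threshold are my guesses; tune them against your own sample):

```python
PLACEHOLDER_WORDS = ("password", "example", "changeme", "dummy", "notanactual", "test")

def looks_like_placeholder(value: str) -> bool:
    """Heuristic: flag obviously fake secrets so the LLM (or a human)
    only sees the plausible ones. Not a substitute for review."""
    lowered = value.lower()
    if any(word in lowered for word in PLACEHOLDER_WORDS):
        return True  # e.g. password = "notanactualpassword"
    # Alphabet-dump detection: almost every adjacent character pair ascends,
    # as in 'ABC...XYZabc...xyz0123456789+/' used as a fake base64 key.
    if len(value) >= 16:
        pairs = list(zip(value, value[1:]))
        ascending = sum(1 for a, b in pairs if ord(b) > ord(a))
        if ascending / len(pairs) > 0.9:
            return True
    return False

print(looks_like_placeholder("notanactualpassword"))  # → True
print(looks_like_placeholder("kR9mQ2xWv8tLp4Zs"))     # → False
```

Anything the heuristic passes through still goes to the model (or the grey-area pile), so a false negative here only costs you one extra LLM call, not a missed secret.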

@nyanbinary @neurovagrant

I once had DumpsterDiver in a container running against the NAS. I didn’t have an LLM then so it took a lot of tuning, but it found AWS, Azure and infra API secrets.

Put infrastructure teams through Python boot camps for automation, they did…

@neurovagrant
sigh
I so want to write an article about how I use LLMs at work... I have an entire system (contained in a single git repo) that relies heavily on org-mode (Emacs) with cross-linking, tool execution in-place, kanban-like board for agents, specific agents for validating cross-team and cross-user updates, custom in-house tools for accessing any systems we can possibly hook into (for automatic auditing, investigations, reporting)...

It's an entire thing. Lets me be half of a real infosec team on my own. At the same time, I am burning through company money like there's a flash-fire.

Lessons learned:
1. Most behaviors can be tweaked.
2. AGENTS.md needs to be short. Delegate specific behaviors/ agent patterns to skills.
3. LLMs need consistency in what they work with. Don't mix file types, formats, etc. If you're using YYYY-MM-DD, always use that. If you're using markdown, only use markdown.
4. LLMs have a tendency to run tools in their chat window/ place, and then re-write the results (they can't copy-paste) into the output files. If your note-taking system allows for in-place code execution, use it!
5. Keep timestamps on everything. Your LLM (or agents) should have its own file where it logs everything it works on, everything that's done, completed, etc. At minimum this allows it to stay on track better, and it gives you a place to review the work over time.
6. Give LLMs a kanban board. What I use lately is: ~/git/.llm/board.org (stores all the tasks in the form of a TODO heading with a property of created, updated, agent, followed by a plain list of timestamps of each action taken on the item), and a pile of ~/git/.llm/[agent-name].org files (storing each individual agent's "state" as it works). This allows for some orchestration over long-spanning tasks (e.g. long investigations, reports, defining requirements). Agents can put tasks on the board, can update other agents' task status, can pick up tasks (by putting their name in the 'agent' slot) and, most importantly, can communicate a little with each other.
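For anyone unfamiliar with org-mode, a board.org entry per that description might look something like this (the task contents are illustrative, not from the real board):

```org
* TODO Investigate anomalous logins from build-runner hosts
  :PROPERTIES:
  :created: 2025-11-02T09:14:00
  :updated: 2025-11-02T13:40:00
  :agent:   investigator-1
  :END:
  - [2025-11-02T09:14:00] task created by orchestrator
  - [2025-11-02T10:02:00] investigator-1 picked up task
  - [2025-11-02T13:40:00] SIEM query results attached, awaiting review
```

The property drawer carries the machine-readable state (who owns the task, when it last moved), and the plain timestamp list underneath is the append-only action log agents add to.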

I'm currently running this via Cursor (because that's what we use at work), so automation is very limited (I still review all the work done manually, which is a major bottleneck... but necessary).

I think the most important part is to have 2 key agent roles filled.

One that's a long-running, context burning monster that you never let go, which is going to do little more than track what all the agents are doing every few hours. "Refresh status. Update all documentation based on all the changes since the last commit."

And then a second agent, which is going to adversarially attack this: "Check updates made since the last commit. Find any gaps, mistakes, errors, discrepancies."

You'd want to run the second agent until it says there are no issues. If you follow the agents' work instead of stepping away, you'll find it really easy to spot, verify and fix issues.
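That two-role loop is really just plain control flow. A minimal Python sketch where both agents are callables (the function names and the issue-list return format are mine; a real setup would wire these to actual agent sessions, with you fixing issues between rounds):

```python
def review_loop(status_agent, critic_agent, max_rounds: int = 5) -> list:
    """Run the long-lived status/documentation agent once, then loop the
    adversarial critic until it reports no remaining issues (or we give up).
    In practice a human fixes the reported issues between critic rounds."""
    status_agent("Refresh status. Update all documentation based on "
                 "all the changes since the last commit.")
    issues = []
    for _ in range(max_rounds):
        issues = critic_agent("Check updates made since the last commit. "
                              "Find any gaps, mistakes, errors, discrepancies.")
        if not issues:
            break  # critic is satisfied
    return issues

# Stub agents: the critic "finds" one issue on the first pass only.
found = [["missing changelog entry"], []]
critic = lambda prompt: found.pop(0)
print(review_loop(lambda prompt: None, critic))  # → []
```

The `max_rounds` cap matters: an adversarial agent told to find problems will sometimes invent them, so you want a bounded loop rather than "run until clean" forever.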

Difficult workflow, ngl.
@phil thank you, this was helpful!
@neurovagrant Oh, and have it use LaTeX for automatic generation of PDF reports. Tweaking the setup files/templates takes a little bit of time, but it's 100% worth it once you have a workflow going.

At the moment, every single incident ticket I file ends up in the ~30 page range, with full audit trails: every CLI call to the SIEM, with the full command and nicely trimmed JSON output (jq), plus a neat summary.

Another thing is I've built us an infosec-mcp, which holds all the policies, regulations, laws and other documents of note as a reference for our LLM use. This way if there's any questions, the LLM can either answer them, or it can signal that there is a knowledge gap to the infosec team (we get e-mailed about which questions the infosec-mcp isn't able to answer, and the user is directed to speak with us or compliance about their concerns).

Which serves another role: devs (once infosec-mcp is out of team-internal testing) will be able to get answers to security/ compliance questions faster and more accurately than relying on their managers/ leads.

And it makes investigations so much easier, because spotting policy and compliance breaches is nearly automatic (needs eyes because there's some false positives, but it's better than missing issues).

sigh

I really wish I could demonstrate. I hate my NDA.

@neurovagrant

Haters showed up, happy to have a private chat.

@neurovagrant alright I'll bite:

The only instance I've found AI useful: de-obfuscating obfuscated code. I use it for this because I don't know how to do it myself (because I'm bad and dumb)

A lesson I learned the hard way: Spending 3 hours trying to get a KQL query to work in Sentinel following instructions given to me by an AI, only to realize that the reason why it didn't work is because the primary table the AI gave me didn't exist, and had never existed.

@Mustardfacial oooh, good to know. thanks!

@neurovagrant I should also point out that these experiences were discovered while I was still experimenting and messing around with AI systems when ChatGPT was first released.

I no longer use it at all, as I've discovered it's often faster for me to just research things manually, and at least then I don't get lied to. Also, the number of research papers that have been released showing that AI use actively damages your critical thinking skills and effectively makes you stupid (https://arxiv.org/pdf/2506.08872v1, https://www.sciencedirect.com/science/article/pii/S0001691825010388, https://arxiv.org/pdf/2407.14452) has put a bad taste in my mouth about the whole thing, so I've decided it's a crutch I don't need.

@Mustardfacial yeah i'm well aware of the downsides, i am a deep skeptic, but the industry doesn't give a damn.

doesn't provide me much choice.

@neurovagrant I mean you always have a choice. The question is if you're willing to live with the adverse effects.

I'm not trying to sway you one way or another. You're an adult, make your own decisions. But you asked for our experiences and this was mine.

@Mustardfacial @neurovagrant There's two ways I think of it:

1) You usually have more leverage than you think, as a practitioner, to just not do things.

2) The tools bill themselves as being democratizing and easy to use, so the day the entire industry decides to force me to use them, I'm not worried about being able to pick things up quickly. I can't remember who said this, but: "prompt engineering" skills are really just basic reading and writing skills.

@Mustardfacial @neurovagrant I have used it to sketch out KQL when I don't know where to start, and it's OK-ish at that. But if I don't already know where to look for the data, spending three hours in Sentinel with a generated KQL query or three hours in MS Learn trying to figure it out comes to about the same. It's not like MS Learn is a single source of truth either; it's not very well maintained, and Sentinel changes too fast for Learn to keep up.

@neurovagrant

This from @ridt and @jags from three years ago is dated but possibly still instructive.

https://alperovitch.sais.jhu.edu/five-days-in-class-with-chatgpt/

Five Days in Class with ChatGPT – The Alperovitch Institute

@neurovagrant

Sure thing! I'd also recommend scouring recent CAMLIS archives, along with perhaps USENIX, RAID, etc.

@neurovagrant I have put Claude Code to the test in earnest for a well-paid, high-stakes project starting just two weeks ago. For context, I am a highly skilled software engineer with decades of experience working professionally. The project I am working on is a basic web app with a whole lot of features — TypeScript, React Router “serverless” architecture deployed on Vercel, some Python in the back-end running LangChain for the AI features of the app. The project is way too ambitious, given the size of the team working on it right now. Management (perhaps foolishly) thought AI would let us deliver on time and on budget despite the sheer volume of work that needs to be done. I was brought onto the team when it became clear that there aren’t enough engineering resources devoted to this project. My AI setup is Claude Code running in Emacs.

So the good news is that AI is genuinely making me work a lot faster, but I had to make a few mistakes at first. I have to follow some pretty strict rules that I set for myself. I learned pretty early on that if I don’t write most of the code myself, I don’t learn anything. If I don’t take notes and write comments, I don’t learn anything. I had worked on the project for a few days before I realized I hadn’t learned a single thing about the software, and had already become very dependent on AI to make important decisions for me. It was hard to solve bugs because I didn’t know what was going on.

So the key take-away is that you absolutely will become dependent on AI as a crutch for what you don’t understand, and it will happen without you realizing it. You have to work very, very hard to not delegate to the AI your responsibility as an engineer to understand the code. You won’t be able to solve problems or explain what you have done to other people because you don’t really know how the system works because you didn’t really write it. You won’t be able to explain the challenges you encountered or the engineering trade-offs you were forced to make because you didn’t make those choices.

You have to slow down to the speed at which you can understand what code is being written. You have to push back on people who are pushing you to deliver features faster and faster, or you will end up becoming dependent on the AI.

One big problem with AI: it tends to copy-paste its own code from around your source base. So if you let it make a bad decision (perhaps because you didn't realize it was a bad decision), pretty soon that same bad decision is being used everywhere throughout the code base. As an example, the AI was using a lazy little hack to reuse a database connection pool in just one line of code. Great for a prototype, not so much for an industrial product. The AI had also written some nice, reusable code, a wrapper around the PostgreSQL client library to obtain the DB connection properly and in a type-safe way. But when I grepped for examples of how to use that wrapper, I discovered that the wrapper was not being used anywhere. Instead, the 1-line hack was being used everywhere, in something like 30 different places throughout the code base.

So in order to make sure that I can understand code for which I am responsible, I decided I would not use the AI to write code for me. I would ask it how to write code, and actually physically type it all out myself. This forces me to slow down and think about what I am doing, and it helps me remember how to write code to solve problems.

Claude Code is hands-down the best linter I have ever used. After I write code, I always ask Claude Code to do a review. I tell it not to fix the mistakes for me, but to tell me what mistakes I have to fix. The process of fixing my own mistakes helps me learn how to do things the correct way. At first I would just write pseudo-code because there was so much about TypeScript and React Router I didn't know. But after a few days of using AI as a linter, I can write code on my own most of the time, and the mistakes the AI catches are fewer and farther between. I have never learned so much about a programming language and framework in such a short amount of time; AI is truly very useful for this.

Also, very occasionally, the recommendations Claude Code makes are wrong. But if you are making the changes yourself, not letting the AI do it for you, and actually thinking about what you are doing, you can catch problems before they get buried in lots of other logic.

Occasionally I let the AI write code for me. For example, I asked it to create a GUI for a testing and debugging tool that is not going to be shipped in the final product. Only I am going to use this tool, so I let the AI write that for me. And it worked extremely well! Hundreds of lines of throw-away code written in just a minute, something that would have taken me hours to do, and now I can benefit from that developer tool and it makes me more productive.

@ramin_hal9001 @neurovagrant really appreciate this perspective. I like that you seem to more ask the AI for advice VS having it actually do the thing, so you can understand what's happening, think critically, and continue learning. I haven't read any anecdotes about folks using it this way before. Thanks for sharing.
@neurovagrant I’ve used it to help build first-pass threat narratives when an alert is triggered. I did a side-by-side analysis of a SQL-backed agent vs a GraphDB-based one, and did several runs on known behavior I had emulated. My agents almost always found the behavior, but sometimes stumbled, either adding extra activity or attributing the wrong activity. But about 80-90% was perfectly correct. It’s an excellent first pass for something that would take me several minutes or even hours before, depending on complexity. I found that the type of DB backend changed performance, with everything else (system prompt, tools, etc.) staying the same.
@neurovagrant So I've used the "research" agents of Copilot and Gemini to decent success. I tend to try and come up with a comprehensive prompt, feed that to the prompt coach agent, then send it to the researcher. It's helped me dig into certain concepts to better effect (Of course, you gotta check the source materials).

@neurovagrant I have found large productivity gains for any problem where the input or output is in common English or loosely structured text (eg code, json, etc).

The caveat is that it needs to be the kind of problem where verification of correctness on the output is significantly easy and where a fuckup doesn't break shit.

So things like searching large documentation databases, or filling out security questionnaires (with a manual review) or generating visualization code have been great.

Complicated workflows or app changes in my k3s homelab cluster.... Well let's just say I'm glad I have a solid backup system.🙃

@neurovagrant I haven't been using AI myself so I can't help, but I feel this struggle myself and I'm watching this thread in hopes I can glean something from it too.