I fine-tuned a 24B-parameter cybersecurity LLM. It's free. And I need the security community's help.

After years of benefiting from open security tools, research, and community knowledge, I wanted to give something back.

What I built: nova:24b - a domain-adapted LLM trained specifically for cybersecurity tasks.

The training data (40K+ examples):
- 16,000 examples extracted from 407 security PDFs (threat modeling, cryptography, incident response, adversarial ML)
- ISO 27001:2022 controls (93 Annex A controls with implementation guidance)
- ISO 27005 threat catalog (48 threat categories)
- Energy sector threat database (3,386 scenarios from critical infrastructure analysis)
- Thousands of security Q&A pairs covering vulnerability analysis, secure coding, and compliance

What it's good at:
- Threat modeling and risk assessment
- Control mapping and gap analysis
- Security architecture discussions
- Incident response playbook generation

Where I need help:
The security community has datasets, red team scenarios, and domain knowledge that could make this model genuinely useful for defenders.

If you have:
- Security training data you're willing to share
- Ideas for white hat use cases
- Feedback on what security practitioners actually need from an AI assistant

Reach out. I'll share the model with anyone who wants to test it or contribute.

This isn't a product. It's an experiment in building security AI that serves the community that built my career.

Model: https://huggingface.co/pki/nova-24b-cybersec

#cybersecurity #AI #infosec #LLM #opensource

(And, oh, it's not censored. It WILL use the tools you provide it to hack the shit out of your target. "It's not a bug, it's a feature.")

@pki This is exactly what the community needs - a domain-specific model trained on actual security knowledge rather than a generic AI. Including the energy sector threat database is particularly smart, given how underrepresented critical infrastructure scenarios are in most datasets. Have you tested it against any of the existing security benchmarks like SecBench, or considered how it handles newer attack vectors that might not be in the training data?