AI integration may seem very recent, but it has been woven into the fabric of cybersecurity for many years. However, there are still improvements to be made. In our industry, models are often deployed at massive scale, processing billions of events a day.
Large language models (LLMs) – the models that usually grab the headlines – perform well and are popular, but they are ill-suited to this application: they require extensive GPU infrastructure and significant amounts of memory, even after optimization.
Because those computational demands make LLMs impractical for many cybersecurity applications – especially those requiring real-time or large-scale processing – small, efficient models have a critical role to play.
Many tasks in cybersecurity do not require generative solutions and can instead be solved through classification with small models – which are cost-effective and capable of running on endpoint devices or within cloud infrastructure.
A key question for small models is their performance, which is bounded by the quality and scale of their training data. As a cybersecurity vendor, we have an abundance of data, but there is always the question of how best to use it.
This is where LLMs have a part to play. The idea is simple yet transformative: use big models intermittently and strategically to train small models effectively. LLMs excel at extracting useful signals from data at scale, refining existing labels, and providing new ones.
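To make the pattern concrete, here is a minimal, self-contained sketch of the idea: an expensive model labels raw telemetry once, offline, and a tiny classifier is then trained on those labels and shipped. Everything here is illustrative – `llm_label` is a stand-in heuristic, not a real LLM API, and the command lines and bag-of-words perceptron are toy examples, not Sophos's actual pipeline.

```python
def llm_label(command_line: str) -> int:
    """Stand-in for an expensive LLM call that labels a command line as
    suspicious (1) or benign (0). Here it is just a keyword heuristic."""
    suspicious = ("powershell -enc", "mimikatz", "whoami /priv")
    return int(any(s in command_line.lower() for s in suspicious))

# 1. Unlabelled telemetry (toy examples).
unlabelled = [
    "powershell -enc SQBFAFgA...",
    "notepad.exe report.txt",
    "whoami /priv",
    "ping 8.8.8.8",
]

# 2. Invoke the big model intermittently, offline, to produce training labels.
labelled = [(cmd, llm_label(cmd)) for cmd in unlabelled]

# 3. Train a tiny bag-of-words perceptron on those labels -- the cheap model
#    that actually ships, runnable on an endpoint with no GPU.
weights: dict[str, float] = {}
bias = 0.0
for _ in range(10):  # a few epochs over the labelled set
    for cmd, y in labelled:
        tokens = cmd.lower().split()
        score = bias + sum(weights.get(t, 0.0) for t in tokens)
        pred = int(score > 0)
        if pred != y:  # standard perceptron update on mistakes
            for t in tokens:
                weights[t] = weights.get(t, 0.0) + (y - pred)
            bias += y - pred

def classify(cmd: str) -> int:
    """The small model's inference path: a handful of float additions."""
    tokens = cmd.lower().split()
    return int(bias + sum(weights.get(t, 0.0) for t in tokens) > 0)
```

Once trained, `classify` never touches the LLM again – the expensive model's knowledge has been baked into a few per-token weights.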
Merging the advanced learning capabilities of large, expensive models with the high efficiency of small models can create fast, commercially viable, and effective solutions.
In a new blog out today, Sophos looks at three methods key to this approach: knowledge distillation, semi-supervised learning, and synthetic data generation. We share the results of experiments, including command-line classification, website productivity classification, and the detection of fake login pages.
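Of the three methods, knowledge distillation is the easiest to sketch: instead of training the small model on hard labels alone, it learns from the large model's full output distribution, softened by a temperature. The class names, logits, and temperature below are made-up illustrative values, not figures from our experiments.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, predictions):
    """Loss the student minimizes against the teacher's soft targets."""
    return -sum(t * math.log(p) for t, p in zip(targets, predictions))

# Teacher (large model) logits for one sample over three toy classes:
# (benign, suspicious, malicious).
teacher_logits = [1.0, 3.5, 0.5]

# A hard label keeps only the argmax...
hard_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])

# ...while soft targets preserve how confident the teacher was, and how the
# remaining probability is spread across the other classes.
soft_targets = softmax(teacher_logits, temperature=4.0)

# The student (small model) is trained to match that softened distribution.
student_logits = [0.8, 2.0, 0.3]
loss = cross_entropy(soft_targets, softmax(student_logits, temperature=4.0))
```

The soft targets carry strictly more information than the hard label – the teacher's relative uncertainty between classes – which is precisely what lets a small student recover much of a large teacher's behaviour.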