MacBook M5 Pro and Qwen3.5 = Local AI Security System

https://www.sharpai.org/benchmark/

HomeSec-Bench — Local AI vs Cloud Benchmark | SharpAI Aegis

Qwen3.5-9B scores 93.8% on 96 real security AI tests — within 4 points of GPT-5.4 — running entirely on Apple Silicon. Full benchmark results and methodology.

The M5 Pro just dropped, so here's a real AI workload instead of another Geekbench score. We ran Qwen3.5 as the brain of a fully local home security system and benchmarked it against OpenAI cloud models on a custom 96-test suite. Qwen3.5-9B scores 93.8% — within 4 points of GPT-5.4 — while running entirely on the M5 Pro at 25 tok/s with a 765ms TTFT, using only 13.8 GB of unified memory. The 35B MoE variant hits 42 tok/s with a 435ms TTFT — faster first-token than any OpenAI cloud endpoint we tested. Zero API costs, full data privacy, all local. Full results: https://www.sharpai.org/benchmark/
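For anyone curious how figures like TTFT and tok/s are typically derived from a streaming response, here's a minimal sketch. `stream_metrics` and the simulated timestamps are illustrative, not the benchmark's actual code:

```python
def stream_metrics(token_times: list[float], start_time: float) -> tuple[float, float]:
    """Compute time-to-first-token and decode throughput from the
    wall-clock arrival times of streamed output tokens."""
    ttft = token_times[0] - start_time
    decode_window = token_times[-1] - token_times[0]
    # Throughput is conventionally reported over the decode phase,
    # i.e. the tokens generated after the first one.
    tps = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return ttft, tps

# Simulated stream: first token at 435 ms, then a steady 42 tok/s.
times = [0.435 + i / 42 for i in range(100)]
ttft, tps = stream_metrics(times, start_time=0.0)
```

In practice you'd collect `token_times` from a streaming response off whatever local server (llama.cpp, Ollama, MLX, etc.) is hosting the model.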

Currently the barrier to entry for local models is about $2500. Funny thing is $2500 is about the amount my parents paid for a 166 MHz machine in 1995.
This is very false. My first system was a 3060, which you can buy new for about $300 or used for about $200. If you already have an existing system you can use it; otherwise you can pick up a used PC for about $150. Entry is about $500.

Perhaps OP was referring to a usable agentic system, for which $2500 sounds about right.

I've got a 3060 myself, which is nice for playing around with the smaller models for free (minus electricity) and with 100% uptime, but I haven't yet been able to program anything with them that I didn't want to rewrite completely. A heavily quantized Qwen3.5-27B model is getting close, though. Maybe in a few months.

I was actually thinking of the AMD Ryzen AI Max+ 395, which compiles the Linux kernel in 62 seconds and is the first usable integrated-graphics solution I've seen.

Benchmarks: https://old.reddit.com/r/LocalLLaMA/comments/1rpw17y/ryzen_a...

Entry level is actually a Mac mini 16GB at <$499. I have models running on an M2 Mac mini 16GB; it works with small models.
If "small models" is the bar, then you can run inference for ~$50 on Raspberry Pi-like hardware. I do that with 1.8B-4B models.
LFM 450M for vision tasks, Qwen 9B Q4 for orchestration; this combination gives good results.
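That two-model split can be sketched as a simple handoff: a tiny vision model turns a frame into text, and a mid-size model decides what to do. `security_pipeline` and the stub lambdas below are illustrative, not anyone's actual deployment code:

```python
from typing import Callable

def security_pipeline(
    frame: bytes,
    describe: Callable[[bytes], str],    # small vision model, e.g. a ~450M VLM
    orchestrate: Callable[[str], str],   # larger text model, e.g. a 9B Q4
) -> str:
    """Two-stage local pipeline: the vision model produces a scene
    description, which is handed to the orchestrator for a decision.
    Both model calls are injected, so any local runtime can back them."""
    description = describe(frame)
    prompt = (
        "You are a home security orchestrator. Given this scene description, "
        "reply with ALERT or IGNORE and a one-line reason.\n\n" + description
    )
    return orchestrate(prompt)

# Stub example; a real deployment would call a local inference server.
decision = security_pipeline(
    b"<jpeg bytes>",
    describe=lambda frame: "A person in a delivery uniform at the front door.",
    orchestrate=lambda prompt: "IGNORE: routine delivery during daytime.",
)
```

The injection-style design keeps the orchestration logic independent of whichever runtime (llama.cpp, MLX, Ollama) hosts each model.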
This seems like an inevitable idea: a security system with full context, so you don't get alerts about your friend's license plate or your kid coming home late.
Exactly. The memory of full context is very personal, so I'd like to keep it local.
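A rough sketch of what that context filtering might look like; the `known_plates`/`household` schema here is invented for illustration, not a real system's data model:

```python
def should_alert(detection: dict, context: dict) -> bool:
    """Suppress alerts for entities the household already knows.
    Detection and context keys are hypothetical examples."""
    if detection.get("plate") in context.get("known_plates", set()):
        return False  # e.g. a friend's car pulling into the driveway
    if detection.get("face_id") in context.get("household", set()):
        return False  # e.g. your kid coming home late
    return True

context = {"known_plates": {"7ABC123"}, "household": {"kid_1"}}
quiet = should_alert({"plate": "7ABC123"}, context)       # known car
noisy = should_alert({"face_id": "stranger_9"}, context)  # unknown person
```

Because this lookup table is exactly the kind of personal data the comment above is worried about, keeping it on-device is the whole point.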

Are we “there” yet? To the point where deploying this as a serious security system makes sense? Or are we still in the research and demo phase?

My intuition is that OpenClaw-like systems still make too many mistakes to be trusted with security. And that it will take more months or years until the models and harnesses are truly ready.

Can someone share how this stacks up against Frigate? What I'm struggling with is how it sits in the security stack. Is it recording things of interest on motion, or is it only a layer on top of an existing NVR?