In the trenches, I've been investing in hosting and running our own open-weight models while I wait for the inevitable rug pull from the big AI companies. Surprisingly, with the right combination of models and some specific used equipment, you can get good results.
@mike what models and what harnesses?
@drewdaniels I like building tools and automating some workflows. Claude Code was a good start, but I'm using Hermes more and more as my harness. Qwen coder, Qwen 35B, GLX-Flash, Kimi-k, and Gemma are some good models. Look for mixture-of-experts models and a quant that fits your VRAM.
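For the "quant that fits your VRAM" part, a rough back-of-the-envelope check can be sketched like this. It is only an estimate under stated assumptions: it counts the quantized weights alone, and the `headroom_gib` default for KV cache and runtime overhead is a made-up placeholder, not a measured number. The function names are hypothetical. For mixture-of-experts models, use the total parameter count, since all experts have to be resident (or offloaded) even though only a few are active per token.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float,
    headroom_gib: float = 2.0,  # assumption: placeholder for KV cache and overhead
) -> bool:
    """Return True if the weights plus the assumed headroom fit in the given VRAM."""
    needed = estimate_weight_gib(params_billion, bits_per_weight) + headroom_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Example: a 35B-parameter model at roughly 4 and 8 bits per weight on a 24 GiB card.
    for bits in (4, 8):
        size = estimate_weight_gib(35, bits)
        print(f"{bits}-bit: ~{size:.1f} GiB weights, fits in 24 GiB: {fits_in_vram(35, bits, 24)}")
```

Real quant formats carry a bit of extra metadata per block, and long contexts need more headroom than this, so treat the result as a first filter rather than a guarantee.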
@mike How do you keep Hermes safe? Do you limit tools? Sandbox? How well does tool use work with 32GB or smaller models?
@drewdaniels You have to be careful about how much agency you give your agents. Make any access you give them read-only, and if you do elevate permissions, make sure you set specific guardrails. I'm actually working on an agent proxy where you can dynamically grant or revoke specific parts of the API access you give them.
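To make the grant/revoke idea concrete, here is a minimal sketch of the policy check such a proxy might run before forwarding a request. This is not the proxy described above; `AgentPolicy` and its methods are hypothetical names, and the model (deny by default, allow by HTTP method plus path prefix) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Deny-by-default allowlist of (HTTP method, path prefix) pairs."""

    grants: set = field(default_factory=set)

    def grant(self, method: str, path_prefix: str) -> None:
        # Add a permission; it takes effect on the next check.
        self.grants.add((method.upper(), path_prefix))

    def revoke(self, method: str, path_prefix: str) -> None:
        # Remove a permission; revoking something never granted is a no-op.
        self.grants.discard((method.upper(), path_prefix))

    def allows(self, method: str, path: str) -> bool:
        # A request passes only if some grant matches both method and prefix.
        method = method.upper()
        return any(m == method and path.startswith(p) for m, p in self.grants)


if __name__ == "__main__":
    policy = AgentPolicy()
    policy.grant("GET", "/issues/")                 # read-only baseline
    print(policy.allows("GET", "/issues/42"))       # reads are allowed
    print(policy.allows("POST", "/issues/42"))      # writes are denied by default

    policy.grant("POST", "/issues/42/comments")     # narrow, temporary elevation
    print(policy.allows("POST", "/issues/42/comments"))
    policy.revoke("POST", "/issues/42/comments")    # take it back when done
    print(policy.allows("POST", "/issues/42/comments"))
```

Plain prefix matching is the simplest thing that works here; a real proxy would likely want stricter path matching and an audit log of grants and revocations.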
@drewdaniels I also keep them in their own VM. The local models are usually very good, but I supplement them with DeepSeek v4 because it's insanely cheap right now, although I fully expect a rug pull.