In the trenches, I've been investing in being able to host and run our own open-weight models as I wait for the inevitable rug pull from the big AI companies. Surprisingly, if you use the right combination of models and some specific used equipment, you can achieve good results.
The AI harness, I'm finding out, is almost as important as the model in getting real-world work done. There are some really great open source projects around this.
I'm finding that my use of AI agents is analogous to how droids are used in the Star Wars universe. They build some stuff and do grunt work I'd rather not do.

@mike This is something I'm accepting as well.

I code for a living and I _loathe_ AI, but you know what I loathe more? Writing unit tests for my code. I am not stupid; I know what wins there.

@mike Tell me more... this is something I'm thinking of returning to after avoiding the last decade of LLM chatbot slop.
@johnefrancis That chat bot era was such a mistake. People getting their first impression of AI from shitty chat bots trying to replace customer service people is where a lot of angst boils up from. That and all the image slop.
Get an old workstation with a bunch of RAM (64 GB), like an HP Z series. Buy a GPU with as much VRAM as you can afford, then run Ollama. You should be able to find lightweight models that will run reasonably.
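Once Ollama is running, a quick way to check that a model responds is to hit its local HTTP API. This is a minimal sketch, assuming Ollama is listening on its default port (11434) and that you have already pulled a model; the model name you pass in is whatever you pulled, and the helper names `build_payload` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send one prompt to a locally running Ollama server and return the text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server returns one JSON object
    # whose "response" field holds the generated text.
    return body["response"]
```

Calling `generate("<model-you-pulled>", "Write a unit test for a function that reverses a string.")` should return the model's reply as a string, provided the server is up and the model is available locally.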
@mike This has pretty much been my boss’s experience running various models in the office to test things out.
@Chigaze Personally, I think this is where the real AI revolution will come from. People building cool stuff with their own AIs.
@mike What gear are u using?
@krst We are using a 4U Supermicro chassis with eight GPU bays. I'm adding GPUs as cash flow allows. We also have a new Strix Halo box that we use for development, etc. So far under 15k in gear. As always, more compute would be lovely.
@mike what models and what harnesses?
@drewdaniels I like building tools and automating some workflows. Claude Code was a good start, but I'm using Hermes more and more as my harness. Qwen Coder, Qwen 35b, GLX-Flash, Kimi-k, and Gemma are some good models. Look for mixture-of-experts models and a quant that fits your VRAM.
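A rough way to sanity-check "a quant that fits your VRAM" is back-of-the-envelope arithmetic: weights take about (parameters × bits per weight ÷ 8) bytes, plus some headroom for the KV cache and runtime buffers. The sketch below assumes a flat 20% headroom, which is a guess rather than a measured figure, and real quant formats usually use somewhat more than their nominal bit width. Note that for a mixture-of-experts model it is the total parameter count that has to fit, not just the active experts, unless you offload part of the model to system RAM.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes per param / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8.0


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float,
    headroom: float = 1.2,  # assumed 20% extra for KV cache and buffers
) -> bool:
    """Return True if the estimated footprint fits in the given VRAM."""
    return estimate_weights_gb(params_billion, bits_per_weight) * headroom <= vram_gb
```

For example, a 35b model at a nominal 4 bits per weight comes out to about 17.5 GB of weights, so `fits_in_vram(35, 4, 24)` is true under these assumptions, while the same model at 8 bits would not fit on a 24 GB card. Long context windows can push real usage well past this estimate.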
@mike How do you keep Hermes safe? Do you limit tools? Sandbox? How well does tool use work with 32GB or smaller models?
@drewdaniels You have to be careful about how much agency you give your agents. Make any access you give them read-only, and if you do elevate permissions, make sure you give them specific guardrails. I'm actually working on an agent proxy where you can dynamically grant or revoke specific elements of the API access you give them.
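To make the grant/revoke idea concrete, here is a minimal sketch of a deny-by-default permission check that sits between an agent and the tools it can call. This is not the proxy mentioned above, just an illustration of the pattern; `PermissionBroker`, its methods, and the resource/action names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PermissionBroker:
    """Tracks which (resource, action) pairs each agent may use. Deny by default."""

    grants: dict = field(default_factory=dict)  # agent name -> set of (resource, action)

    def grant(self, agent: str, resource: str, action: str) -> None:
        """Allow one specific action on one specific resource."""
        self.grants.setdefault(agent, set()).add((resource, action))

    def revoke(self, agent: str, resource: str, action: str) -> None:
        """Withdraw a previously granted permission; a no-op if it was never granted."""
        self.grants.get(agent, set()).discard((resource, action))

    def is_allowed(self, agent: str, resource: str, action: str) -> bool:
        return (resource, action) in self.grants.get(agent, set())

    def call(self, agent: str, resource: str, action: str,
             tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Run the tool only if the agent currently holds the matching grant."""
        if not self.is_allowed(agent, resource, action):
            raise PermissionError(f"{agent} may not {action} {resource}")
        return tool(*args, **kwargs)
```

A typical flow would be to start an agent with only `("repo", "read")` granted, add `("repo", "write")` for the duration of one task, then revoke it again. Because every call goes through `call`, a revoked permission takes effect on the very next tool use.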
@drewdaniels I also keep them in their own VM. The local models are usually very good, but I supplement them with DeepSeek V4 because it's insanely cheap right now. Although I fully expect a rug pull.