Heretic quantized versions of Qwen 3.5 have just been released, but even the base Qwen 3.5 model seems to have issues with ollama currently, and I don't have the bandwidth to do a manual patch right now. Trying Mistral 3.2.
@tomgag how fast does it feel? I tried using Foundry Local and ollama, but at the time I felt slowed down. I'd be keen to swap back to a local model given how the large providers are slowly clamping down on subscription token limits.
@sealjay well, I'm running on a local CPU with 32 GiB of RAM, so I wouldn't call it "fast". Maybe 3-5 tokens per second? I guess it's OK if you give it a task and then go grab a coffee 😅