Going down the rabbit hole of testing local LLMs right now. I don't have a dedicated GPU, but 32 GiB of RAM should be enough for anyone.

#ai #huggingface #selfhost #localai #ollama #heretic #qwen #mistral

Heretic quantized versions of Qwen 3.5 have just been released, but even the base Qwen 3.5 model seems to have issues with ollama currently, and I don't have the bandwidth to do a manual patch right now. Trying Mistral Small 3.2 instead.
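
Napkin math on why 32 GiB is workable (a rough sketch; the parameter counts and ~4.5 bits per weight for a Q4-style quantization are my assumptions, not measured numbers):

```python
# Back-of-envelope RAM estimate for running quantized models on CPU.
# Assumptions: Q4_K_M-style quantization at ~4.5 bits/param, plus ~6 GiB
# of headroom for KV cache, runtime, and the OS.

def weights_gib(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

for name, b in [("Mistral Small (~24B)", 24), ("Qwen 32B", 32), ("70B class", 70)]:
    w = weights_gib(b)
    verdict = "fits" if w + 6 <= 32 else "too big"
    print(f"{name}: ~{w:.1f} GiB weights -> {verdict} in 32 GiB RAM")
```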

First impressions of Mistral Small 3.2: seems pretty solid, and it answers "uncomfortable" political questions quite neutrally.
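
If anyone wants to reproduce, here's a minimal sketch via the official `ollama` Python client (pip install ollama). The model tag is an assumption on my part; check `ollama list` for the exact name on your install:

```python
import ollama

MODEL = "mistral-small3.2"  # assumed tag; use whatever `ollama list` shows

ollama.pull(MODEL)  # downloads the model if it isn't there yet
resp = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize both sides of <a contested political topic>."}],
)
print(resp["message"]["content"])
```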

I don't understand why #confer and #euria by #infomaniak are not based on this.

@tomgag how fast does it feel? I tried using Foundry Local and ollama, but at the time I felt slowed down. I'd be keen to swap back to a local model given how the large providers are slowly clamping down on subscription token limits.
@sealjay well, I'm running on local CPU with 32 GiB of RAM, so I wouldn't call it "fast". 3-5 tokens per second maybe? I guess it's OK if you give it a task and then go to grab a coffee 😅
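@sealjay if you want a real number instead of my gut feel: Ollama reports its own timing stats with every response, so tokens/sec falls out directly (a sketch; model tag assumed as above):

```python
import ollama

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
resp = ollama.generate(model="mistral-small3.2", prompt="Explain RAID levels briefly.")
tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{resp['eval_count']} tokens in {resp['eval_duration'] / 1e9:.1f}s -> {tps:.1f} tok/s")
```
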
@tomgag maybe I’ll check I’m running on renewable energy before I leave a machine running over the weekend then 🤣
@tomgag Good question! Why is #infomaniak not part of the fediverse?!