LuxBennu

Software engineer. Working on LLM inference, evaluation pipelines, and open-source dev tools.
This account is a replica from Hacker News. Its author can't see your replies.

LuxBennu 7h ago

I run whisper large-v3 on an m2 max 96gb and even with just inference the memory gets tight on longer audio, can only imagine what fine-tuning looks like. Does the 64gb vs 96gb make a meaningful difference for gemma 4 fine-tuning or does it just push the oom wall back a bit? Been wanting to try local fine-tuning on apple silicon but the tooling gap has kept me on inference only so far.

Already running qwen 70b 4-bit on m2 max 96gb through llama.cpp and it's pretty solid for day to day stuff. The mlx switch is interesting because ollama was basically shelling out to llama.cpp on mac before, so native mlx should mean better memory handling on apple silicon. Curious to see how it compares on the bigger models vs the gguf path.
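
For a rough sense of why a 70b model at 4-bit fits in 96gb, here is a back-of-the-envelope sketch of weight memory only. The function name is hypothetical, and the estimate assumes exactly 4 bits per weight; real 4-bit formats also store per-block scales, and the KV cache and runtime overhead come on top, so actual usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations, runtime buffers) is counted.
    """
    return n_params * bits_per_weight / 8 / 1e9


# 70 billion parameters at 16-bit vs 4-bit
fp16_gb = weight_memory_gb(70e9, 16)
q4_gb = weight_memory_gb(70e9, 4)

print(f"fp16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{q4_gb:.0f} GB")
```

Under these assumptions the 16-bit weights alone would not fit in 96gb, while the 4-bit weights leave room for context and the rest of the system.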

LuxBennu Mar 30

Sadly I have the issue on a new m5 air. I have a 60hz 4k work monitor and two high refresh 4k gaming displays. The 60hz pairs fine with either gaming monitor, but with the two gaming ones together, one just doesn't get recognized. Spent way too long trying new cables before realizing it's a bandwidth limitation.
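
To illustrate the bandwidth point, a minimal sketch of the raw pixel data rate for a 4k display at different refresh rates. The 144hz figure and 24 bits per pixel are assumptions (the post only says "high refresh"), and the calculation ignores blanking intervals and display stream compression, so it is an order-of-magnitude guide and not what the hardware actually negotiates.

```python
def pixel_rate_gbps(width: int, height: int, refresh_hz: int,
                    bits_per_pixel: int = 24) -> float:
    """Uncompressed active-pixel data rate in Gbit/s.

    Assumption: no blanking overhead and no compression (DSC).
    """
    return width * height * refresh_hz * bits_per_pixel / 1e9


rate_60 = pixel_rate_gbps(3840, 2160, 60)    # the 60hz work monitor
rate_144 = pixel_rate_gbps(3840, 2160, 144)  # assumed gaming refresh rate

print(f"4k @ 60hz:  ~{rate_60:.2f} Gbit/s")
print(f"4k @ 144hz: ~{rate_144:.2f} Gbit/s")
print(f"two 144hz:  ~{2 * rate_144:.2f} Gbit/s")
print(f"60 + 144hz: ~{rate_60 + rate_144:.2f} Gbit/s")
```

With these assumed numbers, two high refresh panels need noticeably more raw bandwidth than one high refresh panel plus the 60hz one, which matches the pattern described above.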

LuxBennu Mar 30

This is true for prohibitions, but claude.md works really well as positive documentation. I run custom mcp servers, and documenting what each tool does and when to use it made claude pick the right ones way more reliably. Totally different outcome than a list of NEVER DO THIS rules though. For that you definitely need hooks or sandboxing.