wrote a blog post trying to understand how local LLMs work so I can (in part 2) run a couple and squeeze as much performance out of them as possible on meager hardware
it's 1800 words, no pictures, very boring. i don't blame you if you don't read it; i wrote it mostly for my own amusement
