I'm making progress on my local #LLM experiments. Now we moved from single node to 2 node Kubernetes, here a blog post about my initial setup with a bunch of new Bench-marking results: https://blog.t1m.me/blog/building-own-private-kuberntes-ai-cluster
Currently using a simple #k3s server / agent set-up, with DNS-1 certificate issuing and everything in a private #tailscale network.
Already taking the next steps towards migrating from #ollama to #vLLM and optimizing prompt / model caching + routing. Several more changes coming up :)

👾