Don't overlook llama.cpp's rpc-server feature.

https://sh.itjust.works/post/39137051
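In case it helps anyone wondering what rpc-server actually buys you: it lets you pool GPUs across multiple machines over the network and split a model's layers across them, which as far as I know Ollama doesn't expose. Here's a rough sketch of the workflow (flag names are from llama.cpp's RPC docs at the time of writing; the IPs, port, and model path are just example values, so double-check against your build):

```sh
# Build llama.cpp with RPC support enabled
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release

# On each remote machine with a GPU, start an RPC worker
# (host/port are example values)
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the main machine, point llama-cli (or llama-server) at the workers;
# layers get offloaded across all of them as if they were local backends
./build/bin/llama-cli -m model.gguf -ngl 99 \
  --rpc 192.168.1.10:50052,192.168.1.11:50052 \
  -p "Hello"
```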

What's the advantage over Ollama? I've tried a few different combinations of Nvidia cards, and each time they all seemed to get used to their full potential.
To add to my lame noob answer, I found this, which has a better rundown of Ollama vs llama.cpp. I don't know if it's considered bad form to link to ##ddit on Lemmy, so I'll just put the title here and you can search for it there if you want. There are a couple of informative, upvoted posts in the thread: "There is a big difference between use LM-Studio, Ollama, LLama.cpp?"
Feel free to link anything you think is relevant.