Hey everyone 👋

I’m diving deeper into running AI models locally—because, let’s be real, the cloud is just someone else’s computer, and I’d rather have full control over my setup. Renting server space is cheap and easy, but it doesn’t give me the hands-on freedom I’m craving.

So, I’m thinking about building my own AI server/workstation! I’ve been eyeing some used ThinkStations (like the P620) or even a server rack, depending on cost and value. But I’d love your advice!

My Goal:
Run larger LLMs locally on a budget-friendly but powerful setup. Since I don’t need gaming features (ray tracing, DLSS, etc.), I’m leaning toward used server GPUs that offer great performance for AI workloads.

Questions for the Community:
1. Does anyone have experience with these GPUs? Which one would you recommend for running larger LLMs locally?
2. Are there other budget-friendly server GPUs I might have missed that are great for AI workloads?
3. Any tips for building a cost-effective AI workstation? (Cooling, power supply, compatibility, etc.)
4. What’s your go-to setup for local AI inference? I’d love to hear about your experiences!

I’m all about balancing cost and performance, so any insights or recommendations are hugely appreciated.

Thanks in advance! 🙌

@[email protected] #AIServer #LocalAI #BudgetBuild #LLM #GPUAdvice #Homelab #AIHardware #DIYAI #ServerGPU #ThinkStation #UsedTech #AICommunity #OpenSourceAI #SelfHostedAI #TechAdvice #AIWorkstation #MachineLearning #AIResearch #FediverseAI #LinuxAI #AIBuild #DeepLearning #ServerBuild #BudgetAI #AIEdgeComputing #Questions #CommunityQuestions #HomeServer #AILab #LLMLab

@debby my advice may not be what you want to hear. In my own journey I found that running really big models takes the biggest, most expensive GPUs: most 20B+ models need a lot of VRAM, which either puts you in "I need 2+ beefy GPUs" territory or means you'll be upgrading again fairly soon. So, economically speaking, it makes little sense for 1-2 people's usage.
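
For a rough back-of-the-envelope on why, here's a tiny sketch (weights only, ignoring KV cache and runtime overhead, so real requirements are higher):

```python
# Rough VRAM needed just for the model weights.
# 1B params at 1 byte/param = 1 GB, so: GB = billions of params * bits / 8
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

print(weights_gb(20, 16))  # ~40 GB at FP16  -> already 2+ beefy GPUs
print(weights_gb(20, 4))   # ~10 GB at 4-bit -> fits a 24 GB card, with little headroom left
```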

What I settled on is segmenting my LLMs: I installed LiteLLM as the main proxy, and behind it I have two setups, a local Ollama server for 1-7B models and, for anything beyond that, cloud inference rented from Hyperbolic, which I find more economical. It's all stitched back together through Open WebUI, where the models centralized in LiteLLM are the ones available to select.

With this, I figure I avoid the 1-2 year GPU upgrade cycle, which in my case (2 people only) is hard to justify the expense of.
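
If it helps, here's roughly what that split looks like using LiteLLM's Python SDK rather than the proxy config; the model names, the Hyperbolic endpoint, and the env var are illustrative placeholders, not my exact setup:

```python
# Sketch: route small prompts to a local Ollama model, big ones to a hosted
# OpenAI-compatible endpoint, both through LiteLLM's completion() call.
import os
from litellm import completion

def ask(prompt: str, big: bool = False):
    if big:
        # Larger model served remotely (OpenAI-compatible API; endpoint assumed)
        return completion(
            model="openai/meta-llama/Meta-Llama-3.1-70B-Instruct",  # placeholder
            api_base="https://api.hyperbolic.xyz/v1",               # assumed endpoint
            api_key=os.environ["HYPERBOLIC_API_KEY"],               # placeholder env var
            messages=[{"role": "user", "content": prompt}],
        )
    # Small 1-7B model served locally by Ollama
    return completion(
        model="ollama/llama3.2",            # any small local model
        api_base="http://localhost:11434",  # default Ollama port
        messages=[{"role": "user", "content": prompt}],
    )

print(ask("Summarize this paragraph: ...").choices[0].message.content)
```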

@Prozak You're absolutely right—running 20B+ models locally can be quite costly. From a purely economic standpoint, your setup with LiteLLM + Ollama + cloud for heavy lifting makes the most sense for most people.

However, I still find myself drawn to the idea of experimenting with a local setup, even if it's not the most cost-effective choice. There's a certain appeal to tinkering with hardware and having full control over the system. It's not just about efficiency; it's about learning, autonomy, and the satisfaction of building something with your own hands. It's akin to building a custom PC just for the enjoyment of the process—sometimes, the journey itself is the reward!

Have you ever felt the urge to go fully local, even if just for the experience? Or are you firmly in the "hybrid is the best approach" camp?

#PassionProject #AILab #DIYTech #LocalAI #TechEnthusiast

@debby I have a full local pipeline for that urge. I understand what you mean and agree. However, I did NOT buy a GPU; I'm using my Mac Studio M1 Max (32 GB) for the tinkering. Beyond 32B everything is too slow to be a "useful assistant" for my needs, but to your point, my full end-to-end solution includes an offline-only pipeline (even down to playing with scikit-learn, RAG, etc.), all local.

@Prozak @debby just want to mention, I run Ollama with 27B and 30B models on my MacBook Pro M1 Pro with 32 GB of RAM. It's a 4-year-old machine and it's doing a really good job.

I’m satisfied with what it can do, and won’t be searching for anything else.

I really like that everything is local (and I know how much power it takes).