Hey everyone 👋

I’m diving deeper into running AI models locally—because, let’s be real, the cloud is just someone else’s computer, and I’d rather have full control over my setup. Renting server space is cheap and easy, but it doesn’t give me the hands-on freedom I’m craving.

So, I’m thinking about building my own AI server/workstation! I’ve been eyeing some used ThinkStations (like the P620) or even a server rack, depending on cost and value. But I’d love your advice!

My Goal:
Run larger LLMs locally on a budget-friendly but powerful setup. Since I don’t need gaming features (ray tracing, DLSS, etc.), I’m leaning toward used server GPUs that offer great performance for AI workloads.

Questions for the Community:
1. Does anyone have experience with these GPUs? Which one would you recommend for running larger LLMs locally?
2. Are there other budget-friendly server GPUs I might have missed that are great for AI workloads?
3. Any tips for building a cost-effective AI workstation? (Cooling, power supply, compatibility, etc.)
4. What’s your go-to setup for local AI inference? I’d love to hear about your experiences!

I’m all about balancing cost and performance, so any insights or recommendations are hugely appreciated.

Thanks in advance! 🙌

@[email protected] #AIServer #LocalAI #BudgetBuild #LLM #GPUAdvice #Homelab #AIHardware #DIYAI #ServerGPU #ThinkStation #UsedTech #AICommunity #OpenSourceAI #SelfHostedAI #TechAdvice #AIWorkstation #LocalAI #LLM #MachineLearning #AIResearch #FediverseAI #LinuxAI #AIBuild #DeepLearning #OpenSourceAI #ServerBuild #ThinkStation #BudgetAI #AIEdgeComputing #Questions #CommunityQuestions #HomeLab #HomeServer #Ailab #llmlab

Hi everyone! 👋
Questions for the community:

1. Does anyone have experience with these GPUs? Which one would you recommend for running larger LLMs locally?
2. Are there other budget-friendly server GPUs I might have missed that are great for AI workloads?
3. Any tips for building a cost-effective AI workstation? (Cooling, power supply, compatibility, etc.)
4. What’s your favorite setup for local AI inference? I’d love to hear about your experiences!

Thanks in advance! 🙌
#AIServer #LokaleAI #BudgetBuild #LLM #GPUAdvies #ThuisLab #AIHardware #DIYAI #ServerGPU #TweedehandsTech #AIGemeenschap #OpenSourceAI #ZelfGehosteAI #TechAdvies #AIWorkstation #MachineLeren #AIOnderzoek #FediverseAI #LinuxAI #AIBouw #DeepLearning #ServerBouw #BudgetAI #AIEdgeComputing #Vragen #CommunityVragen

@debby hi
Go visit /r/localllama on Reddit, they have plenty of advice and opinions.

Hi everyone! 👋
Questions for the community:

1. Does anyone have experience with these GPUs? Which one would you recommend for running larger LLMs locally?
2. Are there other budget-friendly server GPUs I might have missed that are great for AI workloads?
3. Any tips for building a cost-effective AI workstation? (Cooling, power supply, compatibility, etc.)
4. What’s your favorite setup for local AI inference? I’d love to hear about your experiences!

Thanks in advance! 🙌

#ServeurIA #IALocale #MontageBudget #LLM #ConseilsGPU #LaboMaison #MatérielIA #IAFaitesVousMême #GPUServeur #TechOccasion #CommunautéIA #IAOpenSource #IAAutoHébergée #ConseilsTech #StationIA #ApprentissageAutomatique #RechercheIA #FediverseIA #IALinux #MontageIA #ApprentissageProfond #MontageServeur #IABudget #CalculEnPériphérieIA #Questions #QuestionsCommunauté

@debby Hello everyone 👋

Questions for the community:

1. Does anyone have experience with these GPUs? Which one would you recommend for running larger LLMs locally?
2. Are there other budget-friendly server GPUs that may have escaped my attention but are good for AI workloads?
3. Any tips for building an inexpensive AI workstation? (Cooling, power supply, compatibility, etc.)
4. What’s your preferred setup for local AI inference?

Thanks in advance! 🙌

#HejmaServilo #HejmaLabo #UzitaTek #AIKomunumo #Demandoj

@debby my advice, maybe you won’t love it. In my own journey I found that to run really big models you need the biggest, most expensive GPUs: most 20B+ models need a lot of video RAM, which puts you in “I need 2+ beefy GPUs” territory, and fairly soon after that you’ll need to upgrade again. So, economically speaking, it makes no sense for 1-2 person usage.

What I settled on is segmenting my LLMs. I installed LiteLLM as the main proxy, and behind it I have two setups: a local Ollama server with 1-7B models, and for anything beyond that I rent cloud inference from Hyperbolic, which I find more economical. It’s all stitched back together through Open WebUI, where the models centralized in LiteLLM are the ones available for selection.

With this, I figure I avoid the 1-2 year GPU upgrade cycle, whose cost is hard to justify in my case (only 2 users).
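For anyone wondering what that split looks like in practice: because LiteLLM exposes an OpenAI-compatible endpoint, one client can hop between the local Ollama models and the rented cloud models just by changing the model name. A minimal sketch, assuming the proxy listens on its default port 4000 and that aliases like "local-llama-7b" and "cloud-large" (made-up names here) are defined in the LiteLLM config:

```python
# Minimal sketch: talk to a LiteLLM proxy through its OpenAI-compatible API.
# "local-llama-7b" and "cloud-large" are hypothetical aliases that the LiteLLM
# config would map to an Ollama model and a cloud provider model respectively.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="dummy")  # key only matters if proxy auth is enabled

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("local-llama-7b", "Rewrite this sentence more formally: ..."))  # stays on the local Ollama box
print(ask("cloud-large", "Draft a detailed migration plan for ..."))      # routed out to cloud inference
```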

@Prozak You're absolutely right—running 20B+ models locally can be quite costly. From a purely economic standpoint, your setup with LiteLLM + Ollama + cloud for heavy lifting makes the most sense for most people.

However, I still find myself drawn to the idea of experimenting with a local setup, even if it's not the most cost-effective choice. There's a certain appeal to tinkering with hardware and having full control over the system. It's not just about efficiency; it's about learning, autonomy, and the satisfaction of building something with your own hands. It's akin to building a custom PC just for the enjoyment of the process—sometimes, the journey itself is the reward!

Have you ever felt the urge to go fully local, even if just for the experience? Or are you firmly in the "hybrid is the best approach" camp?

#PassionProject #AILab #DIYTech #LocalAI #TechEnthusiast

@debby I have a fully local pipeline for that urge. I understand what you mean and agree. However, I did NOT buy a GPU; I’m using my Mac Studio M1 Max 32 GB for the tinkering. Beyond 32B everything is too slow to serve as a “useful assistant” for my needs, but to your point, my full end-to-end solution includes my offline-only pipeline (even down to playing with scikit-learn, RAG, etc.), all local.

@Prozak @debby just want to mention, I run Ollama with 27B and 30B models on my MacBook Pro M1 Pro with 32 GB of RAM. It’s a 4-year-old machine and it’s doing a really good job.

I’m satisfied with what it can do, and won’t be searching for anything else.

I really like that everything is local (and I know how much power it takes).
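If anyone wants to reproduce that kind of Mac-local setup programmatically, here is a minimal sketch using the ollama Python package (it assumes the Ollama server is running and that a 27B model such as gemma2:27b has already been pulled; the tag is only an example):

```python
# Minimal sketch: chat with a locally pulled Ollama model, no cloud involved.
# "gemma2:27b" is just an example tag; use whatever `ollama list` shows.
import ollama

response = ollama.chat(
    model="gemma2:27b",
    messages=[{"role": "user", "content": "Summarize the trade-offs of running LLMs locally."}],
)
print(response["message"]["content"])
```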

@debby I don't have any recommendations, but I'm also interested in your findings. What is your expected budget?
@a Great question!
My budget is still a bit flexible, but I’m aiming for a realistic range of $550 to $2,500+, depending on how ambitious I get. Since I already have RAM and an M.2 SSD on hand, I can focus the budget on the core components: a solid workstation base and a capable GPU.
@debby that is quite a range! 😆 I'd love to know what you end up buying. For me, the only real use of LLMs (other than the infrequent grammar checking) is programming.

@a
Using LLMs mostly for programming? Sounds very reasonable! 💻✨ (I might not be reasonable? 😅)

For me, LLMs are like a Swiss Army knife—I use them for programming and debugging, sure, but also for voice typing and correcting my spelling and grammar (they save me daily!). Tools like Whisper AI are amazing for real-time transcription, but I’m still chasing the dream of local real-time translation—it feels so close but just out of reach with my current setup.
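In case it helps anyone with the transcription part, here’s a minimal sketch with the open-source openai-whisper package (the audio path is a placeholder, and this is batch transcription rather than true real-time):

```python
# Minimal sketch: local speech-to-text with openai-whisper (pip install openai-whisper).
# Larger checkpoints ("medium", "large") are more accurate but need more VRAM.
import whisper

model = whisper.load_model("base")        # checkpoint downloads on first run
result = model.transcribe("audio.wav")    # source language is auto-detected
print(result["text"])

# Whisper can also translate speech into English text (still not real-time, English output only):
print(model.transcribe("audio.wav", task="translate")["text"])
```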

What I’d really love is to run bigger models locally—things like Intellect-2, Mistral-Large, or Llama 3.3—but most of these require 30GB+ of VRAM, which is a tough hurdle. I’d love to integrate my entire digital library and personal data into a local LLM—a truly private, personal AI assistant that understands my context without sending everything to the cloud.
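To put rough numbers on that VRAM hurdle, here’s a back-of-envelope sketch (weights only plus a flat overhead guess; KV cache and long contexts add more on top, so treat it as an estimate, not a sizing tool):

```python
# Rough rule of thumb: VRAM ≈ parameter count × bytes per weight + overhead.
def estimate_vram_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    weights_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1024**3
    return weights_gb + overhead_gb

for name, params, bits in [("7B @ 4-bit", 7, 4), ("70B @ 4-bit", 70, 4), ("70B @ 16-bit", 70, 16)]:
    print(f"{name}: ~{estimate_vram_gb(params, bits):.0f} GB")
# 7B @ 4-bit:   ~5 GB   -> comfortable on a 12-16 GB card
# 70B @ 4-bit:  ~35 GB  -> already beyond a single 24 GB consumer GPU
# 70B @ 16-bit: ~132 GB -> multi-GPU or heavy CPU/RAM offloading territory
```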

I’m still in the discovery phase—figuring out what’s possible, but finding a sensible configuration is the real challenge.

Once I find my perfect setup, I’ll definitely share the build! 🛠️✨

@debby @[email protected] We can discuss this for a while.

We’re using RTX 4060 Ti 16GB with memory offloading to RAM, all running inside a VM with PCIe passthrough.

For everything related to LLMs, it’s not really an issue once the adjustments are made.

Some models can run without a GPU as long as there’s enough RAM, which is a good starting point for testing.
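For anyone who wants to try that kind of RAM/VRAM split (or pure CPU inference) themselves, here is a minimal sketch with llama-cpp-python; the GGUF path is a placeholder, and n_gpu_layers is the knob that decides how much of the model lands on the GPU:

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# n_gpu_layers=0 keeps everything in system RAM (CPU only); -1 offloads all layers.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-14b-instruct-q4_k_m.gguf",  # placeholder: any local GGUF file
    n_gpu_layers=24,   # tune so VRAM usage stays under the card's 16 GB
    n_ctx=4096,        # context window; bigger values cost more memory
)
out = llm("Q: Why run LLM inference locally? A:", max_tokens=128)
print(out["choices"][0]["text"])
```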

What kind of server model are you currently using?
@debby

I went the opposite direction. I got a cheap mini PC with as powerful an integrated graphics chip as I could find (pictured). I upgraded it to 64GB of RAM (the maximum it supports). Since the GPU shares system memory, I can run models that require 32GB or more of VRAM. The only disadvantage to this setup? It's *very* slow!
@debby
I just came across your toot during my own search for the right GPU(s) to buy: https://chaos.social/@musevg/115288876536469616
I'm aiming lower than you, budget-wise, and also want to avoid older architectures with lower compute capability, in the hope of being able to use the GPU(s) for longer if future versions of Ollama require higher capabilities. That's why older models like the V100/M40 aren't on my list.
What are your thoughts, and do you have any new insights?
M Schommer (@[email protected])

So. Now. My quest for #Ollama #localAI has left me somewhat wiser than before, but in return I now have 3 (instead of the previous 3) options on the list. Please advise me… what would you buy?
1. Two used #RTX 2080 Ti cards with 11GB from eBay for 200-250€ each
2. Two new 3060s with 12GB for ~230€ each
3. A new 5060 Ti with 16GB for ~420€
4. Something else, namely…?


@musevg First off, thanks for commenting on my GPU post – glad it caught your eye. I totally get the “budget-but-future-proof” vibe you’re after, especially with Ollama’s fast-moving roadmap.

Quick Takeaways

- CUDA-centric GPUs (Nvidia) still provide the smoothest LLM experience.
- Huawei’s Atlas line offers a lot of VRAM for the price and is worth considering if you’re okay with a bit of tinkering.

All three sit roughly around the $2,000 total mark when you add a modest CPU and chassis, which is pretty competitive compared with Nvidia’s high-end cards.
Pre‑Built Alternative: Framework Max+ 395

If DIY feels like too much hassle, the Framework Max+ 395 (128 GB RAM, pre‑order $2 500) is a solid plug‑and‑play workstation. It’s not a GPU monster, but the massive RAM lets you offload some model parts to system memory, which can be handy for larger LLMs. Just keep in mind the shipping delay until December and the higher price tag for the convenience it offers.
My Current Game Plan (and Why)

1. Hold off until January – gives me time to see real-world benchmarks from folks who’ve already run Atlas cards in their rigs.
2. Compare DIY vs. pre-built – I’ll weigh the extra VRAM of an Atlas 300I Duo against the hassle-free setup of a Framework box.
3. Gather more community feedback – the Chinese-speaking tech forums have been surprisingly helpful, and I’m still learning a bit of Mandarin to decode the driver docs.

Bottom Line

- If you need a lot of VRAM now and don’t mind a little driver-tinkering, the Atlas 300I Duo is a compelling budget-friendly choice.
- If you prefer a hassle-free experience and can wait for the pre-order, the Framework Max+ 395 gives you a ready-made workstation with plenty of RAM.
- Nvidia still leads on raw performance, so if absolute speed is your top priority and the budget allows, a mid-range RTX 3060/3070 combo remains a safe bet.

Hope this helps you cut through the noise! Let me know which direction you end up leaning toward; I’m curious to hear how it works out for you.