Show HN: sllm – Split a GPU node with other developers, unlimited tokens

Running DeepSeek V3 (685B) takes an 8×H100 node, which runs about $14k/month, but most individual developers only need 15-25 tok/s. sllm lets you join a cohort of developers sharing a dedicated node. You reserve a spot with your card, and nobody is charged until the cohort fills. Prices start at $5/mo for smaller models; a rough cost split is sketched below.
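
To make the economics concrete, here is a back-of-envelope sketch in Python. The cohort size and resulting per-seat price are illustrative guesses, not sllm's actual numbers; only the ~$14k/month node figure comes from the post.

    # Back-of-envelope cost split for one shared 8xH100 node.
    # node_cost is the figure from the post; cohort_size is a made-up assumption.
    node_cost = 14_000             # USD/month for an 8xH100 node
    cohort_size = 200              # hypothetical number of subscribers
    per_seat = node_cost / cohort_size
    print(f"${per_seat:.2f}/month per developer")   # -> $70.00/month per developer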

Usage is completely private (we don't log any traffic).

The API is OpenAI-compatible (we run vLLM), so you just swap the base URL. Currently offering a few models.
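
Swapping the base URL with the official OpenAI Python client looks roughly like this; the endpoint URL, API key placeholder, and model id below are assumptions for illustration, not confirmed values from sllm.

    # Minimal sketch: point the OpenAI client at an OpenAI-compatible endpoint.
    # base_url and model are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.sllm.cloud/v1",   # hypothetical endpoint
        api_key="YOUR_SLLM_KEY",
    )

    resp = client.chat.completions.create(
        model="deepseek-v3",                    # hypothetical model id
        messages=[{"role": "user", "content": "Hello from a shared node"}],
    )
    print(resp.choices[0].message.content)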

https://sllm.cloud

1. Is the given tok/s estimate the total node throughput, or what a single subscriber can realistically expect to get? Or is it the worst-case throughput if everyone starts using it simultaneously?

2. What if I try to hog all resources of a node by running some large data processing and making multiple queries in parallel? What if I try to resell the access by charging per token?

Edit: sorry if this comment sounds overly critical. I think pooling money with other developers to collectively rent a server for LLM inference is a really cool idea. I've thought about it too, but never found a satisfactory answer to question 2, so I concluded it was infeasible in practice.

1. It's an average.
2. We have a sophisticated rate limiter.
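
The thread doesn't describe how that limiter works. One common way to cap a single subscriber's throughput is a per-user token bucket; the sketch below is purely illustrative, with made-up parameters, and is not sllm's actual implementation.

    # Generic per-user token-bucket limiter, for illustration only.
    import time

    class TokenBucket:
        def __init__(self, rate_per_s: float, burst: float):
            self.rate = rate_per_s          # tokens refilled per second
            self.capacity = burst           # maximum burst size
            self.tokens = burst
            self.updated = time.monotonic()

        def allow(self, cost: float = 1.0) -> bool:
            now = time.monotonic()
            # refill in proportion to elapsed time, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

    # e.g. cap a user at ~20 generated tokens/s with a burst of 200
    bucket = TokenBucket(rate_per_s=20, burst=200)
    if not bucket.allow(cost=50):           # a 50-token completion
        print("429: slow down")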