I had thought (naively) that this was what was novel about DeepSeek, i.e. the messaging was that you could build the whole thing locally and train it locally too. But that doesn't seem to be the case?
@pkw You can train it locally, but you still need some pretty heavy hardware to do it. Looking at the weights they published, as I understand it, you need around 700 GB of disk space just to store the model (it has roughly 671 billion parameters, released in FP8 precision), and I don't even know how much space the training data would take, so budget at least a few terabytes of free space. On top of that you need a few good GPUs to get it trained in a reasonable amount of time. So I would think it is possible for a small business to train a model using the DeepSeek approach, but they would need to invest a few hundred thousand euros/dollars in computing equipment. Still, that is a lot less than the computing power Google and Microsoft have been using to train their models.
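The 700 GB figure is easy to sanity-check with some back-of-the-envelope arithmetic. This is just a sketch assuming roughly 671 billion parameters stored as FP8 (one byte per parameter); the real checkpoint files add some overhead on top of this:

```python
# Rough storage estimate for the published DeepSeek-V3 weights.
# Assumption: ~671B parameters, FP8 quantization (1 byte per parameter).
params = 671e9          # approximate parameter count
bytes_per_param = 1     # FP8 = 1 byte per parameter
weights_gb = params * bytes_per_param / 1e9

print(f"~{weights_gb:.0f} GB for the raw weights alone")  # ~671 GB
```

Running the same model in 16-bit precision would double that, which is why the FP8 release is what makes storing it on a single machine even remotely practical.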
Also, the DeepSeek company did not actually publish any training code; they described the model and training procedure in their research paper and only released the code for inference. So you would have to recreate the training pipeline in PyTorch or TensorFlow from the description in the paper. I am sure someone will publish an open-source training process at some point, but I can't find one yet, and I am probably not skilled enough to do it on my own (and anyway I don't feel like going through the trouble).