I had thought (naively) that this was what was novel about DeepSeek, i.e. the messaging was that you could build the whole thing locally and train it locally too. But that doesn't seem to be the case?
@pkw You can train it locally, but you still need some pretty heavy hardware to do it. Looking at the weights they published, as I understand it, you need around 700 GB of disk space just to store the model (it has roughly 671 billion parameters, released in FP8 precision), and I don't even know how much space the training data would take, so budget at least a few terabytes of free space. On top of that you need a few good GPUs to get it trained in a reasonable amount of time. So I would think it is possible for a small business to train a model using the DeepSeek approach, but they would need to invest a few hundred thousand euros/dollars in computing equipment. Still, that is a lot less than the computing power Google and Microsoft have been using to train their models.
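The 700 GB figure is easy to sanity-check with some back-of-the-envelope arithmetic. This is just a sketch assuming roughly 671 billion parameters stored as FP8 (one byte per parameter); the real checkpoint files add some overhead on top of this:

```python
# Rough storage estimate for the published DeepSeek-V3 weights.
# Assumption: ~671B parameters, FP8 quantization (1 byte per parameter).
params = 671e9          # approximate parameter count
bytes_per_param = 1     # FP8 = 1 byte per parameter
weights_gb = params * bytes_per_param / 1e9

print(f"~{weights_gb:.0f} GB for the raw weights alone")  # ~671 GB
```

Running the same model in 16-bit precision would double that, which is why the FP8 release is what makes storing it on a single machine even remotely practical.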
Also, the DeepSeek company did not actually publish any training code; they described the model and training procedure in their research paper and only released the code for inference. So you would have to recreate the training pipeline in PyTorch or TensorFlow from the description in the paper. I am sure someone will publish an open-source training process at some point, but I can't find one yet, and I am probably not skilled enough to do it on my own (and anyway I don't feel like going through the trouble).