I keep hearing people say that R1 is open source: "All the model code is open."
This is an absurd mischaracterization of what's happening. The code is NOT all open source; most importantly, the training code is NOT open source.
What actually happened:
- DeepSeek released a paper outlining how they built the model: https://arxiv.org/pdf/2501.12948. Many of the claims in it are dubious at best, the pre-training and training steps are extremely opaque, and I haven't seen an independent researcher who has been able to reproduce the training-cost claims. After reading the paper, I don't doubt there were probably some real savings in the training process, but I wouldn't trust the figures blindly; China has a massive incentive to lie about these kinds of things.
- DeepSeek published this repo, which basically just contains the paper: https://github.com/deepseek-ai/DeepSeek-R1
- DeepSeek published the model WEIGHTS in a Hugging Face repo: https://huggingface.co/deepseek-ai/DeepSeek-R1
I think it's very irresponsible to keep spreading the misconception that models like Llama and DeepSeek-R1 are open source. They're not: you can't retrain them, because you don't have the training code or a complete dataset. Full stop.
What can you do with the weights? You can load them into a framework like PyTorch and run inference. But you have no idea how those weights were generated, which is arguably the most important part.
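To make the distinction concrete, here's a minimal sketch of what weights alone buy you. It uses a toy PyTorch model as a stand-in for the real checkpoint (the actual R1 weights are far too large to illustrate with, and the model/filenames here are invented for the example):

```python
import torch
import torch.nn as nn

# Toy stand-in for a released model architecture.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

# Pretend someone else trained this; all they publish is the state_dict (the weights).
trained = ToyModel()
torch.save(trained.state_dict(), "weights.pt")

# With only the weights, you can reconstruct the architecture and run inference...
model = ToyModel()
model.load_state_dict(torch.load("weights.pt"))
model.eval()

with torch.no_grad():
    output = model(torch.ones(1, 4))
print(output.shape)  # torch.Size([1, 2])

# ...but nothing in weights.pt tells you how those numbers were produced:
# no training code, no data pipeline, no optimizer schedule, no dataset.
```

That's the whole gap in one file: inference works fine, but the provenance of every parameter is a black box.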
