It's been over a year, but DeepSeek finally did it and released their latest models. Performance looks strong, particularly for coding, with a 1M-token context. Despite being compute-constrained, their team also figured out some new attention methods to use during training.

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
