Based on the current DeepSeek website, I suspect it's not going to be great, as their current model (V3.4? V4-mini?) often forgets or changes facts explicitly mentioned in the conversation, which R1 never did. It's better than R1 at math and coding, but nearly unusable for deep conversation. I suspect they pushed MLA or linear attention too far, or quantize much more aggressively than before.
Yeah, Qwen3 Coder for Claude Code and 3.5 for OpenClaw have already replaced my full-stack use of Opus 4.6; it's fine for basic web apps, k8s/Docker infra setup, optimizing AI models, etc., with only a slightly higher error rate than Opus. The upcoming 3.6 together with Gemma 4 might make it even better (still to test). OpenAI's memory spot-market play might have been directed at local inference as well.

GitHub - JuliusBrussee/caveman: 🪨 why use many token when few token do trick — Claude Code skill that cuts 75% of tokens by talking like caveman