Damn those Mythos benchmarks seem very promising
Wild that they don't seem to be making it GA, which makes me suspect it's probably not actually as good as they say
Qwen 3.6 is essentially the same as Opus 4.6 now so I guess we'll see how the new generation stacks up?

@pojntfx have you actually seen Qwen perform this well? or are you basing that comment on benchmarks?

i think Mythos only has to be "some amount better" at finding 0days than the current public models to justify them waiting on GA... quite a few maintainers are already swamped.

@deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter, Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier - same thing, beats Opus. GLM's weights are even MIT-licensed
@pojntfx @deobald You found GLM 5.1 was better than Opus 4.6 at coding?? Want to split an H200?