Mastodawn

Felicitas Pojtinger 🌅1d ago

Damn those Mythos benchmarks seem very promising

Felicitas Pojtinger 🌅1d ago

Wild that they don't seem to be making it GA. Makes me suspect it's probably not actually as good as they say

Felicitas Pojtinger 🌅1d ago

Qwen 3.6 is essentially the same as Opus 4.6 now, so I guess we'll see how the new generation stacks up?

Steven Deobald 1d ago

@pojntfx have you actually seen qwen perform this well? or are you basing that comment on benchmarks?

i think the mythos benchmarks only have to be "some amount better" at finding 0days than the current public models to justify them waiting on ga... quite a few maintainers are already swamped.

Felicitas Pojtinger 🌅1d ago

@deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter; Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier: same thing, it beats Opus. GLM's weights are even MIT-licensed

James Just James

@pojntfx @deobald You found GLM 5.1 was better than Opus 4.6 at coding?? Want to split an H200?