Mastodawn

Felicitas Pojtinger 🌅2d ago

Damn those Mythos benchmarks seem very promising

Felicitas Pojtinger 🌅2d ago

Wild that they don't seem to be making it GA, makes me suspect it's probably actually not as good as they say

Felicitas Pojtinger 🌅2d ago

Qwen 3.6 is essentially the same as Opus 4.6 now so I guess we'll see how the new generation stacks up?

Steven Deobald 1d ago

@pojntfx have you actually seen qwen perform this well? or are you basing that comment on benchmarks?

i think the mythos benchmarks only have to be "some amount better" at finding 0days than the current public models to justify them waiting on ga... quite a few maintainers are already swamped.

Felicitas Pojtinger 🌅1d ago

@deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter, Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier - same thing, beats Opus. GLM's weights are even MIT-licensed

Felicitas Pojtinger 🌅1d ago

@deobald And yeah, re: Mythos, I'll believe it when I see it, but current-gen models for free are already a massive value IMHO. Sonnet etc. is still very useful despite the other models existing

Felicitas Pojtinger 🌅1d ago

@deobald I'm pretty happy about mostly working with higher-level, memory-safe languages

Steven Deobald

@pojntfx nod. it does have me thinking hard about other forms of baked-in safety. i'll admit this is the first point in my career where i've ever taken elixir seriously.

(well, ok, not really... @abnv ran a team at nilenso that did some amazing work with it for a quiz app that ran in parallel to a tv show. but i've never previously been tempted to learn it.)