Mastodawn

Felicitas Pojtinger 🌅1d ago

Damn those Mythos benchmarks seem very promising

Felicitas Pojtinger 🌅1d ago

Wild that they don't seem to be making it GA, makes me suspect it's probably actually not as good as they say

Felicitas Pojtinger 🌅1d ago

Qwen 3.6 is essentially the same as Opus 4.6 now so I guess we'll see how the new generation stacks up?

Steven Deobald

@pojntfx have you actually seen qwen perform this well? or are you basing that comment on benchmarks?

i think the mythos benchmarks only have to be "some amount better" at finding 0days than the current public models to justify them waiting on ga... quite a few maintainers are already swamped.

Felicitas Pojtinger 🌅1d ago

@deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter, Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier - same thing, beats Opus. GLM's weights are even MIT-licensed

Felicitas Pojtinger 🌅1d ago

@deobald And yeah, re: Mythos, I'll believe it when I see it, but current-gen models, except free, are already a massive value IMHO. Sonnet etc. are still very useful despite the other models existing

Felicitas Pojtinger 🌅1d ago

@deobald I'm pretty happy about mostly working with higher-level, memory-safe languages

Felicitas Pojtinger 🌅1d ago

@deobald If you'd like to try for yourself, I've documented it here: https://gist.github.com/pojntfx/5916ceb7ec35eb010010400447e9c034

Set up Nanobot with OpenRouter and Ollama

Steven Deobald 1d ago

@pojntfx are you using nanobot for hacking or were you just pointing me to the provider section?

Steven Deobald 1d ago

@pojntfx nod. it does have me thinking hard about other forms of baked-in safety. i'll admit this is the first point in my career where i've ever taken elixir seriously.

(well, ok, not really... @abnv ran a team at nilenso that did some amazing work with it for a quiz app that ran in parallel to a tv show. but i've never previously been tempted to learn it.)

James Just James 1d ago

@pojntfx @deobald You found GLM 5.1 was better than Opus 4.6 at coding?? Want to split an H200?