Felicitas Pojtinger 🌅 1d ago

Damn those Mythos benchmarks seem very promising

Felicitas Pojtinger 🌅 1d ago

Wild that they don't seem to be making it GA, makes me suspect it's probably actually not as good as they say

Felicitas Pojtinger 🌅

Qwen 3.6 is essentially the same as Opus 4.6 now so I guess we'll see how the new generation stacks up?

Justin 1d ago

@pojntfx I really don't get the excitement around tech that destroys the earth more than we as humanity have in our history so far?

Felicitas Pojtinger 🌅 1d ago

@justin The fix isn't to not use useful tools, it's to a) deregulate clean energy infrastructure so that we expand it China-style and b) make sure that the models are open so you can run them on clean energy right now

This is the same argument as with EVs: "but the grid is dirty". Like, yes. Fix that. Don't be anti-EV because of it

Justin 1d ago

@pojntfx AI has far more issues than just energy use.

Felicitas Pojtinger 🌅 1d ago

@justin Meh, the abolition of copyright is a nice side effect

Endless slop polluting clean datasources is a big problem, yes, but not using LLMs for something that is _not_ that won't change it

Felicitas Pojtinger 🌅 1d ago

@justin Something changed, either in the harness or the models idk but something changed ~Nov of last year, maybe ~Feb this year I'm not sure, but it's gone from "useless" to "useful" pretty quickly.

Justin 1d ago

@pojntfx useful doesn't excuse theft, degradation of creativity and the amount of garbage that AI causes FOSS to deal with on a daily basis.

Felicitas Pojtinger 🌅 1d ago

@justin I don't believe in IP, there is no such thing as "theft" of intellectual "property". Copyleft was a means to get to this at some point and might still be a way to get there but times are changing

"garbage AI causes FOSS to deal with on a daily basis" - again, something changed here. It's not useless slop AI security reports anymore like a few months ago. systemd uses it, curl uses it, Linux uses it, because it's useful

Felicitas Pojtinger 🌅 1d ago

@justin Degradation of creativity is a real problem, yes, but "why are you painting a picture of me when you can just take a photo" is nothing new

Felicitas Pojtinger 🌅 1d ago

@justin Idk this argument has been had like a million times on here and at this point it's getting tiring. It's useful in some contexts. Can be the opposite of that in others. It's being used by more and more projects and people every day with pretty good success lately.

ori 18h ago

What's the fix for the people behind it explicitly having the goal of replacing the human mind as a tool of thought?

CC: @[email protected]

ori 18h ago

We're racing to build hell because some people find the current level of warmth a little bit useful, and others think they'll get rich selling fuel for the furnaces.

CC: @[email protected] @[email protected]

Felicitas Pojtinger 🌅 10h ago

@ori @justin > What's the fix for the people behind it explicitly having the goal of replacing the human mind as a tool of thought?

I don't know tbh. They are not intelligent or sentient. I'm kind of hoping this is kind of self-evident by the quality of things produced w/o intention just being bad? Having the statistically most likely program to fix problem X isn't particularly interesting IMHO ...

Felicitas Pojtinger 🌅 10h ago

@ori @justin I know exactly one example of "mass" adoption of vibe coded software out in the wild atm (mise) except for ofc the tools around LLMs themselves. And that's only for dev-focused tooling. Not a single one that regular people use

Adoption of those tools in e.g. MS has been a bit of a disaster as is well known even by regular people at this point

Lots of vibe-coded e.g. Nextcloud clones out there now, and despite Nextcloud's UX being terrible people still prefer it over the clones

Felicitas Pojtinger 🌅 10h ago

@ori @justin Ultimately it's just a question of whether or not you've put the care and attention and actual labour into making something vs. if you haven't 🤷‍♀️ The same people that created Electron boilerplate slop TODO apps in the mid-2010s will continue to do the same thing in the mid-2020s. On the other hand, some Electron apps (esp. VSCode IMHO) are also pretty well liked because they work reasonably well, and I suspect the same will happen again in the future.

ori 3h ago

My objection is not technical, but is about the people behind them and their goals.

However, in addition, I don't have a good definition of intelligence, and I don't believe that they need to be sentient in order to replace most uses of our brains. As the cryptographers say, "attacks only get better".

CC: @[email protected]

Steven Deobald 1d ago

@pojntfx have you actually seen qwen perform this well? or are you basing that comment on benchmarks?

i think the mythos benchmarks only have to be "some amount better" at finding 0days than the current public models to justify them waiting on ga... quite a few maintainers are already swamped.

Felicitas Pojtinger 🌅 1d ago

@deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter, Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier - same thing, beats Opus. GLM's weights are even MIT-licensed

Felicitas Pojtinger 🌅 1d ago

@deobald And yeah re: Mythos I'll believe it when I see it, but current-gen models, except free, are already a massive value IMHO. Sonnet etc. are still very useful despite the other models existing

Felicitas Pojtinger 🌅 1d ago

@deobald I'm pretty happy about mostly working with higher-level, memory-safe languages

Felicitas Pojtinger 🌅 1d ago

@deobald If you'd like to try for yourself I've documented it here: https://gist.github.com/pojntfx/5916ceb7ec35eb010010400447e9c034

Set up Nanobot with OpenRouter and Ollama

Steven Deobald 18h ago

@pojntfx are you using nanobot for hacking or were you just pointing me to the provider section?

Steven Deobald 18h ago

@pojntfx nod. it does have me thinking hard about other forms of baked-in safety. i'll admit this is the first point in my career where i've ever taken elixir seriously.

(well, ok, not really... @abnv ran a team at nilenso that did some amazing work with it for a quiz app that ran in parallel to a tv show. but i've never previously been tempted to learn it.)

James Just James 18h ago

@pojntfx @deobald You found glm 5.1 was better than opus4.6 at coding?? Want to split an h200 ?