Mastodawn

Poudlardo Mar 16

Leanstral: Open-source agent for trustworthy coding and formal proof engineering

Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37

https://mistral.ai/news/leanstral

jasonjmcghee Mar 16

Curious if anyone else had the same reaction as me

This model is specifically trained on this task and significantly[1] underperforms Opus.

Opus costs about 6x more.

Which seems... totally worth it based on the task at hand.

[1]: based on the total spread of tested models

speedgoose Mar 17

But you can run this model for free on a common battery-powered laptop sitting on your lap without cooking your legs.

hobofan Mar 17

Sorry, but what are you talking about? This is a 120B-A6B model, which isn't runnable on any laptop except the most beefed-up MacBooks, and even then it will certainly drain the battery and cook your legs.

naasking Mar 17

You can easily run a quant of this on a DGX Spark though. Seems like a small investment if it meaningfully improves Lean productivity.
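
A rough back-of-the-envelope check of the "a quant fits" claim can be sketched as below. It assumes the 120B total parameter count quoted upthread, counts weights only (no KV cache, activations, or quantization metadata), and takes 128 GB as the device memory, which is an assumption not stated in this thread. The name `weights_gb` is made up for illustration.

```python
def weights_gb(total_params: int, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (10^9 bytes), weights only."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 120_000_000_000  # "120B" as quoted in the thread
ASSUMED_MEMORY_GB = 128         # assumption, not from the thread

for bits in (16, 8, 4):
    size = weights_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if size < ASSUMED_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: ~{size:.0f} GB, {verdict} in {ASSUMED_MEMORY_GB} GB")
```

Under these assumptions only the lower-bit quants leave real headroom, since runtime overhead comes on top of the weights. The estimate uses the total parameter count rather than the active one, on the assumption that all experts of a mixture-of-experts model have to stay resident in memory.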

jasonjmcghee

Is it though?

Most people I know who use agents for building software have tried to switch to local development, and every single time they switched back to Claude/Codex.

It's just not worth it. The models are that much better and continue to get released / improve.

And it's much cheaper unless you're doing like 24/7 stuff.

Even on the $200/month plan, that's cheaper than buying a $3k DGX or a $5k M4 Max with enough RAM.

Not to mention you can no longer use your laptop as a laptop, since the power draw drains it; you'd need to host the model separately and connect to it.
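
The cost comparison above comes down to a break-even calculation, sketched here using only the figures quoted in this comment ($200 per month, $3k and $5k hardware). It deliberately ignores electricity, resale value, and hardware shared between several users, and `break_even_months` is a hypothetical helper name.

```python
def break_even_months(hardware_cost: float, monthly_subscription: float) -> float:
    """Months of subscription fees that add up to the one-time hardware price."""
    return hardware_cost / monthly_subscription


MONTHLY_PLAN = 200  # USD per month, figure from the comment

for name, cost in [("DGX", 3000), ("M4 Max", 5000)]:
    months = break_even_months(cost, MONTHLY_PLAN)
    print(f"{name}: ${cost} hardware equals {months:.0f} months of the ${MONTHLY_PLAN} plan")
```

On these numbers the subscription stays cheaper for 15 months against the $3k option and 25 months against the $5k one, so the conclusion depends on how long the hardware stays useful and how many people share it.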

naasking Mar 17

A single DGX Spark can service a whole department of mathematicians (or programmers), and you can cluster up to 4 of them to fit very large models like GLM-5 and quants of Kimi K2.5. This is nearing frontier-level model size.

I understand the value proposition of the frontier cloud models, but we're not as far off from self-hosting as you think, and it's becoming more viable for domain-specific models.

jasonjmcghee Mar 17

That's great news. I wonder if that will help drive cloud costs down too.