Introducing Composer 2 · Cursor

Frontier-level coding with strong results on CursorBench, higher token efficiency, and a faster default variant.

Cursor

Beating Opus 4.6 and coming within striking distance of gpt-5.4 is impressive! Particularly given larger labs like Meta are struggling to catch up to OpenAI/Anthropic.

More competition among model vendors is great for developers!

Cursor is in a very tough situation right now. They don't have SOTA models (see the lack of benchmarks in the release), and they likely cannot subsidize usage through cheap subscriptions the way Claude Code and OpenAI do.

I wonder what their plan is moving forward; they have been releasing a ton of random features lately.

Are there other coding benchmarks we should include next time? We included Terminal-Bench 2.0 and SWE-bench Multilingual.

We don't plan on reporting SWE-bench Verified, for similar reasons to OpenAI: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

Why SWE-bench Verified no longer measures frontier coding capabilities

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.