Introducing Composer 2 · Cursor

Frontier-level coding with strong results on CursorBench, higher token efficiency, and a faster default variant.

Cursor

Beating Opus 4.6 and coming within striking distance of gpt-5.4 is impressive! Particularly given larger labs like Meta are struggling to catch up to OpenAI/Anthropic.

More competition among model vendors is great for developers!

Cursor is in a very tough situation right now. They don't have SOTA models (see the lack of benchmarks in the release), and they likely cannot subsidize usage through cheap subscriptions the way Claude Code and OpenAI do.

I wonder what their plan is moving forward; they have been releasing a ton of random features lately.

Are there other coding benchmarks we should include next time? We included Terminal-Bench 2.0 and SWE-bench Multilingual.

We don't plan on reporting SWE-bench Verified, for similar reasons to OpenAI: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

Why SWE-bench Verified no longer measures frontier coding capabilities

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.