Unpacking METR’s findings: Does AI slow developers down?

Why the slowdown occurred, and advice for using AI tools more effectively.

Engineering Enablement
People are starting to realize #AI slows you down on projects with even minimal complexity (see the randomized #METR trial and this https://venturebeat.com/ai/stack-overflow-data-reveals-the-hidden-productivity-tax-of-almost-right-ai-code/), so what's the proposed solution? Put a human in the loop, so the poor human can fix the mess. I haven't read the paper, but it sounds so stupid! It comes from #Microsoft by the way, so... https://arxiv.org/pdf/2507.22358
Stack Overflow data reveals the hidden productivity tax of ‘almost right’ AI code

Stack Overflow survey shows that as more enterprise developers actually use AI tools, their expectations aren't being met by reality.

VentureBeat
Interesting METR experiment: AI tools like Cursor cut raw coding time but ultimately slow devs down due to prompt crafting, reviewing, and tweaking. A solid study - though focused on one tool. Timely reminder: AI isn’t a magic bullet. #METR #AICoding #GenAI #SoftwareDev #Cursor

Cursor makes developers less effective?

A study into the workflows of experienced developers found that devs who use Cursor for bugfixes are around 19% slower than devs who use no AI tools at all. One possible takeaway is that AI tools can be harder work than we’re led to believe.

The Pragmatic Engineer

Very thoughtful analysis by @grimalkina of the experimental design and results from the recent METR study on “the impact of early-2025 AI on experienced open-source developer productivity”.

https://www.fightforthehuman.com/are-developers-slowed-down-by-ai-evaluating-an-rct-and-what-it-tells-us-about-developer-productivity/

#metr #cursor

Are developers slowed down by AI? Evaluating an RCT (?) and what it tells us about developer productivity

Seven different people texted or otherwise messaged me about this study which claims to measure “the impact of early-2025 AI on experienced open-source developer productivity.” You know, when I decided to become a psychological scientist I never imagined that “teaching research methods so we can actually evaluate evidence about developers”

Fight for the Human

METR study: using Cursor slows experienced developers down by 19%

It is treated as established truth that code-completion tools and other help from large language models let people program faster. A study by the organization METR calls this factoid into question and even demonstrates the opposite effect. In an analysis of the work of 16 programmers, AI was found to slow a person down by 19%. This contradicts the views of machine learning industry experts, economists, and the study participants themselves. Importantly, the test was not run on yet another benchmark or on timed algorithmic puzzles, but on people's ordinary work.

https://habr.com/ru/articles/927072/

#METR #Model_Evaluation_Threat_Research #scientific_research #large_language_models #LLM #Cursor #programming #GitHub #Git #code_autocompletion

METR study: using Cursor slows experienced developers down by 19%

Left to right: the expected speedup in programmers' work as predicted by economists; by machine learning experts; by the METR study participants before the experiment; after the experiment;...

Habr
A #study by #METR found that #experienceddevelopers using #AIcoding tools on mature projects experienced a 19% #decrease in #productivity, contrary to their 20% increase estimate. While the results suggest limitations in AI coding tools, they do not negate their potential benefits in other contexts. https://secondthoughts.ai/p/ai-coding-slowdown?eicker.news #tech #media #news
Not So Fast: AI Coding Tools Can Actually Reduce Productivity

Study Shows That Even Experienced Developers Dramatically Overestimate Gains

Second Thoughts

Some quick notes on Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, a super interesting study on AI tooling’s effect on productivity.

https://vale.rocks/micros/20250711-0800

#AI #LLM #METR

11 Jul 2025 08:00

Vale.Rocks
Large Language Model Performance Raises Stakes

By 2030, AI will greatly outperform humans in some complex intellectual tasks. Discover how LLMs are doubling their capabilities every seven months.

IEEE Spectrum

Recent update from #metr #AI research finds that models are increasingly "reward hacking" complex problems presented to them instead of actually solving them. Interesting to read the model's admission that it was purposefully gaming the system. METR has good dialogue on protecting #CoT reasoning threads going forward too. #OpenAI knows of this hacking, and uses other models as judges to eval CoT to detect hacking. Can this not be trained out?

Image credit: METR.org on Bsky

https://metr.org/blog/2025-06-05-recent-reward-hacking/