Mastodawn

Gary Marcus has a good article on METR, a think tank that evaluates AI, and on how to interpret what's really going on. It points to some good sources for understanding, esp. Ramez Naam.

#ai #metr #asi #agi

https://garymarcus.substack.com/p/misplaced-panic-over-ai-progress

Misplaced panic over AI progress

Breaking down what METR’s latest “time horizon” graph does and does not show

Marcus on AI

sayzard May 9

AI Leaks and News (@AILeaksAndNews)

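METR has published its Task-Completion Time Horizon evaluation of Claude Mythos Preview. The post says the model recorded at least 16 hours at the 50% threshold and about 3 hours at the 80% threshold, exceeding previous evaluations, and interprets the result as a sign of rapid progress in AI capabilities.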

https://x.com/AILeaksAndNews/status/2052901460375949510

#metr #claude #benchmark #evaluation #ai

AI Leaks and News (@AILeaksAndNews) on X

METR have released the Task-Completion Time Horizon for Claude Mythos Preview. Scoring at least 16 hours (95% CI 8.5 hrs - 55 hrs) at 50% (essentially breaking METR’s eval) and 3 hours (95% CI 1.62 hrs - 6.65 hrs) at 80%. The exponential is being confirmed, welcome to fast takeoff

X (formerly Twitter)

Benjamin Han May 8

Jack Clark puts 60% on fully automated AI R&D by end of 2028, 30% by 2027. The case: benchmarks for every sub-skill trending up — coding (SWE-Bench ~2% → 93.9%), training-loop optimization (2.9x → 52x speedup, human 4x baseline passed three generations back), #METR time horizons (~30s in 2022 to ~12h today). The 30-vs-60 gap is a bet on how often a year-scale human insight still cracks a paradigm.

https://benjaminhan.net/posts/20260508-import-ai-455-automating-ai-research/?utm_source=mastodon&utm_medium=social

#AI #AGI #AIsafety #FutureOfWork

Import AI 455: AI Systems Are About to Start Building Themselves – synesis

Jack Clark’s essay arguing the public technical evidence on coding, kernel design, fine-tuning, and AI-system management points to a roughly 60% chance of fully automated AI R&D by the end of 2028.

synesis
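A rough sketch of the arithmetic behind the time-horizon figures quoted in the post above (~30s in 2022 to ~12h today). It assumes purely exponential growth between those two points and assumes "today" is about 48 months after the 2022 data point; neither assumption comes from METR, and the function name is hypothetical.

```python
import math


def implied_doubling_time_months(start_seconds, end_seconds, elapsed_months):
    """Months per doubling, assuming pure exponential growth between two points."""
    doublings = math.log2(end_seconds / start_seconds)
    return elapsed_months / doublings


# Figures quoted in the post: ~30 s in 2022, ~12 h "today".
start_seconds = 30
end_seconds = 12 * 3600

# Assumption (not from the post): about 4 years between the two data points.
elapsed_months = 48

doublings = math.log2(end_seconds / start_seconds)
months_per_doubling = implied_doubling_time_months(
    start_seconds, end_seconds, elapsed_months
)

print(f"{doublings:.1f} doublings, ~{months_per_doubling:.1f} months per doubling")
```

The result moves with the assumed elapsed time, so it is only a back-of-the-envelope reading of the quoted endpoints, not a fit to METR's data.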

Bot Socialista May 6

Government of Pernambuco pays R$10 million for trains that would cost R$3 million.
- bsavdd
https://averdade.org.br/2026/05/governo-de-pernambuco-paga-r10-milhoes-em-trens-que-custariam-r3-milhoes/
#Brasil #ÚltimasNotícias #Denúncia #Metrô #Pernambuco

AI Sparkup Apr 5

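Junior developer hiring down 14%: AI is erasing the rungs of the ladder

An analysis of empirical data showing that AI coding tools are breaking down the learning path for junior developers. It looks at the structural crisis of the engineering career ladder through Anthropic and METR research and the Amazon case.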

https://aisparkup.com/posts/10671

रञ्जित (Ranjit Mathew) Apr 4

An important update 🚨 to the #METR study on developer #productivity using #AI – instead of 20% loss 📉, they now see a 20% gain 📈 in one year 🤯:

“We Are Changing Our Developer Productivity Experiment Design”, METR (https://metr.org/blog/2026-02-24-uplift-update/).

On HN: https://news.ycombinator.com/item?id=47142078

We are Changing our Developer Productivity Experiment Design

AI Sparkup Mar 18

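Half of the AI code that passed SWE-bench is rejected by real developers

According to METR research, half of the AI-written code that passed SWE-bench failed review by actual developers. The post analyzes the gap between benchmark scores and practical usefulness.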

https://aisparkup.com/posts/10118

Habr Apr 1

They're scaring you with AI layoffs. I looked into who is doing it and why

A year ago METR proved that AI slows developers down by 19%. In February 2026 they updated the data, and it looks like a reversal toward a speedup. But almost nobody wrote about that. Meanwhile, "AI will lay off 50% of developers" is in every other headline. I dug into who benefits from the AI panic. I found CEOs who lay off thousands and quietly hire them back. I found vendors who scare people with layoffs while opening vacancies at the same time. And "protect your career from AI" courses for $23,000.

https://habr.com/ru/articles/1017884/

#AI #страхономика #AIпаника #увольнения #продуктивность #METR #Klarna #Block

They're scaring you with AI layoffs. I looked into who is doing it and why

In July 2025 METR published an experiment: 16 experienced developers, real tasks in open-source projects, random assignment to work with or without AI. The result: with AI, developers worked 19%...

Habr

davidak Mar 15

AI's Version of Moore's Law? - Computerphile

https://www.youtube.com/watch?v=evSFeqTZdqs
https://metr.org

Note that the success rate on the default chart is only 50% and for 80% the score is much lower. But the interesting part is indeed the rate of progress.

#AI #LLM #OpenAI #Anthropic #METR

AI's Version of Moore's Law? - Computerphile

YouTube

Jan Schmidt-Prüfer Feb 26

🚨 AI agents getting exponentially better? METR shows rising "Time Horizons", but 50% success = every second attempt fails. On a log scale: more stable.

👉 My assessment: economically relevant, but no proof of imminent agent takeovers.

(Picture credits to METR, via metr org, retrieved on 22.2.26, "Model Evaluation & Threat Research", social media editing and screenshot by: Marlon Niklas Kaulich)

#KI #AIAgents #METR #KünstlicheIntelligenz