Evaluating Geekbench 6

Applications vary wildly in what they demand from a system, making it difficult for a single benchmark to provide a broadly representative score.

Chips and Cheese

Google for Developers (@googledevs)

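Kaggle's Benchmarks Resource Grant Program has reached the midpoint of its application window. Selected projects receive complimentary compute quota, infrastructure support, and technical guidance to scale their AI benchmark development.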

https://x.com/googledevs/status/2051648107767558266

#kaggle #benchmarks #compute #infrastructure #ai

Google for Developers (@googledevs) on X

Building the next generation of AI benchmarks? 📊 We’re officially at the midpoint of the application window for the @Kaggle Benchmarks Resource Grant Program! Selected projects receive complimentary compute quota, infrastructure support, and technical guidance to scale

X (formerly Twitter)

Dan McAteer (@daniel_mac8)

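Anthropic co-founder Jack Clark said he puts the probability of AI reaching full RSI (recursive self-improvement) by 2028 at 60%. He cited the rapid improvement of model performance across many benchmarks as evidence, and said the future could become very hard to forecast.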

https://x.com/daniel_mac8/status/2051322445101990175

#anthropic #jackclark #rsi #ai #benchmarks

Dan McAteer (@daniel_mac8) on X

Jack Clark, Anthropic Co-Founder, now believes that AI will reach full RSI by 2028 with p(0.6). He believes it will lead to a nearly impossible to forecast future. Evidence to support his claim is the many benchmarks where models have exponentially improved. Does this imply


swyx (@swyx)

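The author says they uninstalled the ChatGPT app and now see Codex as a more useful replacement. The post also notes that xAI's Grok 4.30 offers the most intelligence per dollar, which has made model performance comparisons a talking point.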

https://x.com/swyx/status/2050391638166622222

#codex #grok #xai #llm #benchmarks

swyx 🇸🇬 (@swyx) on X

small milestone: uninstalled the chatgpt app. codex is strict superset now! found something cool - among frontier models, @xai @grok 4.30 is the most intelligence per dollar you can get, beating even open models like MiMo, Kimi, and DeepSeek. numbers pulled from

Trump tells Congress the Iran war has ‘terminated’ as legal deadline hits

The letter to lawmakers attempts to justify why the president is not seeking congressional authorization after the conflict reaches a 60-day threshold.

POLITICO

⚠️ The Black Myth: Wukong hack has again exposed a dev lie, Denuvo cuts FPS.

Hacker voices38 released a bypass that runs without disabling Secure Boot and is steadier than the hypervisor method. Benchmarks: 239 FPS vs 182 FPS, roughly a 31% boost (put the other way, 182 FPS is about 24% below 239 FPS). Clear takeaway: heavy verification systems eat CPU cycles; the optimized launch still leaves Denuvo present but gives players a real edge. WILD 😳
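The two FPS figures can be read as either a gain or a loss depending on which one is taken as the baseline. The short sketch below works through both directions. It assumes the higher figure (239 FPS) is the run with the bypass and the lower figure (182 FPS) is the stock run, which the post implies but does not state outright; the function and variable names are illustrative only.

```python
def percent_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


# Figures quoted in the post. Which run is which is an assumption here.
fps_with_bypass = 239.0  # assumed: run with the bypass
fps_stock = 182.0        # assumed: stock run

# Gain measured against the stock run (the "boost" reading).
gain = percent_change(fps_with_bypass, fps_stock)

# Loss measured against the bypass run (the "overhead" reading).
loss = percent_change(fps_stock, fps_with_bypass)

print(f"gain over stock: {gain:.1f}%")
print(f"loss versus bypass: {loss:.1f}%")
```

The same 57 FPS gap is about a 31% increase when divided by 182 and about a 24% decrease when divided by 239, so a percentage only means something once the baseline is named.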

#SteamAndEpic #FPS #Denuvo #Benchmarks #Wukong #Hacker

lucas (@lucas_flatwhite)

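The post introduces 10 operating principles for the AI agent era and stresses that there is no need to keep up with every new framework, model, and benchmark that appears each day. Its message ties in with Andrej Karpathy's talk "From Vibe Coding to Agentic Engineering" and highlights the shift toward agentic engineering.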

https://x.com/lucas_flatwhite/status/2049884022168604703

#aiagents #frameworks #models #benchmarks #agenticengineering

lucas (@lucas_flatwhite) on X

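10 operating principles for living in the AI agent era in 2026.. Things are changing so fast. New frameworks, models, and benchmarks appear every day.. but you don't need to keep up with all of them. This article has a lot of messages that connect with Andrej Karpathy's recent talk "From Vibe Coding to Agentic Engineering".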

Trump bets on quick Iran oil crunch. Experts see prolonged pain and rising costs.

Administration officials say Tehran is days from crisis, but analysts see a slower squeeze with global price shocks already hitting U.S. consumers and reshaping the political fight.

Politico
Introducing SOB: A Multi-Source Structured Output Benchmark for LLMs - Interfaze

A multi-source LLM benchmark across text, image, and audio that measures JSON value accuracy per field, not just schema compliance. 20+ models, 7 metrics, full leaderboard.
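To illustrate the distinction the description draws between schema compliance and per-field value accuracy, here is a minimal sketch. It is not SOB's actual metric or code: the two functions, the example schema, and the invoice data are all hypothetical, and exact-match comparison is only one possible way to score a field.

```python
import json


def schema_compliant(output: dict, schema: dict) -> bool:
    """True if every expected field is present with the expected Python type."""
    return all(
        key in output and isinstance(output[key], expected_type)
        for key, expected_type in schema.items()
    )


def field_accuracy(output: dict, reference: dict) -> float:
    """Fraction of reference fields whose value matches the output exactly."""
    if not reference:
        return 0.0
    correct = sum(
        1 for key, value in reference.items()
        if key in output and output[key] == value
    )
    return correct / len(reference)


# Hypothetical schema and ground truth for one extraction task.
schema = {"invoice_id": str, "total": float, "currency": str}
reference = {"invoice_id": "A-17", "total": 42.5, "currency": "EUR"}

# Hypothetical model output: well-formed JSON with one wrong value.
model_output = json.loads(
    '{"invoice_id": "A-17", "total": 45.2, "currency": "EUR"}'
)

compliant = schema_compliant(model_output, schema)
accuracy = field_accuracy(model_output, reference)

print(f"schema compliant: {compliant}")
print(f"field accuracy: {accuracy:.2f}")
```

In this example the output passes the schema check while getting only two of three values right, which is the kind of gap a schema-only score would hide.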

Interfaze
Severe Asthma: It’s Time to Act for Patients — and Health Systems

Timing transforms outcomes. For people living with severe asthma, outcomes are not determined only by what treatment they receive, but when. Evidence now makes one thing clear: delays in care are c…

POLITICO