⭐️ New blog post: A Month With OpenAI's Codex

https://highcaffeinecontent.com/blog/20260301-A-Month-With-OpenAIs-Codex

It's been literal *years* since I last posted anything, so you know this is a big deal for me 😜

@stroughtonsmith Codex and Gemini are also seen as inferior to Claude for programming by many people I trust to know. For this use case of porting, they're probably just as good. But in situations with more ambiguity, or where the user gives bad advice, Claude is far superior from what I've seen.
@boxed that might have been true up to the release of 5.3 last month; I'm not convinced it still is. But these things have a lot of subjectivity
Peter Gostev (@petergostev) on X

Link to the Repo: https://t.co/SkkvC6jcuf Link to the data viewer: https://t.co/b4q9uuJUhI


@boxed @stroughtonsmith benchmarks of AI models are not everything (not saying they're unimportant). How an agent manages context, system prompts, errors etc. matters just as much.

_random_agent_ + Opus 4.6 can be much worse than _great_agent_ + Opus 4.6

@cleanbit @stroughtonsmith for sure. But I think this one shows a real issue with this generation of models, and it might be why programmers prefer Claude even while Gemini beats it in benchmarks. Getting to the answer when an answer exists is nice and all, but how you respond to a crisis is imo more important. This goes for people, nations, and models :)