Mastodawn

Hacker News Sep 17, 2025

Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22%

https://quesma.com/blog/tau2-benchmark-improving-results-smaller-models/

#HackerNews #Tau2Benchmark #GPT5Mini #AIResearch #ModelImprovement #PromptEngineering

Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22% - Quesma Blog

We expected small models to be fast, but our benchmarks revealed a common reliability trap. Here’s our deep dive on finding and fixing it.