Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22% - Quesma Blog

We expected small models to be fast, but our benchmarks revealed a common reliability trap. Here’s our deep dive on finding and fixing it.
