Finally tried gpt-5-mini on some tasks, hoping that at "minimal" reasoning it would be faster for user-blocking inference than gpt-4.1-mini or claude-haiku-4.5.

Nope, it's a dog. So much so that I asked two other teams, and they said 5-mini is so slow and unreliable that they can't use it either. Apparently OpenAI's "4 lightning bolts" speed rating means "takes 5000-10000 ms to process 200 tokens" (roughly 20-40 tokens/s), so that's good to know 💫

Aside: we should normalize quoting LLM latency (especially time to first token) in milliseconds. A 60 fps interface updates every 16 ms. A button press needs to respond within about 150 ms to feel responsive. It'll be a while before LLMs get there, but that's the goal.
@apike glad to know it’s not just me. 5-mini routinely takes over 20 seconds on most responses. GPT-5 with medium reasoning takes 15-18 minutes to finish responding on some of our requests, making it entirely useless for anything initiated by a user.
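For anyone who wants to put numbers on this themselves, here's a minimal sketch of measuring time to first token and total latency in milliseconds. The `stream_tokens` generator is a hypothetical stand-in for a real streaming API; in practice you'd iterate over the provider's streamed response instead.

```python
import time

def stream_tokens(n=200):
    # Hypothetical stand-in for a streaming LLM response.
    # Replace with iteration over a real provider's token stream.
    for i in range(n):
        yield f"tok{i}"

def measure_latency(stream):
    """Return (time_to_first_token_ms, total_ms, token_count)."""
    start = time.perf_counter()
    first_ms = None
    count = 0
    for _ in stream:
        if first_ms is None:
            # Time to first token: the number that matters most
            # for user-blocking interfaces.
            first_ms = (time.perf_counter() - start) * 1000.0
        count += 1
    total_ms = (time.perf_counter() - start) * 1000.0
    return first_ms, total_ms, count

ttft, total, n = measure_latency(stream_tokens())
print(f"TTFT: {ttft:.1f} ms, total: {total:.1f} ms for {n} tokens")
```

Quoting both numbers (TTFT and total) makes it obvious when a model is slow to start versus slow to generate, which are very different problems for a user-facing product.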