The AI industry doesn’t have good tools for measuring reliability, or even a good definition of it. @sayashk and @randomwalker seek to define reliability in a functionally useful way. https://www.normaltech.ai/p/new-paper-towards-a-science-of-ai
New Paper: Towards a science of AI agent reliability

Quantifying the capability-reliability gap

AI as Normal Technology