Mastodawn

OTelBench: AI struggles with simple SRE tasks (Opus 4.5 scores only 29%)
https://quesma.com/blog/introducing-otel-bench/
#ycombinator #benchmarking #opentelemetry #observability #llm #instrumentation #tracing

Benchmarking OpenTelemetry: Can AI trace your failed login? - Quesma Blog

A lot of vendors pitch AI SRE. We tested 14 models across 11 programming languages, and even the best ones struggle to instrument code with the leading open-source standard, OpenTelemetry.

Quesma