AI benchmarks are a bad joke – and LLM makers are the ones laughing

Study finds many tests don't measure the right things

The Register