Good point.
EU study warns over the shortcomings of AI benchmarking. Paper by EU researchers highlights problems with how AI models are currently measured and urges regulators to signal which benchmarks are trustworthy.
"Measuring AI capabilities and risks is a challenge, and benchmarks have been found to promise too much, be easily gamed, and measure the wrong thing"
https://www.euractiv.com/section/tech/news/eu-study-warns-over-the-shortcomings-of-ai-benchmarking/?utm_source=mastodon&utm_medium=dlvr.it
#AI #benchmarking #benchmarks
Is OpenBSD 10x faster than Linux? (tedu@)

#Business #Analyses
How much to spend on accessibility? · “Non-compliance is far more expensive than compliance.” https://ilo.im/1665ih

_____
#Accessibility #Companies #Strategy #Benchmarks #Investment #Cost #Compliance #Regulations #Lawsuits

How much should you spend on accessibility? - Karl Groves

Some recent discussions in the Accessibility Slack, as well as with some customers, inspired me to do some research on what you should spend on digital accessibility. In a business environment increasingly shaped by regulation, risk, and reputation, the question “How much should you spend on digital accessibility?” is more than a budgeting decision—it’s a

Karl Groves - Web Accessibility Viking

Evaluating LLMs on creative writing via reader usage, not benchmarks

https://www.narrator.sh/

#HackerNews #EvaluatingLLMs #CreativeWriting #ReaderUsage #Benchmarks #NarrativeAI

the future of reading

narrator uses ai to write exactly what you want to read.

narrator

The briefing also features perspectives from:
👤 Prof. Dr. Chris Biemann, Universität Hamburg
👤 Dr. Paul Röttger, MilaNLP Group, Università Bocconi

All experts stress that strong benchmark results do not automatically translate into reliable performance in real-world applications.

📄 Read the full statements here: (in German)
https://www.sciencemediacenter.de/angebote/gpt-5-veroeffentlicht-wie-gut-messen-benchmarks-leistung-von-ki-modellen-25127

(3/3)

#AI #Benchmarks #NLP #AIresearch #MachineLearning

🔍 HardInfo2 update on antiX 🐧

📦 `hardinfo2`
🆙 `2.2.7-1~bpo12+1` ➡️ `2.2.10-1~bpo12+1`

📋 What is it?
HardInfo2 is a graphical tool for viewing detailed information about your system: CPU, memory, disks, network, and sensors. It can also run performance benchmarks.

#antiX #Linux #Hardware #Benchmarks #NoSystemD #Actualizaciones 📈💻

The proof that #benchmarks on #LLM models are utterly useless.

Maybe it's time to focus on real-world performance and practical applications instead of chasing numbers?

#llm #ai #aibenchmarks #llmbenchmark #machinelearning #artificialintelligence #openai #gpt5 #chatgpt

Desperate measures to save Intel: US reportedly forcing TSMC to buy 49% stake in Intel to secure tariff relief for Taiwan

A new report out of Taiwan has revealed that the current US administration is tying a reduction of trade tariffs on Taiwan to significant TSMC investment in the US. This investment includes a 49% stake in Intel.

Notebookcheck
The semester is over, and so is the BEAST lab course: six teams of students pushed accelerators and #Prozessoren (processors) to their limits and devised #Benchmarks. At the end they rate each other, and the winners receive an LRZ beer stein. Congratulations to H. Boeving and D. Soukup; we hope the rest of the participants take something lasting away from the BEAST lab course.
It will run again next year in the summer semester. For LMU: https://tiny.badw.de/r7T58O
For @tu_muenchen: https://tiny.badw.de/2C8Ys6