Introducing WebAccessBench, a novel benchmark that assesses the #accessibility quality and WCAG conformance of web interfaces generated by AI language models under realistic prompting conditions.

I did a bit of research and found that LLMs are incredibly bad at basic digital accessibility tasks. You can compare models and read the full white paper at https://conesible.de/wab.

Overall, the data suggests massive implications for society at large, and major discrimination against people with disabilities. #a11y

I have published a minor update to the white paper to get it ready for a wider audience, featuring a more in-depth introduction and a clearer explanation of why scoring is done the way it is: https://conesible.de/wab/
Accessibility Is Civil Rights. AI Must Stop Shipping Barriers.

@kc Thank you for sharing this work here on Mastodon! Is the benchmark source code available somewhere for reproducibility? It would be interesting to see the full list of tasks and the prompts (with and without guidance). Many thanks!
@tomayac As the white paper describes, that’s not possible if the benchmark is to stay reliable for future model assessments. I will publish a small sample dataset soon.
@tomayac It's included with the white paper now :)
@kc Fantastic, thank you very much. For anyone else reading this: It's in Appendix A.

@kc Thank you Casey, for backing up my previously only "intuitive" assessment with hard facts.

I have already shared the link to the whitepaper several times today at work and I am looking forward to seeing the sample examples, as I am still rather a noob when it comes to prompting 😉

@ellianoa The sample prompt set is included in the white paper now 💚

@kc barriers and demand public timelines for repair. Hold your instituions

Should be institutions

@pixelate Fixed, thank you!
@kc You're welcome. I also put it on Hacker News where lots of the AI folks hang out, from Claude/Codex and such.