Introducing WebAccessBench, a novel benchmark for AI language models to assess #accessibility quality and WCAG conformance in generated web interfaces under realistic prompting conditions.

I did a bit of research and found that LLMs are incredibly bad at basic digital accessibility tasks. You can compare models and read the full white paper at https://conesible.de/wab.

Overall, the data suggests massive implications for society at large and major discrimination against people with disabilities. #a11y

@kc Thank you for sharing this work here on Mastodon! Is the benchmark source code available somewhere for reproducibility? It would be interesting to see the full list of tasks and the prompts (with and without guidance). Many thanks!
@tomayac As the white paper describes, that's not possible, because the benchmark has to stay reliable for future model assessments. I will publish a small sample dataset soon.
@tomayac It's included with the white paper now :)
@kc Fantastic, thank you very much. For anyone else reading this: It's in Appendix A.