Introducing WebAccessBench, a novel benchmark for AI language models to assess #accessibility quality and WCAG conformance in generated web interfaces under realistic prompting conditions.

I did a bit of research and found that LLMs are incredibly bad at basic digital accessibility tasks. You can compare models and read the full white paper at https://conesible.de/wab.

Overall, the data suggests massive implications for society at large and major discrimination against people with disabilities. #a11y

Thomas Steiner

Feb 23

@kc Thank you for sharing this work here on Mastodon! Is the benchmark source code available somewhere for reproducibility? It would be interesting to see the full list of tasks and the prompts (with and without guidance). Thanks a lot!

Casey Feb 23

@tomayac As the white paper describes, that's not possible if the benchmark is to stay reliable for future model assessments. I will publish a small sample dataset soon.

Casey

@tomayac It's included with the white paper now :)

Thomas Steiner

Feb 23

@kc Fantastic, thank you very much. For anyone else reading this: It's in Appendix A.