Doug Holton

@dougholton

Playwright accessibility testing: what axe and Lighthouse miss. David Mello explains that automated tools like axe and Lighthouse catch only 30-40% of WCAG violations, leaving most issues undetected. The article covers ten categories of accessibility defects that scanners miss, from ambiguous link text to keyboard navigation problems, and provides practical Playwright testing patterns and manual audit strategies to fill those gaps. #a11y #testing

https://www.davidmello.com/software-testing/test-automation/playwright-accessibility-testing-axe-lighthouse-limitations
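One of the gap categories the article names is ambiguous link text, which axe and Lighthouse generally pass because "click here" is valid markup. As a minimal sketch (the phrase list and helper name are my own, not from the article), a check like this could flag such links; in a Playwright test you might gather the (text, href) pairs with something like page.eval_on_selector_all("a", "els => els.map(e => [e.innerText, e.href])") before handing them to it:

```python
# Hypothetical helper: flag link text that conveys no purpose out of
# context (WCAG 2.4.4), which automated scanners typically do not catch.
AMBIGUOUS = {"click here", "here", "read more", "more", "learn more", "link"}

def ambiguous_links(links):
    """links: iterable of (text, href) pairs, e.g. collected from a
    Playwright page. Returns the pairs whose text is ambiguous."""
    return [
        (text, href)
        for text, href in links
        if text.strip().lower() in AMBIGUOUS
    ]
```

This is only a heuristic for one of the ten defect categories; a human auditor still has to judge whether the surrounding context makes the link purpose clear.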

Free #Ebook: ABC Learning Design: Active, blended, connected and beyond
https://uclpress.co.uk/book/abc-learning-design/
More info & resources: https://abc-ld.org/
#LearningDesign #OpenAccess #EdDev
ABC Learning Design

An accessible guide to ABC Learning Design, showcasing its rapid, collaborative method and global adaptations that support innovative, flexible curriculum design.

UCL Press
The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
https://arxiv.org/abs/2604.14807
"a cognitive attribution error in which individuals misinterpret LLM-assisted outputs as evidence of their own independent competence, producing a systematic divergence between perceived and actual capability"
#AIEd #psy #hci #LLM
The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows

The rapid integration of large language models (LLMs) into everyday workflows has transformed how individuals perform cognitive tasks such as writing, programming, analysis, and multilingual communication. While prior research has focused on model reliability, hallucination, and user trust calibration, less attention has been given to how LLM usage reshapes users' perceptions of their own capabilities. This paper introduces the LLM fallacy, a cognitive attribution error in which individuals misinterpret LLM-assisted outputs as evidence of their own independent competence, producing a systematic divergence between perceived and actual capability. We argue that the opacity, fluency, and low-friction interaction patterns of LLMs obscure the boundary between human and machine contribution, leading users to infer competence from outputs rather than from the processes that generate them. We situate the LLM fallacy within existing literature on automation bias, cognitive offloading, and human--AI collaboration, while distinguishing it as a form of attributional distortion specific to AI-mediated workflows. We propose a conceptual framework of its underlying mechanisms and a typology of manifestations across computational, linguistic, analytical, and creative domains. Finally, we examine implications for education, hiring, and AI literacy, and outline directions for empirical validation. We also provide a transparent account of human--AI collaborative methodology. This work establishes a foundation for understanding how generative AI systems not only augment cognitive performance but also reshape self-perception and perceived expertise.

arXiv.org
The DOJ has delayed its web accessibility compliance deadline (originally April 24) by one year, to 2027: https://www.federalregister.gov/documents/2026/04/20/2026-07663/extension-of-compliance-dates-for-nondiscrimination-on-the-basis-of-disability-accessibility-of-web
In other news, the free PAVE tool for fixing PDF accessibility has been upgraded to 2.0. Fixing even a single PDF can still be very time-consuming and complex, however: you have to correct the tagged regions, the reading order, and so on for every single page: https://pave-pdf.org/
Open-weight LLM-based OCR models for PDFs are also improving: https://huggingface.co/spaces?category=ocr&sort=trending
#accessibility #a11y
Federal Register

Academic Integrity in the Age of AI https://www.cambridge.org/core/elements/universitypress-integrity-in-the-age-of-ai/8652D952D1C480A46996183626BE3DD7 is free until April 20th. It's about 60 pages long, summarized in the NotebookLM infographic below. Hopefully it's not totally giving up on #OnlineLearning as the infographic suggests.
I did a presentation on a similar topic a few months ago: Strategies for Reducing Student Misuse of AI https://docs.google.com/presentation/d/1htjhjS7-cLx8BfdL2aZZ40B8opUz1ckZedcxmeJYUco/edit?usp=sharing To me, the main underlying key is student motivation (slides 18-22)
#AIEd #AcademicIntegrity #Teaching
Academics Need to Wake Up on AI, Part III

Most of us do not contribute to human knowledge—AI just made it obvious

Popular by Design

PersonaVLM: Long-Term Personalized Multimodal LLMs

Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, Caifeng Shan
https://arxiv.org/abs/2604.13074 https://arxiv.org/pdf/2604.13074 https://arxiv.org/html/2604.13074

arXiv:2604.13074v1
Abstract: Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture users' evolving preferences and personality over time (see Fig.1). In this paper, we introduce PersonaVLM, an innovative personalized multimodal agent framework designed for long-term personalization. It transforms a general-purpose MLLM into a personalized assistant by integrating three key capabilities: (a) Remembering: It proactively extracts and summarizes chronological multimodal memories from interactions, consolidating them into a personalized database. (b) Reasoning: It conducts multi-turn reasoning by retrieving and integrating relevant memories from the database. (c) Response Alignment: It infers the user's evolving personality throughout long-term interactions to ensure outputs remain aligned with their unique characteristics. For evaluation, we establish Persona-MME, a comprehensive benchmark comprising over 2,000 curated interaction cases, designed to assess long-term MLLM personalization across seven key aspects and 14 fine-grained tasks. Extensive experiments validate our method's effectiveness, improving the baseline by 22.4% (Persona-MME) and 9.8% (PERSONAMEM) under a 128k context, while outperforming GPT-4o by 5.2% and 2.0%, respectively. Project page: https://PersonaVLM.github.io.

arXiv.org
Study finds asking AI for advice could be making you a worse person

Just one interaction with an AI could lower your willingness to apologize or take accountability for harm done.

Fast Company
SafeTutors: Benchmarking Pedagogical Safety in AI Tutoring Systems
https://arxiv.org/abs/2603.17373
"the primary risk is not toxic content but the quiet erosion of learning through answer over-disclosure, misconception reinforcement, and the abdication of scaffolding"
"We uncover that all models show broad harm; scale doesn't reliably help; and multi-turn dialogue worsens behavior, with pedagogical failures rising from 17.7% to 77.8%."
#AIEd #EdTech
SafeTutors: Benchmarking Pedagogical Safety in AI Tutoring Systems

Large language models are rapidly being deployed as AI tutors, yet current evaluation paradigms assess problem-solving accuracy and generic safety in isolation, failing to capture whether a model is simultaneously pedagogically effective and safe across student-tutor interaction. We argue that tutoring safety is fundamentally different from conventional LLM safety: the primary risk is not toxic content but the quiet erosion of learning through answer over-disclosure, misconception reinforcement, and the abdication of scaffolding. To systematically study this failure mode, we introduce SafeTutors, a benchmark that jointly evaluates safety and pedagogy across mathematics, physics, and chemistry. SafeTutors is organized around a theoretically grounded risk taxonomy comprising 11 harm dimensions and 48 sub-risks drawn from learning-science literature. We uncover that all models show broad harm; scale doesn't reliably help; and multi-turn dialogue worsens behavior, with pedagogical failures rising from 17.7% to 77.8%. Harms also vary by subject, so mitigations must be discipline-aware, and single-turn "safe/helpful" results can mask systematic tutor failure over extended interaction.

arXiv.org
EduQwen: Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning
https://arxiv.org/abs/2604.06385
A fine-tuned open #LLM beats even Gemini on a #pedagogy benchmark. Unfortunately it doesn't appear to be released yet.
#AIEd
Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning

We present an innovative multi-stage optimization strategy combining reinforcement learning (RL) and supervised fine-tuning (SFT) to enhance the pedagogical knowledge of large language models (LLMs), as illustrated by EduQwen 32B-RL1, EduQwen 32B-SFT, and an optional third-stage model EduQwen 32B-SFT-RL2: (1) RL optimization that implements progressive difficulty training, focuses on challenging examples, and employs extended reasoning rollouts; (2) a subsequent SFT phase that leverages the RL-trained model to synthesize high-quality training data with difficulty-weighted sampling; and (3) an optional second round of RL optimization. EduQwen 32B-RL1, EduQwen 32B-SFT, and EduQwen 32B-SFT-RL2 are an application-driven family of open-source pedagogical LLMs built on a dense Qwen3-32B backbone. These models remarkably achieve high enough accuracy on the Cross-Domain Pedagogical Knowledge (CDPK) Benchmark to establish new state-of-the-art (SOTA) results across the interactive Pedagogy Benchmark Leaderboard and surpass significantly larger proprietary systems such as the previous benchmark leader Gemini-3 Pro. These dense 32-billion-parameter models demonstrate that domain-specialized optimization can transform mid-sized open-source LLMs into true pedagogical domain experts that outperform much larger general-purpose systems, while preserving the transparency, customizability, and cost-efficiency required for responsible educational AI deployment.

arXiv.org