This afternoon I observed a third-year undergrad class in a module teaching the use of GenAI for the creative industries: the briefing on the assessment.

It's the first time the module has been taught, so it's, uh... finding its footing. But the students demand exemplars.

So the prof made some, but clearly did not invest the effort expected of the students themselves. Then again, she acknowledges that the module's existence is a performative bow to market pressures.

#AIED #HE #education

* The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
https://arxiv.org/abs/2603.00925
* Benchmarking the Pedagogical Knowledge of Large Language Models
https://arxiv.org/abs/2506.18710v1
https://www.fab-ai.org/initiatives/ai-for-education/edtech-quality/resources/benchmarks/about-the-pedagogy-benchmark
* AI-generated lesson plans fall short on inspiring students and promoting critical thinking
https://theconversation.com/ai-generated-lesson-plans-fall-short-on-inspiring-students-and-promoting-critical-thinking-265355
#AIEd #mathed #teaching #education
The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors

Effective mathematics education requires identifying and responding to students' mistakes. For AI to support pedagogical applications, models must perform well across different levels of student proficiency. Our work provides an extensive, year-long snapshot of how 11 vision-language models (VLMs) perform on DrawEduMath, a QA benchmark involving real students' handwritten, hand-drawn responses to math problems. We find that models' weaknesses concentrate on a core component of math education: student error. All evaluated VLMs underperform when describing work from students who require more pedagogical help, and across all QA, they struggle the most on questions related to assessing student error. Thus, while VLMs may be optimized to be math problem solving experts, our results suggest that they require alternative development incentives to adequately support educational use cases.

arXiv.org
Effective #teaching is a difficult and counter-intuitive task, and it's not something you can master from the Internet. So it's not surprising that AI is pretty bad at it & bad at evaluating it - even negatively correlated with student learning. Another way of saying this is AI has poor pedagogical content knowledge:
* Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact https://arxiv.org/abs/2603.00883
Podcast summary: https://drive.google.com/file/d/1n09DUMTNoaJuuZzDnBlocZ52MWWyYgw4/view?usp=drivesdk
More examples:
#AIEd
Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

LLMs increasingly excel on AI benchmarks, but doing so does not guarantee validity for downstream tasks. This study evaluates the performance of leading foundation models (FMs, i.e., generative pre-trained base LLMs) with out-of-distribution (OOD) tasks of the teaching and learning of schoolchildren. Across all FMs, inter-model behaviors on disparate tasks correlate higher than they do with expert human behaviors on target tasks. These biases shared across LLMs are poorly aligned with downstream measures of teaching quality and often *negatively aligned with learning outcomes*. Further, we find multi-model ensembles, both unanimous model voting and expert-weighting by benchmark performance, further exacerbate misalignment with learning. We measure that 50% of the variation in misalignment error is shared across foundation models, suggesting that common pretraining accounts for much of the misalignment in these tasks. We demonstrate methods for robustly measuring alignment of complex tasks and provide unique insights into both educational applications of foundation models and to understanding limitations of models.

arXiv.org
New post: GenAI makes it easy for students to skip the struggle, but productive struggle is where real learning happens. The students most likely to over-rely on it have the most to lose. What's the "minimum viable struggle" we need to protect? #ArtificialIntelligence #AIEducation #AIEd #AIInEd

https://leonfurze.com/2026/02/25/what-happens-to-expertise-when-students-skip-the-struggle/?utm_source=mastodon&utm_medium=jetpack_social

What Happens to Expertise When Students Skip the Struggle?

When students use GenAI to skip the hard parts of learning, they miss the productive struggle that builds genuine expertise. This post explores why the "AI is just like a calculator" argument falls short, and why novices are most at risk of outsourcing the thinking that matters most.

Leon Furze
How AI Is Exploding Our Illusions of Rigor
https://www.insidehighered.com/opinion/career-advice/teaching/2026/01/15/how-ai-exploding-our-illusions-rigor-opinion
"AI isn’t eroding rigor—it’s exposing where that rigor may have been more about appearances than substance.
There’s a real irony here: We’re so laser-focused on catching “cheaters” that we rarely stop and ask if our assignments still serve any clear purpose."
#AIEd
How AI Is Exploding Our Illusions of Rigor (opinion)

Craig E. Nelson’s concept of “dysfunctional illusions of rigor” holds new currency in our AI age.

Inside Higher Ed | Higher Education News, Events and Jobs
GenAI in education is a sprawling topic, so each January I try to distill it into a single post: what's changed, what's most important, and what you can actually do with the technology. This is 2026's introduction to GenAI: I'll dig deeper into each section throughout the year. #AI #AIedu #AIEd

https://leonfurze.com/2026/01/15/everything-educators-need-to-know-about-genai-in-2026/

Everything Educators Need to Know About GenAI in 2026

GenAI in education is a sprawling topic, so each January I try to distill it into a single post: what's changed, what's most important, and what you can actually do with the technology. This is 2026's introduction to GenAI: I'll dig deeper into each section throughout the year.

Leon Furze
The #LMS at 30
https://ailearninsights.substack.com/p/will-the-lms-finally-deliver
"Despite substantial funding and near-universal adoption, the LMS has failed to improve teaching and learning in any meaningful or measurable way."
#EdTech #AIEd
Will the LMS Finally Deliver?

Matthew Pittinsky has been on the front lines of the LMS world for three decades, beginning in 1995.

AI-Learn Insights