More than a game: some thoughts on David Wiley’s “Random Audits as a Scalable Deterrent to Cheating”
Source: Random Audits as a Scalable Deterrent to Cheating: Using Game Theory to Design Fair and Effective Academic Integrity Systems for the AI Era by David Wiley

Though not particularly common, the general principle of assessing only a sample of work with oral exams (viva voces) is well established, and is common practice in a number of institutions (e.g. UC Berkeley or University College London). What’s smart and novel about David Wiley’s new variation on the theme is the rigour with which he approaches the problem. The headliner is his use of game theory to identify the optimum sample range (no point in auditing mediocre results or fails), sample rate (to make the risk of detection significant enough to deter wrongdoers), penalty for failure (neither so small that the risk is acceptable nor so large that people are deterred from applying it), and audit bonus (so honest students gain some, but not too much, benefit from being audited, to make up for the discomfort, inconvenience, and pain). It’s a nicely balanced process, playing with the incentives to take some of the sting out of being selected for assessment by offering opportunities to increase grades. There’s also a lot of careful thought given to the administrative and pedagogical details of how to make it all work, so that students are forced to think clearly about the pros and cons of cheating, and it is all done fairly and efficiently. It’s a very well considered set of techniques for reducing the faculty workload and the chances of cheating.

For all that is good about it, I think it’s almost exactly the wrong idea, though I have an idea to save it.
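To make the balancing act concrete, here is a minimal sketch of the expected-payoff reasoning behind that kind of design. The numbers (audit rate, penalty, bonus, detection rate) are ones I have invented purely for illustration, not figures from David’s article:

```python
# Illustrative expected-grade payoffs under a random-audit scheme.
# All parameter values below are invented for the sketch.

def expected_payoff(grade, p_audit, caught_penalty=0.0, audit_bonus=0.0,
                    cheating=False, p_caught_if_audited=0.9):
    """Expected grade-equivalent payoff for a student.

    grade: the grade the submitted work would receive if never audited.
    p_audit: probability the submission is selected for an oral audit.
    caught_penalty: payoff lost if cheating is exposed (a large value can
        represent failing the course plus a disciplinary record).
    audit_bonus: small boost an honest student can earn from being audited.
    p_caught_if_audited: chance an audit actually exposes the cheating.
    """
    if cheating:
        p_caught = p_audit * p_caught_if_audited
        return (1 - p_caught) * grade + p_caught * (grade - caught_penalty)
    return grade + p_audit * audit_bonus

# Honest work earning 70, versus AI-generated work that would earn 90:
honest = expected_payoff(grade=70, p_audit=0.15, audit_bonus=3)
cheat = expected_payoff(grade=90, p_audit=0.15, caught_penalty=200,
                        cheating=True)
# Cheating is deterred only when honest > cheat, which is what the
# designer tunes p_audit, caught_penalty, and audit_bonus to guarantee.
```

With these made-up values the honest expected payoff (70.45) beats the cheating one (63.0); halve the penalty or the audit rate and the ordering flips, which is exactly the knife-edge the game-theoretic tuning is about.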
Problems with oral exams
For the majority of students in search of credentials, oral exams are at the better end of the summative assessment spectrum, because they are:
- efficient (on average, it takes no longer to ascertain someone knows what they are talking about than it does to properly mark an exam or assignment and, crucially, it demands less time from the student),
- reliable (very hard, though not impossible to fake or cheat),
- personal (you can explore personal strengths and misconceptions),
- responsive (feedback can be immediate),
- social (caring can be demonstrated),
- often authentic (depends on context), and, above all,
- useful learning experiences in their own right, for all concerned, including examiners.
Unfortunately, oral exams have one fatal flaw: far more than written exams (which are unpleasant enough for most students), they can be incredibly intimidating. Few students actually like them but, for a significant number, they are beyond mortifying. I have known students to freeze, cry, walk out, and even fail an entire PhD (though that was later corrected) as a result of having to defend their work this way. The stress can be mitigated somewhat with counselling, therapy, practice, caring tuition, and sensitive questioning, but it is difficult if not impossible to eliminate entirely, and time spent developing counter-technologies to the technology of assessment is time better spent learning the subject in question.
I think that David’s rational game-theoretic approach fails to take this sufficiently into account. For students facing the prospect of extreme trauma, no matter how competent they might be in the subject, the most rational course of action in David’s system would often be to aim for a low mark that would not get audited rather than risk having to be examined. There are plenty of students who don’t need high GPAs, for whom a straight pass is a rational choice. In itself, though, this would be a risky strategy, because it is really difficult to tread the fine line between a low pass and a fail or a higher pass, either of which would be very bad news, and all of which would add stress not just at exam time but throughout the course. Under such circumstances, a student who had taken the game theory to heart would probably realize that the most reliable way to get a low pass would be to ask a generative AI to produce work at that level: in my own experiments I have found them to be remarkably good at targeting a particular grade, as long as you feed them half-decent rubrics.
It is also far from infallible, because few of us are rational game players. On the whole, cheating tends to occur when students are very stressed and they panic: it’s often barely a rational choice at all. Few actually want to cheat and all of them already know it is a risky option: it’s just the least bad of a limited number of very bad alternatives. Making the risks higher and quantifying them is not a solution to this. If anything, for at least a few of the most at-risk students, it will just make the problem worse because the pressure is greater. Also, for the truly disengaged students who are most likely to cheat, this might just be another thing they do not learn, so they would not even be playing the game, though they would certainly come to regret it if they were audited.
Sampling problems
Another problem with David’s approach is that it sends a much stronger signal of the authority and control that the teacher/institution has over the student than the conventional process, with no pretence that it serves any purpose beyond catching cheats. If it were there to support learning then everyone should be doing it, and the fact that there is a reward for being audited just further emphasizes that it is an undesirable activity that students are being forced to do. At least as bad, it doesn’t just allow but actively recommends an instrumental approach to learning: it literally teaches students how to game the system. For anyone wanting to use this approach, I would therefore strongly recommend combining it with ways to restore some of the lost autonomy, for example by encouraging students to design some of their own outcomes, to have input into the means of assessment, to have plenty of flexibility in the timing of submissions, or at the very least to be able to choose from a range of different ways of demonstrating their competence. Among the benefits of doing this, the chances of them cheating in the first place would be significantly reduced.

There is also a time commitment to learning how to play that game rather than learning the stuff the course is actually about. I don’t see an easy way of avoiding this altogether, though if it were applied across the board to a whole program, the proportion of time spent on it could be reduced for each course. It would, of course, be a brilliant idea to use it in a course on game theory.
It bothers me that the method deliberately excludes students who don’t get great results. It seems to me that they are the ones who would most benefit from a chance to improve them, so it amplifies the divide between the haves and the have-nots. At the very least, it should be possible for such students to ask for an oral exam, under the same conditions as those who get selected for random testing. The selection process again sends a bad message: that high achievement makes you a suspect.
While the proposed sample rates make sense for a single course, if all courses worked this way then, by the end of the program, almost every student would have at some point been audited, most likely more than once. For someone with a strong phobia, this might actually be worse than having to do it for every course: knowing that, at any point, your worst nightmare is going to happen is probably not going to improve your chances of persisting to the end of a program. It’s a problem both in the stress-filled build-up and (if not selected) the massive surge of relief that follows. The pain/relief patterns are not dissimilar to those of, say, gambling or drug addiction.
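The arithmetic behind that worry is straightforward: with a per-course audit probability p, the chance of being audited at least once across n independent courses is 1 − (1 − p)^n. The 10% rate and 40-course program below are assumptions of mine for illustration, not David’s figures:

```python
# Probability of being audited at least once across n independent courses,
# each with per-course audit probability p.
def p_audited_at_least_once(p, n):
    return 1 - (1 - p) ** n

# Illustrative numbers only: a 10% per-course rate over a 40-course program.
at_least_once = p_audited_at_least_once(0.10, 40)   # close to certainty
expected_audits = 0.10 * 40                         # 4 audits expected
```

With those assumed numbers, at least one audit is a near-certainty (about 98.5%) and the expected count is four, which is why "most likely more than once" follows from even a modest per-course rate.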
Motivation problems
David claims that it is not a technology problem but an incentive problem. I disagree. This very much is a technology problem, and David’s solution is very much a technological solution: it’s just not a digital technology problem. And, in the context of the technology in question – that of credentialing – it is not an incentive problem but a motivation problem. Treating it as an incentive problem limits it to the subset of motivation that is both extrinsic and externally regulated: the worst possible kind. Externally regulated extrinsic motivation reliably kills intrinsic motivation, so this both takes away the love of simply doing the work and actively harms motivation to do it in future.

The trouble with David’s solution is that it doesn’t deal with or even consider the reasons that students cheat in the first place: it’s just a response to the fact that some do. Vanishingly few students start a course intending to cheat their way through it. Rather, the pressures they face (almost all extrinsic) make cheating a rational response and/or the result of panic. All that David’s solution does is to make it a bit less rational. Students will still do it for irrational, emotionally charged reasons, and it not only does nothing to eliminate the root causes but actually amplifies them, piling on additional pressure.
Like all technologies, there are other ways to solve this problem and, like all technologies, it is a Faustian bargain that creates new problems of its own. David’s solution, with the aforementioned provisos, is a potentially effective and efficient solution to cheating, but it is likely to have the opposite effect on learning, especially once the course is over. It’s just a counter-technology for dealing with flaws in the underlying credentialing approach, and it demands further counter-technologies of its own to deal with its big fatal flaw if it is going to work at all well. It is far from unusual in this respect.
A better solution?
I think this is fixable. I reckon David’s solution would work a lot better if, instead of auditing assignments or exams for a single course, it were applied to a basket of courses (say, 3-6 of them) and, in the oral exam, students were asked to synthesize, connect, and utilize what they have learned in all of them. This is not unlike some fairly common approaches to PhDs or capstone projects, where students create something and then talk about it in more or less formal ways (presentations, demos, crits, viva voces, etc). If done with commitment, it could largely decouple learning and assessment because instrumental revision would not be an option: the only way to revise effectively would be to engage in positive learning activities that involve exactly the kind of synthesis we would examine, which would make it personal, relevant, and interesting, especially if (to make it authentic) it were done with other people.

With a bit of ingenuity, it might be possible to remove all grades and credit for the courses themselves, so students could learn without the usual extrinsic pressures. Every student would automatically get a provisional generic pass on each of the basket of courses, no questions asked. If they were audited then they might improve that (or fail), as David suggests. For the sake of equity, every student would have the right to ask to be audited, so the high-flyers who cared about getting a high grade could have an opportunity to get one. The rest could learn with significantly reduced pressure.
An obvious objection is that it would raise the stakes when that assessment did actually happen. One way to reduce that problem would be to allow repeated attempts with no additional penalty, or to make it a “best of three” or something along those lines. Though that would somewhat reduce the efficiency of the solution, as long as it were structured to make repeats relatively rare, it would be worth the extra bother. It would also be good to provide coaching, counselling, and plentiful opportunities to practise. For some subjects there might be less pressured approaches than oral exams that would achieve similar results, such as observational studies of students working on a problem, group discussions, or structured peer interviews. Perhaps it could be a series of conversations throughout the program, none of which carries a definitive grade in itself but which, together, add up to an overall assessment. There’s scope for further innovation here.
It would be more important than ever to provide plentiful formative assessment during the courses themselves, and to provide ways of practising those skills in synthesis. The latter could be done within those courses or, perhaps better, a “synthesis” course could be provided for this purpose, operating in much the same way as Brunel’s assessment modules in their Integrated Programme Assessment approach. Among the advantages of this, it would allow students to do some work that might be used as part of an alternative assessment for those suffering from extreme fear of or difficulties participating in the oral exam.
It is not perfect, and it would be no use for situations such as those at Athabasca University, where many students are taking only one or two courses, often as visitors from other programs. However, for program students, even more than David’s approach, this would massively reduce the marking burden while making a positive contribution to learning and motivation to learn.
#assessment #Brunel #extrinsicMotivation #gameTheory #incentive #integratedProgrammeAssessment #learningDesign #motivation #oralExam #vivaVoce


