More than a game: some thoughts on David Wiley’s “Random Audits as a Scalable Deterrent to Cheating”
Source: Random Audits as a Scalable Deterrent to Cheating: Using Game Theory to Design Fair and Effective Academic Integrity Systems for the AI Era by David Wiley

Though not particularly common, the general principle of assessing only a sample of work with oral exams (viva voces) is well established, and is common practice in a number of institutions (e.g. UC Berkeley or University College London). What’s smart and novel about David Wiley’s new variation on the theme is the rigour with which he approaches the problem. The headliner is his use of game theory to identify the optimum sample range (no point in auditing mediocre results or fails), sample rate (to make the risk of detection significant enough to deter wrongdoers), penalty for failure (neither so small that the risk is acceptable nor so large that people are deterred from applying it), and audit bonus (so honest students gain some, but not too much, benefit from being audited, to make up for the discomfort, inconvenience, and pain). It’s a nicely balanced process, playing with the incentives to take some of the sting out of being selected for assessment by offering opportunities to increase grades. There’s also a lot of careful thought given to the administrative and pedagogical details of how to make it all work, so that students are forced to think clearly about the pros and cons of cheating, and it is all done fairly and efficiently. It’s a very well considered set of techniques for reducing the faculty workload and the chances of cheating.

For all that is good about it, I think it’s almost exactly the wrong idea, though I have an idea to save it.
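To make the balancing act concrete, here is a minimal sketch of the expected-payoff reasoning behind that kind of design. The numbers (audit rate, penalty, bonus, detection rate) are ones I have invented purely for illustration, not figures from David’s article:

```python
# Illustrative expected-grade payoffs under a random-audit scheme.
# All parameter values below are invented for the sketch.

def expected_payoff(grade, p_audit, caught_penalty=0.0, audit_bonus=0.0,
                    cheating=False, p_caught_if_audited=0.9):
    """Expected grade-equivalent payoff for a student.

    grade: the grade the submitted work would receive if never audited.
    p_audit: probability the submission is selected for an oral audit.
    caught_penalty: payoff lost if cheating is exposed (a large value can
        represent failing the course plus a disciplinary record).
    audit_bonus: small boost an honest student can earn from being audited.
    p_caught_if_audited: chance an audit actually exposes the cheating.
    """
    if cheating:
        p_caught = p_audit * p_caught_if_audited
        return (1 - p_caught) * grade + p_caught * (grade - caught_penalty)
    return grade + p_audit * audit_bonus

# Honest work earning 70, versus AI-generated work that would earn 90:
honest = expected_payoff(grade=70, p_audit=0.15, audit_bonus=3)
cheat = expected_payoff(grade=90, p_audit=0.15, caught_penalty=200,
                        cheating=True)
# Cheating is deterred only when honest > cheat, which is what the
# designer tunes p_audit, caught_penalty, and audit_bonus to guarantee.
```

With these made-up values the honest expected payoff (70.45) beats the cheating one (63.0); halve the penalty or the audit rate and the ordering flips, which is exactly the knife-edge the game-theoretic tuning is about.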
Problems with oral exams
For the majority of students in search of credentials, oral exams are at the better end of the summative assessment spectrum, because they are:
- efficient (on average, it takes no longer to ascertain someone knows what they are talking about than it does to properly mark an exam or assignment and, crucially, it demands less time from the student),
- reliable (very hard, though not impossible to fake or cheat),
- personal (you can explore personal strengths and misconceptions),
- responsive (feedback can be immediate),
- social (caring can be demonstrated),
- often authentic (depends on context), and, above all,
- useful learning experiences in their own right, for all concerned, including examiners.
Unfortunately, oral exams have one fatal flaw: far more than written exams (which are unpleasant enough for most students), they can be incredibly intimidating. Few students actually like them but, for a significant number, they are beyond mortifying. I have known students to freeze, cry, walk out, and even fail an entire PhD (though that was later corrected) as a result of having to defend their work this way. The stress can be mitigated somewhat with counselling, therapy, practice, caring tuition, and sensitive questioning, but it is difficult if not impossible to eliminate entirely, and time spent developing counter-technologies to the technology of assessment is time better spent learning the subject in question.
I think that David’s rational game-theoretic approach fails to take this sufficiently into account. For students facing the prospect of extreme trauma, no matter how competent they might be in the subject, the most rational course of action in David’s system would often be to aim for a low mark that would not get audited rather than risk having to be examined. There are plenty of students who don’t need high GPAs, for whom a straight pass is a rational choice. In itself, though, this would be a risky strategy, because it is really difficult to tread the fine line between a low pass and a fail or a higher pass, either of which would be very bad news, and all of which would add stress not just at exam time but throughout the course. Under such circumstances, a student who had taken the game theory to heart would probably realize that the most reliable way to get a low pass would be to ask a generative AI to produce work at that level: in my own experiments I have found them to be remarkably good at targeting a particular grade, as long as you feed them half-decent rubrics.
It is also far from infallible, because few of us are rational game players. On the whole, cheating tends to occur when students are very stressed and they panic: it’s often barely a rational choice at all. Few actually want to cheat and all of them already know it is a risky option: it’s just the least bad of a limited number of very bad alternatives. Making the risks higher and quantifying them is not a solution to this. If anything, for at least a few of the most at-risk students, it will just make the problem worse because the pressure is greater. Also, for the truly disengaged students who are most likely to cheat, this might just be another thing they do not learn, so they would not even be playing the game, though they would certainly come to regret it if they were audited.
Sampling problems
Another problem with David’s approach is that it sends a much stronger signal of the authority and control that the teacher/institution has over the student than the conventional process, with no pretence that it serves any purpose beyond catching cheats. If it were there to support learning then everyone should be doing it, and the fact that there is a reward for being audited just further emphasizes that it is an undesirable activity that students are being forced to do. At least as bad, it doesn’t just allow but actively recommends an instrumental approach to learning: it literally teaches students how to game the system. For anyone wanting to use this approach, I would therefore strongly recommend combining it with ways to restore some of the lost autonomy, for example by encouraging students to design some of their own outcomes, to have input into the means of assessment, to have plenty of flexibility in the timing of submissions, or at the very least to be able to choose from a range of different ways of demonstrating their competence. Among the benefits of doing this, the chances of them cheating in the first place would be significantly reduced.

There is also a time commitment to learning how to play that game rather than learning the stuff the course is actually about. I don’t see an easy way of avoiding this altogether, though if it were applied across the board to a whole program, the proportion of time spent on it could be reduced for each course. It would, of course, be a brilliant idea to use it in a course on game theory.
It bothers me that the method deliberately excludes students who don’t get great results. It seems to me that they are the ones who would most benefit from a chance to improve them, so it amplifies the divide between the haves and the have-nots. At the very least, it should be possible for such students to ask for an oral exam, under the same conditions as those who get selected for random testing. The selection process again sends a bad message: that high achievement makes you a suspect.
While the proposed sample rates make sense for a single course, if all courses worked this way then, by the end of the program, almost every student would have at some point been audited, most likely more than once. For someone with a strong phobia, this might actually be worse than having to do it for every course: knowing that, at any point, your worst nightmare is going to happen is probably not going to improve your chances of persisting to the end of a program. It’s a problem both in the stress-filled build-up and (if not selected) the massive surge of relief that follows. The pain/relief patterns are not dissimilar to those of, say, gambling or drug addiction.
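The arithmetic behind that worry is straightforward: with a per-course audit probability p, the chance of being audited at least once across n independent courses is 1 − (1 − p)^n. The 10% rate and 40-course program below are assumptions of mine for illustration, not David’s figures:

```python
# Probability of being audited at least once across n independent courses,
# each with per-course audit probability p.
def p_audited_at_least_once(p, n):
    return 1 - (1 - p) ** n

# Illustrative numbers only: a 10% per-course rate over a 40-course program.
at_least_once = p_audited_at_least_once(0.10, 40)   # close to certainty
expected_audits = 0.10 * 40                         # 4 audits expected
```

With those assumed numbers, at least one audit is a near-certainty (about 98.5%) and the expected count is four, which is why "most likely more than once" follows from even a modest per-course rate.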
Motivation problems
David claims that it is not a technology problem but an incentive problem. I disagree. This very much is a technology problem, and David’s solution is very much a technological solution: it’s just not a digital technology problem. And, in the context of the technology in question – that of credentialing – it is not an incentive problem but a motivation problem. Treating it as an incentive problem limits it to the subset of motivation that is both extrinsic and externally regulated: the worst possible kind. Externally regulated extrinsic motivation reliably kills intrinsic motivation, so this both takes away the love of simply doing the work and actively harms motivation to do it in future.

The trouble with David’s solution is that it doesn’t deal with or even consider the reasons that students cheat in the first place: it’s just a response to the fact that some do. Vanishingly few students start a course intending to cheat their way through it. Rather, the pressures they face (almost all extrinsic) make cheating a rational response and/or the result of panic. All that David’s solution does is to make it a bit less rational. Students will still do it for irrational, emotionally charged reasons, and it not only does nothing to eliminate the root causes but actually amplifies them, piling on additional pressure.
Like all technologies, there are other ways to solve this problem and, like all technologies, it is a Faustian bargain that creates new problems of its own. David’s solution, with the aforementioned provisos, is a potentially effective and efficient solution to cheating, but it is likely to have the opposite effect on learning, especially once the course is over. It’s just a counter-technology for dealing with flaws in the underlying credentialing approach, and it demands further counter-technologies of its own to deal with its big fatal flaw if it is going to work at all well. It is far from unusual in this respect.
A better solution?
I think this is fixable. I reckon David’s solution would work a lot better if, instead of auditing assignments or exams for a single course, it were applied to a basket of courses (say, 3-6 of them) and, in the oral exam, students were asked to synthesize, connect, and utilize what they have learned in all of them. This is not unlike some fairly common approaches to PhDs or capstone projects, where students create something and then talk about it in more or less formal ways (presentations, demos, crits, viva voces, etc). If done with commitment, it could largely decouple learning and assessment because instrumental revision would not be an option: the only way to revise effectively would be to engage in positive learning activities that involve exactly the kind of synthesis we would examine, which would make it personal, relevant, and interesting, especially if (to make it authentic) it were done with other people.

With a bit of ingenuity, it might be possible to remove all grades and credit for the courses themselves, so students could learn without the usual extrinsic pressures. Every student would automatically get a provisional generic pass on each of the basket of courses, no questions asked. If they were audited then they might improve that (or fail), as David suggests. For the sake of equity, every student would have the right to ask to be audited, so the high-flyers who cared about getting a high grade could have an opportunity to get one. The rest could learn with significantly reduced pressure.
An obvious objection is that it would raise the stakes when that assessment did actually happen. One way to reduce that problem would be to allow repeated attempts with no additional penalty, or to make it a “best of three” or something along those lines. Though that would somewhat reduce the efficiency of the solution, as long as it were structured to make repeats relatively rare, it would be worth the extra bother. It would also be good to provide coaching, counselling, and plentiful opportunities to practise. For some subjects there might be less pressured approaches than oral exams that would achieve similar results, such as observational studies of students working on a problem, group discussions, or structured peer interviews. Perhaps it could be a series of conversations throughout the program, none of which carries a definitive grade in itself but which, together, add up to an overall assessment. There’s scope for further innovation here.
It would be more important than ever to provide plentiful formative assessment during the courses themselves, and to provide ways of practising those skills in synthesis. The latter could be done within those courses or, perhaps better, a “synthesis” course could be provided for this purpose, operating in much the same way as Brunel’s assessment modules in their Integrated Programme Assessment approach. Among the advantages of this, it would allow students to do some work that might be used as part of an alternative assessment for those suffering from extreme fear of or difficulties participating in the oral exam.
It is not perfect, and it would be no use for situations such as those at Athabasca University, where many students are taking only one or two courses, often as visitors from other programs. However, for program students, even more than David’s approach, this would massively reduce the marking burden while making a positive contribution to learning and motivation to learn.
#assessment #Brunel #extrinsicMotivation #gameTheory #incentive #integratedProgrammeAssessment #learningDesign #motivation #oralExam #vivaVoce


