I've seen a number of threads, blog posts, essays, etc., discussing the implications of Large Language Models such as the ChatGPT implementation of GPT-3.5.

The worry is that these systems do a decent job of writing answers to fairly specific prompts, ones that bring together multiple elements to form a single question. I've included an example below. If I asked a question like this on an exam, I'd give an answer like this full marks.

#teaching #gpt3 #highereducation

But I'm not at all sure we're sunk just yet.

It seems to me that what is happening is that AI systems are creeping their way up Bloom's taxonomy (image under CC license from the Vanderbilt University Center for Teaching).

With GPT-3 and the like, they've gone from being good at looking stuff up (level 1, remember) to being able to *fake* comprehension (level 2, understand).

An aside: they don't actually *understand* anything, but that's a separate thread.

The other remarkable thing is the way these AIs have leap-frogged to the top of Bloom's taxonomy. I think, though my mind could be changed, that it is accurate to say that they are able to create original work (level 6, create), but not, mind you, understand what they've created.

That seems scary, because it gets around the sorts of prompts we might have used for online open-book exams in the past.

Here I ask the AI to write a fable about a group of #shoebills who succumb to #GoodhartsLaw.

IMO, remarkable.

However, I've been asking GPT-3 to answer my old exam questions, and I'm finding that it does very poorly on most of them.

(A large fraction of them involve data graphics in some capacity, and so unfortunately I can't test these.)

Where it seems to be falling short is on analyzing and evaluating arguments.

Here, the AI knows the definition of mathiness (from my book, no less), but #McKinsey Consulting fooled it with their silly trust equation. It even doubles down and defends the equation!

This question, also about mathiness but from a previous year's exam, nicely illustrates what GPT-3.5 is good at and what it fails at.

Here I asked the students to come up with a mathiness equation, and then to call BS on it.

The AI is very good at creating. Its mathiness-style formula for BS-detection ability is pitch-perfect.

But it totally fails in its critique and misses the core point.

@ct_bergstrom This isn't my insight, but ChatGPT is *very* good at playing idiot savant, answering exam questions in a surprising number of academic fields while giving up entirely in plenty of other domains. For example, here is a question that asks it to identify, compare, and synthesize themes in multiple ancient Greek texts:
@ct_bergstrom Of course, you realize after a while that ChatGPT can't innovate on the *form* of its argumentation. Example: its answer to this question, which is about different topics, is identical in structure to the previous one:
@ct_bergstrom I very much feel like I'm probing someone living with a neurological condition, something that affects memory and, more subtly, creativity. Think Oliver Sacks...