I've seen a number of threads, blog posts, essays, etc., discussing the implications of Large Language Models such as ChatGPT, which is built on GPT3.5.

The worry is that these systems do a decent job of writing answers to fairly specific prompts, that is, prompts that bring together multiple elements to form a question. I've included an example below. If I asked a question like this on an exam, I'd give an answer like this full marks.

#teaching #gpt3 #highereducation

But I'm not at all sure we're sunk just yet.

It seems to me that what is happening is that AI systems are creeping their way up Bloom's Taxonomy (image under CC license from Vanderbilt University Center for Teaching).

With GPT3 and the like, they've gone from being good at looking stuff up (level 1, remember) to being able to *fake* understanding (level 2, understand).

An aside: they don't actually *understand* anything, but that's a topic for a separate thread.

@ct_bergstrom Based on my experiments with exam questions, they're pretty good at faking levels 3 and 4 (apply and analyze) too. I asked GPT3 to come up with a hypothesis and design an experiment to test it based on some data, and it did better than 60% of our first-year grad students.