OpenAI released a tool that purports to detect AI-generated text. At the highest end of detection, it labels text "possibly" or "likely" AI-generated. 21% of human-written text falls under "possibly" and 9% under "likely". That's 3 in 10 students being defamed and/or harmed.
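The arithmetic behind that "3 in 10" figure can be checked with a quick sketch. The 21% and 9% rates are the ones quoted above; the class size is a hypothetical illustration:

```python
# Back-of-envelope check of the false-positive claim above (a sketch;
# the 21% and 9% rates are taken from the post, not independently verified).
possibly_rate = 0.21   # human-written text labeled "possibly" AI-generated
likely_rate = 0.09     # human-written text labeled "likely" AI-generated

students = 1000        # hypothetical cohort of entirely honest writers
flagged = round(students * (possibly_rate + likely_rate))
print(flagged)         # 300 of 1000 honest students flagged, i.e. 3 in 10
```

The point of the sketch: even with zero actual cheating, these rates would flag roughly 300 of every 1000 honest students.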

Just like other providers of academic surveillance software, OpenAI states that their detector "should not be used as a primary decision-making tool".

It will be, though. And harm will follow.

https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text/


@Linkletter
"Having failed the Turing test you will be placed on academic probation for the remainder of the semester. If you wish to appeal this action please use the form linked below to make your case to the Dean's arbitration committee."
@Linkletter note that you have to log in to use the detector
@Autumm Oh no. So students will check their own work using this tool and get flagged by whatever network and WiFi snooping tech schools are using to prove they used ChatGPT.

@Linkletter Yeah it is just a lot of tracking. Log-ins make me nervous.

Oh, and in terms of faculty uploading student work, can we talk about this little bit of small print?

@Autumm @Linkletter And every time a student uses it - perhaps required by their instructor - their own work adds value to the machine #SurveillanceCapitalism

@JamesGG @Linkletter

Why do I feel like I've heard this one before?

Oh right... because I have

So. Many. Times.

@Autumm @Linkletter I bet there’s a sticky note somewhere at HQ that says “buy and close TurnItIn”

@Autumm @Linkletter

I remember there was a big stink in one of my comp sci classes. A student submitted work with a header that specifically said they didn't consent to the work being shared beyond the audience necessary to mark it, and the prof thought it was a really good example of a correct solution and put it up on screen to go through the solution for the whole (small) class.

At the end of it all, the prof still didn't think they'd done anything wrong.

@Linkletter The concept is totally unworkable. If there end up being a dozen popular chatbots of this type on the web, are we supposed to check every one of them?
It shouldn't actually be an issue for school writing assignments, because these systems are inherently nothing but bullshit generators. Their (raw) output should be an F paper because it says wrong stuff and doesn't make sense.
The big issue is likely to be stuff like crapflooding of facebook with bot-generated disinformation.
@ben_crowell_fullerton @Linkletter I wish it were this easy but it's not so. Chat GPT is quite good at generating introductory writing that would earn B's in many courses
@bwyble @Linkletter "Chat GPT is quite good at generating introductory writing that would earn B's in many courses"
Note my words "raw" and "should." The raw output is illogical and self-contradictory, and when asked to cite sources, it seems to fabricate 100% of them. That should be an F.
But many teachers may be OK with platitudinal drivel and may not demand any facts or logic. And some students will painstakingly massage the chatbot output into something better.

@ben_crowell_fullerton @Linkletter When you ask it for basic, middle of the road writing about common topics like the Stroop effect in psychology, or symbolism in Moby Dick, the writing is totally fine. And in such assignments the references aren't really an issue.

I see so many academics dismissing chatgpt because it can't handle more advanced or specialty topics (like transgender in Turkey), but how do we think students get to the point of being able to approach such work in the first place? Students have to go through fundamentals of writing just like they have to learn addition before they can understand algebra or calculus.

There are countless colleagues of ours in middle school, high school and college who painstakingly cultivate the bedrock writing skills that we rely on. Let's not take that work for granted by telling them that they don't have to be concerned about ChatGPT.

@bwyble @Linkletter "I see so many academics dismissing chatgpt because it can't handle more advanced or specialty topics (like transgender in Turkey[...]"
That's interesting, thanks for your reply. I find it very surprising, though, because I don't consider that a hard topic at all. It seems like a softball assignment that can be responded to simply by rehashing some platitudes.
I tried to express myself more fully in a blog post:
https://ben-crowell.bitbucket.io/red_pen/
https://toot.community/@ben_crowell_fullerton/109790520725073554

@bwyble @Linkletter
"There are countless of our colleagues in middle school, high school and college who painstakingly cultivate the bedrock of writing skills that we rely on."
Yes, this is a good point. I'm coming at it from my perspective as a retired college professor. But:
(1) Your analogy with arithmetic is a good one. College isn't grade school.
(2) In a K-8 basic grammar-type class, there are many opportunities to have students write in class without access to ChatGPT.

@ben_crowell_fullerton @Linkletter

College isn't grade school, but there are a lot of students who still need to build fundamental writing skills in college. And ChatGPT is still a problem that high school teachers will have to grapple with even if it's not OUR problem at the college level.

@bwyble @Linkletter "College isn't grade school, but there are a lot of students who still need to build fundamental writing skills in college."
Agreed. It would be great if community colleges had better ways of helping students who need remediation in math or English. Remediation in math, AFAICT, is a complete failure, and Calif. is lurching toward eliminating it.
But the solution isn't to blur the boundaries between college and pre-college education. CC students don't deserve glass ceilings.

@ben_crowell_fullerton @Linkletter

I guess that I don't view needing help with rhetorical skills in college as remedial, any more than students who haven't mastered calculus as freshmen need remedial help. Writing well is a complex skillset with a long on-ramp that slopes gradually through the entire lifespan. I think a huge fraction of first-years are at a learning stage where they can benefit from basic writing instruction and could be derailed by learning to rely on tools to do it for them.

@ben_crowell_fullerton @bwyble @Linkletter Exactly, if the exam happens in person on-site. No problem.

With all “online” exams, your problems are not ChatGPT.

Put bluntly: if you are passing such a mass of students that you do not know them without an ID check, you do have a problem.

Think: Webcam, keyboard, monitor are all externally connected to the PC. How do you make sure that the exam taker is the person your proctor sees?

@ben_crowell_fullerton @bwyble @Linkletter Just to whet your appetite: what about a person on the other side of the wall? (Yes, cables can pass through a wall if you make a hole in it; it's a great way to play PlayStation games in the home office, with the PS in the living room.)

Webcams are standard USB devices, mostly. How do you make sure that the Webcam is actually really showing reality and not a modified image to the PC?

@yacc143 @ben_crowell_fullerton @Linkletter Professors are very aware of the many ways that students could potentially cheat on exams, online or in person.

But the real issue with ChatGPT is not how students take exams, because that's not usually how writing is taught. Rather, students are asked to complete assignments outside of class hours.

@bwyble @ben_crowell_fullerton @Linkletter Again, if you are handling students en masse, 🤷

At least once in my family, a cheater was caught because the teacher knew him well and the writing/language style of the handed-in essay was not his, so he was kindly asked orally about the content of the essay. Oops. (We warned the idiot against handing it in without even reading it. Hubris before the fall, right?)

@yacc143 @ben_crowell_fullerton @Linkletter

Yea, this is probably one of the potential remedies, something to put in the syllabus.

@bwyble @ben_crowell_fullerton @Linkletter Sure, and ChatGPT, luckily for us, fails at addition and subtraction.

It will argue with full vigor that subtraction is commutative, even if you provide it with a counterexample. (Which, in maths, is conclusive proof that a claim like "subtraction is commutative" is wrong.)
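For anyone who wants the counterexample spelled out, a single pair of numbers settles it:

```python
# One counterexample is enough to refute "subtraction is commutative".
a, b = 5, 3
print(a - b)  # 2
print(b - a)  # -2
# 5 - 3 = 2 but 3 - 5 = -2, so a - b != b - a in general.
```

Any model that keeps insisting otherwise after seeing this is not reasoning about arithmetic; it is pattern-matching prose about arithmetic.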

@bwyble @Linkletter I added a postscript to my blog post attempting to thank you for your critical comments and also give my summary of my thoughts in response.
https://ben-crowell.bitbucket.io/red_pen/

@ben_crowell_fullerton @Linkletter I read your piece and came away somewhat confused. You talk about how losing the ability to teach the fundamentals of physics because of Chegg is a big problem but at the same time don't seem to acknowledge the threat that ChatGPT poses for teaching of writing fundamentals. To me, it sounds as if you are alarmed about educational harms that occur in the core fundamentals of your area of expertise, but are not as concerned about those that occur in other areas.

For example, I would put being confused about grams vs. milligrams in the same ballpark of fundamentals as being confused about sentence structure and how to write a paragraph. And you also acknowledge that ChatGPT is quite good at giving students free answers about basic sentence structure.

How do you reconcile these two positions?

@bwyble @Linkletter Hi Brad - Thanks for your further comments!

"You talk about how losing the ability to teach the fundamentals of physics because of Chegg is a big problem but... don't ... acknowledge the threat that ChatGPT poses for... writing fundamentals."
Chegg isn't an AI system, it's a system for outsourcing homework to young people in poor countries. Because it uses humans, it's not limited to fundamentals. Its Indian workers generally don't make absurd blunders like ChatGPT does.
1/3

@ben_crowell_fullerton @Linkletter

Whether it's an AI is irrelevant. And the point is that ChatGPT does not make huge blunders in basic writing, which is where the concern about losing the fundamentals comes from.

@bwyble @Linkletter I did address ChatGPT's threat to writing fundamentals. I suggested that in classes of that type, there should be in-class writing to see what students can actually do without a machine.
2/3
@ben_crowell_fullerton @Linkletter This doesn't really solve the problem. Students get better at writing by practicing at home, not by writing a few examples in class to pass exams.
@bwyble @Linkletter "...I would equate being confused about grams vs milligrams [to] being confused about sentence structure..."
Yes, that's an example of remedial content that we usually cover in the first week of freshman physics, for students who had inadequate high school preparation. I used it as an example because it's so basic, not because it's typical of the level of a lower-division physics sequence. We cover heavy stuff like the second law of thermodynamics and Schrödinger's cat.
3/3
@ben_crowell_fullerton @Linkletter Whether the impacts occur in 11th grade, 12th grade, freshman year, etc. doesn't really matter. If students can't learn to write well in the earlier years, that impact will cascade forward.
@bwyble @Linkletter As an example, see Anna Mills' transcript "Transgender in Turkey."
https://docs.google.com/spreadsheets/d/1KbQIDPP2JIWu7JqXm7r7-zIcQ0PKzSEbDacT3Jaktog/edit#gid=1913381929
She had to push and shove it into addressing the topic, and fix logical mistakes such as conflating sexual orientation with being transgender. Even after all that coaxing, the result was still based on 100% fabricated sources. Maybe if you substituted real sources, it would be a B. But the skills needed to do all that are the skills that would make it pretty easy to just write a paper.
@Linkletter We need information provenance tracking, not AI detection tools. Information provenance would solve a ton of issues, including spam, disinformation, copyright, bias.
@Linkletter Those performance numbers seem barely better than an unmotivated dice roll. I mean, if the classifier just outputs different phrasings of "🤷" for more than three quarters of cases anyway, how is anyone supposed to use it in a meaningful way?

@Linkletter But they didn’t just stand there – they did something!

Consequences? What consequences?

@Linkletter @emenel whenever I wonder if this will be the case and hope it won’t, I remember how many people we are “comfortable” with being killed by cars in order to have the infrastructure for everyone who isn’t killed by cars
@Linkletter

Yes! Fake AI "lie detectors" are not the answer.

Essay writing is dead as a basis for exams and grading with this. Just as when the electronic calculator was invented, academia needs to change.

@Linkletter I find this fascinating. Because this means there might be some intrinsic style or markers in generated text that make it uniquely "non-human".

The fact that a tool like this can work even some of the time means language models like ChatGPT aren’t really there yet.

@weston @Linkletter I had a discussion with some folks who have used ChatGPT and are also regular users of Grammarly, which is essentially a purpose-built AI for editing (although it’s not advertised that way).

ChatGPT produces amateurish prose that could really benefit from a good editor. It does not have the functionality of a tool like Grammarly built in. If it did, it would probably be much harder for AI tools (or humans) to detect.

@MisuseCase @Linkletter so you are saying we should pipe the output of ChatGPT into Grammarly to potentially avoid detection.

Lol. Someone is definitely doing this.

@weston @Linkletter That won’t fix stuff like incorrect information, lack of citations, inability to construct an argument, etc. But it’ll fool the detection engine.

Folks are worried about ChatGPT replacing their jobs but I predict that “human AI bullshit detectors” will be a growth industry. Gonna be a real demand for English majors before long. XD

@MisuseCase @weston But how will they prove use of AI beyond a reasonable doubt? The stakes are too high to get it wrong on a hunch.

@Linkletter @weston Get the person who handed it in to explain their topic, their argument, why they chose this approach and not that approach, etc. You don’t even have to single people out for this, it can be part of the structure of the curriculum/an assignment. Everyone explains their approach to the work and their rationale for how they did the work.

If you had an AI whip it up for you, you won’t be able to do that because you put no thought into it. It’s not your product. 1/2

@Linkletter @weston The thrust of this is that we can’t rely on cheap technical fixes, rote approaches, or policing. We really need to rethink how we approach the structure of assignments, what our requirements are when we assign work to students, and indeed what we are teaching students to do and what they are learning to do. 2/2
@MisuseCase @weston Even still, how do you prove use of AI beyond a reasonable doubt? Everything you've suggested is solid, but will still never prove use of an AI writing tool.
@Linkletter @weston I think if someone can explain, justify, and defend their work product the way I described before, then it doesn’t matter whether they used an AI or not, because they’re learning what they’re supposed to learn.
@MisuseCase @weston So maybe instead of "human AI bullshit detectors" we should hire teachers and librarians, eh?
@Linkletter @weston Or just “bullshit detectors” but without specifying AI.
@MisuseCase @Linkletter @weston so re-introduce the viva voce into the marking criteria?
@Dasy2k1 @Linkletter @weston Sure, because there’s no way to cook it up on the spot with an AI (yet).

@weston @Linkletter Interesting. I took this the other way around.

Tools like #chatGPT were trained to output human-sounding text. That the classifiers to detect their output are both needed and produce so many false positives implies to me that the generators are doing a pretty good job of sounding human.

Otherwise it wouldn't be a problem for teachers, and artificial content would be easily spotted.

Basically means we all sound machine-like already.

@pseudonym @weston @Linkletter I think the OpenAI folks did not put a lot of work into their detection tool.

Also humans with a certain skill set can probably detect AI-produced text with greater accuracy than the detection tool, or any detection tool we are capable of making currently - provided that the text is a certain length. But doing so will require close reading.

@MisuseCase @pseudonym @Linkletter
TBH I’m getting the same vibes from OpenAI as I got from Waymo like 8 years ago, a strong breakthrough doesn’t mean it’s ready for general adoption.

@MisuseCase @weston @Linkletter Very much this.

I've gotten some value out of #chatGPT so far, so for my use cases, it's reasonably production ready.

The problem is, it isn't general AI, nor is it a good search engine or fact finder.

It produces mostly good first drafts of an idea for writing, which I can then edit and build on. It's an assistive technology. Not sufficient in itself.

It's a great bullshit generator though.

@weston @Linkletter Not really, you can detect the style of specific human authors too if you have enough text samples.
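The stylometric point above can be sketched with a toy example. Character-trigram frequency profiles compared by cosine similarity are one crude assumption for a "style fingerprint"; real authorship-attribution work uses richer features and far more text than this:

```python
# Toy stylometry sketch: character-trigram profiles + cosine similarity.
# Assumption: trigram frequencies serve as a crude proxy for writing style.
from collections import Counter
from math import sqrt

def trigram_profile(text):
    """Count overlapping 3-character sequences in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine_similarity(p, q):
    """Cosine of the angle between two sparse count vectors."""
    keys = set(p) | set(q)
    dot = sum(p[k] * q[k] for k in keys)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

author_sample = "The raw output is illogical and self-contradictory."
candidate = "The raw output is often illogical and self-contradictory."
unrelated = "import numpy as np; x = np.zeros((3, 3))"

print(cosine_similarity(trigram_profile(author_sample), trigram_profile(candidate)))
print(cosine_similarity(trigram_profile(author_sample), trigram_profile(unrelated)))
```

With enough samples, the same-author comparison scores noticeably higher than the unrelated one, which is the whole premise of attributing text to a specific human author.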

@Linkletter Some folks who have played around with making edits to AI-generated text and running it through the detection tool find that it’s pretty easy to fool also.

For example, if you prompt ChatGPT to generate something for you in the wrong tense and the wrong tone and then use a service like Grammarly to “fix” it into the correct tense and tone, the detection tool will say it’s “unclear” whether the output is AI-generated. 1/2

@Linkletter A few things to keep in mind here:

- ChatGPT is actually not terribly good at composition or editing
- Grammarly is purpose-built for, if not composition, editing
- Decent editing of raw ChatGPT text output by a human with decent writing skills, with or without a tool like Grammarly, will often result in pretty significant changes to the text output

But it seems like massaging the output of one AI with another AI will fool the detection engine. 2/2

@Linkletter It works terribly. It always shows up as ambiguous in my hands.