So, in addition to being biased against neurodivergent writers, "GPT detectors are biased against non-native English writers" https://arxiv.org/abs/2304.02819v2.

Cool. Cool cool cool cool. Tight tight tight. Cool. 😑​

GPT detectors are biased against non-native English writers

The rapid adoption of generative language models has brought about substantial advancements in digital communication, while simultaneously raising concerns regarding the potential misuse of AI-generated content. Although numerous detection methods have been proposed to differentiate between AI and human-generated content, the fairness and robustness of these detectors remain underexplored. In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers. Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions. Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize or exclude non-native English speakers from the global discourse. The published version of this study can be accessed at: www.cell.com/patterns/fulltext/S2666-3899(23)00130-7

@Wolven @FractalEcho
OH WOW that's striking. False positive rates of 48-76% ??
Nobody should be using GPT detectors for anything important.
@janellecshane @FractalEcho Not even a little bit at all, no
@Wolven After I posted this on Tumblr last night I got so many comments from people saying their schools are using these detectors, or are requiring TAs to use them. Really enraging.
@janellecshane @Wolven a;dslfjha;dsfjhas;dfjags;dfdkjgda;sk
@janellecshane I've been talking to literally every university administrator I can to get them to understand why this is a terrible path to take; no definitive results thus far 😕
@janellecshane @Wolven @FractalEcho Yeah, that fits with some testing I did a few months ago. I would paste in a passage from something in English literature (e.g., Shakespeare). What was weird was that GPT would say it was AI-generated, but in the same response identify the source correctly.

@mikeloukides @janellecshane @Wolven @FractalEcho this is actually a really tricky challenge for our current generation/classification models.

LLMs generate new text that is similar to the data they were trained on, by their very nature.

Training another model to distinguish natural human text from LLM output means it's going to learn the same things the LLM learned about what makes text "more like the input", so I don't see any way it could come up with conclusive results.

@kepstin @mikeloukides @janellecshane @Wolven @FractalEcho Yeah, if there is *one thing at all* that GPT & friends are good at, it is writing text that looks like normal text. So I'd be shocked if a detector could be written.
@varx @kepstin @janellecshane @Wolven @FractalEcho@kolektiva.soc Yeah, I'm skeptical about detectors, too. At best it will be an increasingly difficult game of whack-a-mole.

@janellecshane

So not even the 50% you'd expect from random guessing???

Hahaha, so it's actually worse than chance, and biased on top of it….

@janellecshane @Wolven @FractalEcho Given that all AI detection tools are themselves powered by AI, I'm totally confused now. Yes, don't use them, and don't use AI either. If they haven't already, the generators will evolve to include "quirks", a little entropy, to add realism to what they generate, making them harder to detect. It's unfair for honest students in the short term, but at the end of the day they'll be the ones who can hold down jobs.

How many times, in how many ways, do I have to say "trying to come up with dispositive definitions or tests for humanness which include everything you want to include and exclude everything you want to exclude is both so difficult as to be functionally impossible and fundamentally supremacist"?

Belief in the dispositive efficacy of Turing-style tests is a category error. "Proving consciousness" or "humanness" is not what Turing intended his test for, and the fact that they've been consistently misunderstood that way does real harm to real people alive today.

Have a good one

You can essentially watermark "A.I."-generated text via steganographic rules about how the system should choose words and form sentences. It would make for some odd constraints and sentence structures, but it could be done, and in such a way that undoing it would require either a) learning enough about the topic in question to adequately rewrite the text, or b) making the undoing itself obvious.
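To make that concrete, here's a minimal sketch of one way such steganographic rules could work, assuming a hash-keyed "green list" bias over word transitions (loosely in the spirit of published watermarking proposals); the key and function names are purely illustrative, not any real system's API:

# Toy "green list" watermark sketch: a shared secret partitions word
# transitions into green/red; a rule-following generator prefers green
# transitions, and a detector simply measures the green fraction.
import hashlib

SECRET_KEY = b"demo-secret"  # hypothetical shared watermarking key

def is_green(prev_word: str, word: str) -> bool:
    """Deterministically put ~half of all (prev_word, word) transitions
    on the 'green list', keyed by the secret."""
    h = hashlib.sha256(
        SECRET_KEY + prev_word.lower().encode() + b"|" + word.lower().encode()
    )
    return h.digest()[0] % 2 == 0

def green_fraction(text: str) -> float:
    """Detector side: fraction of word transitions that are green.
    Ordinary text hovers near 0.5; text generated under the
    green-preference rule scores well above it."""
    words = text.split()
    if len(words) < 2:
        return 0.5
    hits = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    return hits / (len(words) - 1)

Rewriting enough of the words to wash out that statistic is option a) above; a mechanical synonym shuffle that leaves the odd constraints visible is option b).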

But even then, that doesn't get at the underlying problems of motivations and values which drive people to either a) cheat/plagiarize, b) set up such antagonistic pedagogical frames that they need a dispositive "gotcha" ready at all times, or c) fail to recognize (1) the variously socially constructed disciplinary and normative linguistic requirements which might cause someone to write in a particular way and (2) the harm done by the rigid enforcement of those same norms without particular care to the needs and circumstances of the individual in front of them.

And THAT is the point, here.

Once again: you can't technofix your way out of sociocultural problems.

I'm getting offline for a while.

@Wolven Preach. The point of the Turing Test, as I see it, was to raise questions about the utility of separating “human” from “non-human”, and to challenge our ideas of empathy. Do Androids Dream of Electric Sheep? was the best take on the test I’ve seen.

Using it prescriptively to ensure people are sufficiently human is some next-level bullshit.

@Wolven can you recommend further reading on this topic?
@Wolven The thing with Turing tests, which nowadays are mostly "prove you are not a bot", is that they can go both ways. Their bots want to devour our content to monetize it.
Ʈõ đó τᏂᾆƭ, ʈɧἑɏ ӎűƨʈ bέ ᾅblἑ ʈõ ŕĕãď ɪʈ.
@Wolven I hope people are not just taking the reports at face value. Even with plagiarism detectors, the report is the beginning of the story, not the end.
@theLastTheorist I think you've underestimated how scared and path-of-least-resistance-seeking much of the secondary and post-secondary workforce has become in even just the last year or so :\

@Wolven LOL I'm leaving after the summer term and my chair took *everything* from my class to hand to another adjunct. Least resistance for sure. I'm not a softy but I won't discipline anyone based on a mystery bot.

I used to joke that in 10 years there will just be a kiosk where students put in their Amex and out comes a diploma. 3 years to go . . .

@Wolven fighting context-unaware, easily-biased algorithms with more of the same algorithms; what could possibly go wrong? /s

@Wolven Thanks, I loathe it. 👍🏼👍🏼

Do you happen to have a link handy for another study about the bias against neurodivergent writers? I'm trying to keep multiple of these types of cautionary studies at the ready to share with folks on my campus / in my communities.

Thanks again for sharing this link, in any case!

@Wolven wow computers that are trained on the internet are racist again! Who could've predicted it.
@Wolven I started noticing this in my own mind the other day. I don't want to read things written by LLMs, but I do want to read things that are imperfect. How I decide what to read is racist (and discriminatory in other ways) and resisting that tendency is getting *harder*. It sucks.
@Wolven who could have predicted? It doesn't seem like they looked at AAVE but I'll bet dollars to donuts there are similar biases.
@Wolven I’d rather read a poorly written original thought than an expertly written average one.

@Wolven this!

When I was at Mozilla, we were able to do a decent job of using ML to classify which team should handle bugs reported by users in the field, but one of the things that worried me about that approach was the assumption that bug reports were written by people whose first language is English.

@Wolven and this is one of the few things I have a level of comfort with using ML for (along with 'hey, will this patch cause a regression' and similar) because it's a relatively contained problem (if x isn't x'ing then y team should look at it.)

@Wolven My presentation predated the explosion of ChatGPT and its ilk by several months, but it looks like these “detectors” are like all of the other “anti-cheating” systems that have been foisted on students. I.e., they should (but probably won’t) be buried in the sand like so many E.T. cartridges.

https://youtu.be/Uk-0wZy-9L4

Online Proctoring: In Search of an Ethical Balance Between Integrity and Privacy

@Wolven When I was informed about the bias against non-native writers my intuition was that it would extend to neurodivergent writers as well — so there is evidence that this is the case?

@Wolven I am so happy (for my own sake, but sad for those who have to go through this) that I did most of my work for school before all of this. From what I hear from friends who just finished their second year of HS, all teachers seem to have pivoted to doing all assignments in-classroom on locked-down software to combat GPT-generated assignments.

I know this would be horrible for me, since I really need a lot of time to do assignments, and to do them in peace. Given the same amount of time, I have produced better-quality writing on my own terms than I have on the occasions when I had to write assignments that way.

@Wolven Where can I find out about the bias against neurodivergent writers?