The co-founder of Koko (a non-profit that offers peer mental health support) has a Twitter thread (https://twitter.com/RobertRMorris/status/1611450197707464706) about an experiment where they fed requests for help to GPT-3, and help providers could choose to send those AI-generated support messages rather than their own. They found that the AI responses were rated higher, but also that "once people learned the messages were co-created by a machine, it didn’t work." But there have been some interesting questions about the ethics... 🧵 #gpt3
Rob Morris on Twitter: “We provided mental health support to about 4,000 people — using GPT-3. Here’s what happened 👇”
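(Aside for the technically curious: as described, this is basically a human-in-the-loop "AI-suggested reply" flow. Here is a minimal sketch of what that might look like, assuming the classic OpenAI completions-style Python client; the model name, prompt wording, and review step are my own hypotheticals, not Koko's actual code.)

```python
# Hypothetical sketch of a human-in-the-loop "AI-suggested reply" flow,
# roughly as the thread describes it. Not Koko's actual code; the model
# name and prompt wording are assumptions for illustration only.
import openai  # assumes the pre-1.0 completions-style client

openai.api_key = "YOUR_API_KEY"  # placeholder


def suggest_reply(help_request: str) -> str:
    """Ask GPT-3 to draft a supportive reply to a help request."""
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3 model
        prompt=(
            "Draft a kind, supportive reply for a peer supporter to review.\n"
            f"Help request: {help_request}\n"
            "Draft reply:"
        ),
        max_tokens=150,
        temperature=0.7,
    )
    return response.choices[0].text.strip()


def provider_review(help_request: str) -> str:
    """The human provider sees the AI draft and decides whether to send it,
    edit it, or write their own -- the 'co-created by a machine' step."""
    draft = suggest_reply(help_request)
    print("AI-suggested reply:\n", draft)
    choice = input("Send as-is (s), edit (e), or write your own (w)? ")
    if choice == "s":
        return draft
    if choice == "e":
        return input("Edit the draft: ")
    return input("Write your own reply: ")
```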
I'm a little confused by this response about informed consent (https://twitter.com/RobertRMorris/status/1611582827224797185), but I think it illustrates a significant problem among some researchers: conflating "research ethics" with "would an IRB allow me to do it," which is potentially really harmful. I would hope that the reason to seek informed consent isn't because a regulatory body forces you to, but because it is the right and ethical thing to do. (2/n)
Rob Morris on Twitter: “@royperlis This would be exempt. The model was used to suggest responses for help providers, who could opt in to use it or not. We didn’t use any PII, all anonymous data, no plan to publish. But MGH's IRB is formidable... Couldn't even use red ink in our study flyers if i recall...”
But regardless, based on the thread it seems that though the help providers were aware of the AI (since they were choosing whether to use it), the people seeking help were not. Though based on the "once people learned" finding, at least some of them must have been debriefed? Were they essentially following the typical protocol for a deception experiment? (Though if that were the case, I would have expected that as an answer re: consent rather than "we didn't have to.") (3/n)
The Twitter thread emphasizes that they weren't using PII, but prompts from people seeking mental health support are still potentially quite sensitive, and some folks on Twitter were concerned about data going back to OpenAI - I assume that GPT-3 can run internally though? In which case I suppose the privacy risks would be the same as when people choose to use the system at all. (4/n)
But I think that even outside of privacy concerns, a lot of people just don't like the idea of such sensitive content potentially being used to train AI without their consent, something we should already know from the backlash against Crisis Text Line. (5/n)
In fact, a lot of people are upset about being "experimented on" without their consent regardless of the context. Even though this is sometimes framed as "it's just A/B testing!" when it happens on a platform/product, sensitive contexts (e.g. mental health, emotion) are a special case. (We actually found this when studying reactions to the Facebook emotional contagion study: https://cmci.colorado.edu/~cafi5706/UnexpectedExpectationsNMSPreprint.pdf ) (6/n)
@cfiesler
Forgive me for a third comment in quick succession :) but I do also think that "it's just A/B testing" is something that has a tricky relationship with consent and trust. Have been watching the recent Duolingo UX debacle and am impressed by the level of anger and confusion expressed by users who seemingly randomly received radically different UIs and views on their learning. Perhaps some companies are too ready to experiment without explicit consent for participation.

@emmatonkin @cfiesler Duolingo has always been an A/B testing hellscape. I don’t think it would be unfair to call it the service’s core principle: to try to discover the most effective way to teach languages by treating the user base as a large test group. It has always been contentious. For one thing, there’s a built-in disregard for disability, which didn’t seem to conflict with the values of the organization enough to do more about it than acknowledge that it’s unfortunate.

I don’t think there’s any malice in this. I think it’s a result of tunnel vision driven by idealism, to make a free learning service that works better than the often very expensive existing ones, using the kinds of hooks a video game might have to help people stay engaged. It’s a noble goal, and I do think it probably has made a positive contribution to the state of the specific category of language learning products to which it belongs.

But their simplistic “data speaks loudest” approach to deciding how best to teach human communication of all things is entirely absurd. They cannot measure learning. They hope that it’s a necessary byproduct of what they do measure, but their absolute focus on numbers makes learning that doesn’t create better numbers undesirable. They actively discourage learners from taking their time with lessons in favor of advancing further at a faster pace. I strongly suspect that they have managed to transform the pop-up tips from a helpful option that allows some needed flexibility into a way to ensure that people will continue to advance well past the limits of the vocabulary and grammar they actually know. It’s not that it can’t or doesn’t teach anything at all, but it does profoundly undermine itself.

Ethical questions aside, the anger and frustration have been there the whole time. I think everyone wants it to work and to be good. My guess is that the amount and depth of the dismay are probably not just about this change, or even just the test group thing, so much as it’s about a long history of disregarding user feedback in favor of data while not actually being able to deliver on the promise of more effective language learning. The belief that you can disregard disabled people in order to serve the average person better is just one symptom of a much larger misunderstanding of people and learning, and I think it really shows in how it feels to use Duolingo. It’s not surprising that users would be mad about another disruptive change that doesn’t address the most serious problem at all.

@robotrecall @cfiesler
I think perhaps the core principle of that particular service has changed over time. Many people probs recall the brand's history as a (formerly) community-led attempt at building a) the most effective way to study languages and b) a large dataset through which to do so, but, well, things change, and so does consent. Participating in a community project is very different to purchasing a service that *still treats you as an uninformed test subject*.
@robotrecall @cfiesler
If I am donating £ for participation in a community project I believe in and it also wants me to try some new features to help others, I might - could be fun. If I am paying for an IPO'd commercial service and they then decide to get all GLaDOS on me w/o consent or withhold parts of the training material *for which I pay*, they cannot expect the same tolerance or approval of their new shareholder-pleasing mission.
@robotrecall @cfiesler
Same goes wrt disability and accessibility. Community led project that is still learning and trying to do better? Perhaps it's unfair, but we would probably cut them a lot more slack than a commercial service that fails in these ways, because they are perceived as a work in progress. A commercial service with no community mission gets no such tolerance - we (probably fairly) assume they are motivated by the almighty $.