“@royperlis This would be exempt. The model was used to suggest responses for help providers, who could opt in to use it or not. We didn’t use any PII, all anonymous data, no plan to publish. But MGH's IRB is formidable... Couldn't even use red ink in our study flyers if I recall...”
@cfiesler The flip side of deceptively using AI for mental health treatment is deceptively using chat logs of mental health treatments to train enterprise customer service AI.
"Suicide hotline shares data with for-profit spinoff, raising ethical questions"
"The Crisis Text Line’s AI-driven chat service has gathered troves of data from its conversations with people suffering life’s toughest situations."
https://www.politico.com/news/2022/01/28/suicide-hotline-silicon-valley-privacy-debates-00002617
@cfiesler @natematias specifically, my argument is it's not possible to do this on Twitter **now**… even a great explanation wouldn't spread, nobody's looking to RT it. much more likely is people would be super primed to argue with it even if it's kind of reasonable. and all the while the old stuff keeps spreading without knowledge of the new stuff.
but, yeah, twitter is double-edged like that. great way to get your message out, but watch out!
@emmatonkin @cfiesler Duolingo has always been an A/B testing hellscape. I don’t think it would be unfair to call it the service’s core principle: trying to discover the most effective way to teach languages by treating the user base as a large test group. It has always been contentious. For one thing, there’s a built-in disregard for disability, which didn’t seem to conflict with the values of the organization enough to do more about it than acknowledge that it’s unfortunate.
I don’t think there’s any malice in this. I think it’s a result of tunnel vision driven by idealism, to make a free learning service that works better than the often very expensive existing ones, using the kinds of hooks a video game might have to help people stay engaged. It’s a noble goal, and I do think it probably has made a positive contribution to the state of the specific category of language learning products to which it belongs.
But their simplistic “data speaks loudest” approach to deciding how best to teach human communication, of all things, is entirely absurd. They cannot measure learning. They hope that it’s a necessary byproduct of what they do measure, but their absolute focus on numbers makes any learning that doesn’t create better numbers undesirable. They actively discourage learners from taking their time with lessons in favor of advancing at a faster pace. I strongly suspect that they have managed to transform the pop-up tips from a helpful option that allowed some needed flexibility into a way to ensure that people will continue to advance well past the limits of the vocabulary and grammar they actually know. It’s not that it can’t or doesn’t teach anything at all, but it does profoundly undermine itself.
Ethical questions aside, the anger and frustration have been there the whole time. I think everyone wants it to work and to be good. My guess is that the amount and depth of the dismay are probably not just about this change, or even just the test-group thing, so much as about a long history of disregarding user feedback in favor of data while not actually being able to deliver on the promise of more effective language learning. The belief that you can disregard disabled people in order to serve the average person better is just one symptom of a much larger misunderstanding of people and learning, and I think it really shows in how it feels to use Duolingo. It’s not surprising that users would be mad about another disruptive change that doesn’t address the most serious problem at all.
@cfiesler I find that we’ve normalized “just A/B testing” too quickly.
For example, when the App Store was introduced, you deliberately bought/installed something, in part based on its description. Then you deliberately installed updates, based on their release notes.
This agency has since been taken away: release notes are just an endless repetition of “we update xyz in order to make the app better”, and the actual app may or may not change without your consent or control.
@cfiesler there’s a difference between A/B testing and clinical trials. Toying with people’s mental health issues using AI is both clinically and technologically unethical.
Also, Facebook is a low bar.
@cfiesler I believe the author later clarified that the people were not directly chatting with the model. It was used more as a tool to help peers craft their responses.
While having a human in the loop does mitigate some of the PII issues, the lack of informed consent still stands.
@emmatonkin @cfiesler I think the demo video showed that the operators had the option of directly forwarding the responses to the model. I'm assuming (hoping) the humans acted as filters for personal stuff.
What's worse is that stuff like this just sets precedent for even more outrageous applications of LLMs.
@rajatsahay @cfiesler
Ack, though. A) I wonder what guidance, training, eval they were given because that's quite some task to carry out in a hurry and B) hang on, is the LLM responding with no context other than the last message received, then? More usual to give it context for a (seemingly) relevant answer.
Totally agreed re precedent. Not only does it need careful regulation, but I suspect this is already in breach of existing regs.
@emmatonkin @cfiesler your response perfectly highlights a huge problem with AI hype. Most companies cite human moderation to deploy borderline-illegal services, claiming their "AI model" gives unreal performance, and so stay within the letter of the law.
When the model inevitably fails, any blame for the misaligned decisions is put directly on those same moderators, who usually receive little to no training on how to handle these situations.
@cfiesler
"We didn’t use any PII, all anonymous data"
I would love to see the method they used to ensure that the data are anonymised. Unless, shockingly, it turns out to be "Assert on the basis of total convenience and no analysis whatsoever that the data are anonymised and do as you wish from that point".