RE: https://mastodon.nz/@leighelse/116149727745113480

I worked on a large-scale project testing medical transcription (maybe one of the largest). Hundreds of doctors reviewed the output and called out the issues.

It was not, and still is not, ready. Public health teams that roll this out without red teaming, remediation, feedback, and a way to influence the models' weights are irresponsible.

In fact, I am willing to offer up to five hours of my time — free — to any public sector team or nonprofit (with annual operating costs below USD 2M) anywhere in the world that needs help figuring out what makes sense and how to respond to top-down pressure telling you to implement AI.

And if they’ve already chosen something for you, I am willing to help you figure out how to sand down the risks.

email me: adrianna (at) futureethics.ai

Edit: for public servants who technically can’t get ‘free’ things from a vendor, consider this one-on-one coaching/advice or a pre-sales call.

@skinnylatte My company, a software firm that does public sector work, has a VP of AI (for usage and adoption). For me as a hacker/coder/engineer, I get it, even if adoption feels weird to me (because I've coded shit without AI my whole life and 30-year career).
@skinnylatte Woah, you're a hero!

@skinnylatte #ai reminds me of: what if we gave a war and nobody came?

What if no one wants their crap? I'm not hearing anyone say how it is worth the waste of water, energy, and jobs. And every time it is forced on me there is a disclaimer: 'warning, do not trust what I tell you'?!?

wtaf??

@skinnylatte
Hi Adrianna, would this org qualify?

www.bluefaery.org

@skinnylatte Are there white papers or things you'd recommend? For-profit caveat. :/

I regularly talk to someone in veterinary med who's dealing with a top-down AI push and other doctors creating inaccurate notes from AI. The incentives there are all out of whack, since the majority of doctors don't do notes at all. And then extend that to AI for blood work and cytology.

@NegativeK Yep, my lab is working on some white papers and webinars in this space. Will share some.
@skinnylatte
Does it involve punching the AI bros in the face? I'm not judging, if so 🤗

@skinnylatte

Soo... using that tool, a doctor could see one more patient per shift.

I have this idea. Sounds a bit like science fiction, so bear with me.

What if... we simply recorded the audio, then paid a person to listen to that audio and type up what they heard?

@wakame @skinnylatte

*In general* AI transcription tools are now comparable to or better than human transcription in raw accuracy. Humans aren't very good at sustained effort like this, but may be better at understanding nuance.
Much depends on the system and context; the original article notes that it's dealing with a New Zealand accent, which I'd expect to be currently underrepresented.
A hybrid model is likely to be optimal for precision.

https://www.healos.ai/blog/the-truth-about-ai-medical-scribe-accuracy-rates-2025-healos-achieves-98-performance


@mmalc @wakame I do independent third-party testing and evaluation of these types of claims. Other than accents, there are ways this type of tool can harm patients and treatment plans without a plan to extensively and continuously test it.

@skinnylatte @wakame
I’m not arguing against that; I'm arguing against the assertion that “just” using humans would be better.

I started in speech recognition research in the late 1980s (and Noel Sharkey was a later colleague). I'm well aware of the issues. Using what is basically a “safety-critical” system without adequate checks is foolhardy.

@mmalc you would know, then, that safe, accurate medical transcription has been about the same distance into the future as *safe* full self-driving, but for much longer.

Both require hyper-vigilant supervision, for the same reason: anything missed by either can easily be fatal.

Voice recognition is not getting better just because it is now being called #AI, but it probably is being trusted by decision-makers more than it should be.

@skinnylatte @wakame

@wakame
How about paying for sufficient doctors on each shift so they aren't too stressed to create their own documentation? And the support staff so they don't have to do non-medical admin?
@skinnylatte

@skinnylatte

We were rolling it out in various flavours in the UK NHS just before I left and it was pretty awful.
My only direct experiences have been pretty unimpressive and riddled with small errors that would have affected the patient outcome.
Recent stories I have heard from colleagues have included medication being changed or cancelled, which would then require another appointment to correct.
Short version: taking the human out and relying on the machine is dangerous.

@skinnylatte @aral My doctor uses it. My primary. She thankfully has struck me as someone who isn’t offloading responsibility to it or using it for anything other than review.

I’m well aware, though, that that kind of tool, inaccurate at best, is relied on by people with less ability than her. :/

@josh
I have seen a take that it can dull a medical practitioner's diagnostic and analytical skills. A potential explanation is that summarising for notes is part of those processes. Maybe by refusing I am helping keep my GP's skills fresh? The emergency vet who used AI note-taking didn't do a great job diagnosing my dog's woes, but was also working with the handicap of not being able to access the dog's regular notes and having to take the owner's word for things.
@skinnylatte @aral
@RedRobyn @josh @aral Overreliance on these tools is also something we test in these types of evals, and yes, we see it a lot in clinical AI.
@skinnylatte It's incredible that any organisation in the medical field should be so trusting. If it were a new physical device, it would go through enormous amounts of testing before it ever got near a patient. Why should software be different?