But Claude said tumor!
I had to prepare a high level report to a senior manager last week regarding a project my team was working on.
We had to make 5 professional recommendations based on the data we reported.
We gave the 5 recommendations with lots of evidence and references explaining how we came to each decision.
The top question we got was: “What are ChatGPT’s recommendations?”
Back to the drawing board this week because LLMs are more credible than teams of professionals with years of experience and bachelor’s- or master’s-level education on the subject matter.
you fool
"these are chatgpt's recommendations we just provided research to back them up and verify the ai's work"
That is the least worst implementation!
I knew one HR person who cared about employees and did her best to help out. She only lasted 6 months.
You, and we, are better off for it.
The issue is that it’s been forgot (Remember the 5th of November)
There are some very impressive AI/ML technologies that are already in use as part of existing medical software systems (think: a model that highlights suspicious areas on an MRI, or even suggests differential diagnoses). Further, other models have been built and demonstrated to perform extremely well on sample datasets.
Funnily enough, those systems aren’t using language models 🙄
(There is Google’s Med-PaLM, but I suspect it wasn’t very useful in practice, which is why we haven’t heard anything since the original announcement.)
I have read some headline
Really.
I don’t think technical knowledge gives as good a sense as a lot of experience working with one.
Like saying the guys who designed a particular car would know best how it’ll perform on various racetracks. My sense is a driver would have a better sense.
Eh. Depends on which tech is being used and how. For a lot of things, relatively basic ML models purposefully trained do a pretty good job, and are, in fact, limited by the diagnoses in the training data. But more generalized “AI” tools seem rather… questionable.
Like, you can train an SVM on fMRIs to compare structures in the brain between patients diagnosed with bipolar disorder and those who are not, and it will have an accuracy rate on new patients basically equal to the accuracy rate of the doctors who did the diagnosing in the training set. But you’ll have a much harder time creating a model that takes in fMRIs and reports back answers to the question of “which brain disease or abnormality do I have?”
This stuff works much closer to advertised when it’s narrowly defined and purpose built, but the people making and funding this work want catch-all doctor replacements, because of course they do, because there’s way more money in charging hospitals and patients 10% less than a doctor’s salary than there is in providing tools that make doctors’ efforts in diagnosing specific illnesses easier.
Or, at least there is if you can pull it off.
Precisely. Many of the narrowly scoped solutions work really well, too (for what they’re advertised for).
As of today though, they’re nowhere near reliable enough to replace doctors, and any breakthrough on that front is very unlikely to be a language model IMO.
Peak intelligence is realizing an LLM doesn’t care whether its tokens represent chunks of text, sound, images, videos, 3D models, paths, hand movements, floor planning, emojis, etc.
The keyword is: “multimodal”.
As for being able to correctly correlate some “chunks of MRI scan” with the word “tumor”… that’s all about the training (which I’d bet Claude is missing… did I hear “investment opportunity”? Guy isn’t wrong).
I know of at least one other case in my social network where GPT-4 identified a gas bubble in someone’s large bowel as “likely to be an aggressive malignancy,” leading said person to fully expect they’d be dead by July, when in fact they were perfectly healthy.
These things are not ready for primetime, and certainly not capable of doing the stuff that most people think they are.
The misinformation is causing real harm.
Exactly. So the organisations creating and serving these models need to be clearer about the fact that they’re not general purpose intelligence, and are in fact contextual language generators.
I’ve seen demos of the models used as actual diagnostic aids, and they’re not LLMs (plus require a doctor to verify the result).
I need help finding a source, cuz there are so many fluff articles about medical AI out there…
I recall that one of the medical AIs that the cancer VC gremlins have been hyping turned out to have horribly biased training data. They had scans of cancer vs. not-cancer, but they were from completely different models of scanners. So instead of being calibrated to identify cancer, it became calibrated to identify what model of scanner took the scan.
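For anyone wondering how that kind of thing happens, here is a toy sketch of the failure mode, not a reconstruction of the system described above. Everything in it is made up for illustration: the two "features" (a weak lesion signal and a scanner-dependent brightness), the scanner names "A" and "B", and the one-split classifier. It assumes all cancer scans in training come from one scanner and all healthy scans from the other, then tests on data where scanner and diagnosis are unrelated.

```python
import random

def make_scan(has_cancer, scanner, rng):
    # Feature 0: weak, noisy lesion signal (what we hope the model learns).
    lesion = rng.gauss(1.0 if has_cancer else 0.0, 1.0)
    # Feature 1: overall brightness, set by the (hypothetical) scanner model only.
    brightness = rng.gauss(5.0 if scanner == "A" else 0.0, 0.5)
    return (lesion, brightness)

def fit_stump(X, y):
    """Pick the single feature and threshold with the best training accuracy."""
    best = (0.0, 0, 0.0)  # (accuracy, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted(x[f] for x in X):
            acc = sum((x[f] >= t) == label for x, label in zip(X, y)) / len(y)
            if acc > best[0]:
                best = (acc, f, t)
    return best

def accuracy(stump, X, y):
    _, f, t = stump
    return sum((x[f] >= t) == label for x, label in zip(X, y)) / len(y)

rng = random.Random(0)

# Confounded training set: every cancer scan comes from scanner A,
# every healthy scan from scanner B.
train_y = [i % 2 == 0 for i in range(200)]
train_X = [make_scan(c, "A" if c else "B", rng) for c in train_y]

# Deployment set: the scanner is chosen at random, unrelated to the diagnosis.
test_y = [i % 2 == 0 for i in range(200)]
test_X = [make_scan(c, rng.choice("AB"), rng) for c in test_y]

stump = fit_stump(train_X, train_y)
train_acc = stump[0]
test_acc = accuracy(stump, test_X, test_y)
print(f"chosen feature: {stump[1]} (0 = lesion, 1 = brightness)")
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The classifier latches onto brightness because it separates the training labels perfectly, which means it has learned "which scanner" rather than "cancer or not". On the mixed-scanner set it does about as well as a coin flip.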
I am failing to find a source, but there is also a story about an older predictive model that worked great at one hospital, but failed miserably at the next. There was just enough variation in everything that the model broke.
(I think the New England Journal of Medicine podcast, but I am not finding the episode.)
Never attribute to malevolence that which can be explained by incompetence.
Including the end of humanity at the hands of the robots apparently
That reminds me of a fairly recent article about research around visualisation systems to aid with interpretable or explainable AI systems (XAI). The idea was that if we can make AI systems that explain their reasonings, then they can be a useful tool, especially in the hands of domain experts.
Turns out that the fancy visualisations that made it easier to understand how the model had come to a conclusion actually made subject matter experts less accurate in catching errors. This surprised researchers and when they later tried to make sense of it, they realised that they had inadvertently dialled up people’s likelihood to trust the model because it looked legit.
One of my favourite aphorisms is “all models are wrong, some are useful.” Seems that the tricky part is figuring out how wrong and how useful.
It’s worse now than ever though, many managers have been steeped in tech optimism their whole working careers. The failures of “revolutionary new systems” have been forgotten about while the successes of other things are lauded.
They’ve been primed to jump on any new “innovation” and at the same time B2B marketing has started adopting some of the most manipulative practices that used to be only used on consumers. They’ve crafted a narrative that shapes discourse so the main objections that appear are irrelevant to the actual issues managers might run into.
Stuff like “but what if it is TOO good?!” and “what if the wrong people get their hands on this AMAZINGLY POWERFUL new tech?!”
Instead of “but does this actually understand anything or is it just giving output that looks correct?” or “Wait, so, how was this training data obtained? Will there be legal issues from deliverables made with this?”