@davidgerard , not sure if you saw this great post about ML models in radiology?

The boosters love to say ‘look how good AI is in medical diagnosis, therefore LLMs are good’. Only, it turns out (from the article):

while models beat humans on benchmarks, the standardized tests designed to measure AI performance, they struggle to replicate this performance in hospital conditions. Most tools can only diagnose abnormalities that are common in training data, and models often don’t work as well outside of their test conditions

It also highlights a problem that’s actually quite general in medicine: we have far more data about unhealthy people than healthy ones. I was talking to a cardiologist almost ten years ago who was very excited about the data things like the Apple Watch could collect. Apparently they know that a lot of people who have heart attacks have arrhythmias, but they have no idea if this is a meaningful correlation. Healthy people tend to have their heart monitored for a minute or less on a visit to a doctor every few years. People with known heart problems wear heart monitors that can record a load of things, so you have very good data on their heart rhythms but no baseline to compare it against.

This is also true for radiology. You really want to do anomaly detection: take a few million scans of healthy people, wait a few years to see if any of them have undiagnosed conditions, and then use that dataset to train a model of what a healthy lung (or whatever) looks like. Then feed new scans to the model, have it flag anomalies, and loop in an expert to figure out what kind of anomaly it is and whether it’s important.
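The anomaly-detection setup described above can be sketched in a few lines. Everything here is hypothetical and purely illustrative (the feature values, the 3-sigma threshold, the function names): a real system would work on image features, not a single number, but the shape of the idea is the same, i.e. model "healthy", then flag departures from it for an expert.

```python
import statistics

def fit_healthy_baseline(values):
    """Fit a trivial one-feature 'normal' model: the mean and standard
    deviation of a measurement from scans of people later confirmed healthy."""
    return statistics.mean(values), statistics.stdev(values)

def flag_anomaly(value, mean, std, k=3.0):
    """Flag a new scan's measurement for expert review if it lies more
    than k standard deviations from the healthy baseline."""
    return abs(value - mean) > k * std

# Made-up measurements from scans of confirmed-healthy people.
healthy = [0.98, 1.01, 1.03, 0.99, 1.00, 1.02, 0.97, 1.01]
mean, std = fit_healthy_baseline(healthy)

print(flag_anomaly(1.00, mean, std))  # typical value: False
print(flag_anomaly(1.50, mean, std))  # far outside baseline: True
```

The key point is that the model only ever answers "does this look like the healthy population?"; deciding what the anomaly is, and whether it matters, stays with the radiologist.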

But what you have is a load of very specific examples of things that are wrong, in very specific ways. And these also have artefacts that are specific to individual devices, so it’s easy for a model to learn that people who are scanned with this class of machine have this condition.
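A toy illustration of that shortcut, with entirely made-up scanner names and counts: a "classifier" that reads only the device identifier looks accurate on confounded training data, then collapses in deployment where the confound breaks.

```python
# Hypothetical dataset of (scanner_id, true_diagnosis) pairs. Because sick
# patients were mostly referred to the specialist clinic's scanner, the
# scanner ID alone predicts the label in the training data.
train = ([("clinic_A", "sick")] * 90 + [("gp_office", "healthy")] * 90 +
         [("clinic_A", "healthy")] * 10 + [("gp_office", "sick")] * 10)

def shortcut_classifier(scanner_id):
    """A 'model' that learned only the confound, not the pathology."""
    return "sick" if scanner_id == "clinic_A" else "healthy"

train_acc = sum(shortcut_classifier(s) == y for s, y in train) / len(train)
print(train_acc)  # 0.9, looks great on the confounded data

# In deployment the same scanner sees a mix of patients, and the
# shortcut is no better than a coin flip.
deploy = [("clinic_A", "healthy")] * 50 + [("clinic_A", "sick")] * 50
deploy_acc = sum(shortcut_classifier(s) == y for s, y in deploy) / len(deploy)
print(deploy_acc)  # 0.5
```

This is the same failure mode as the contrast and ruler stories further down the thread: the model optimises whatever signal separates the training labels, whether or not it has anything to do with disease.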

And that’s just the start of the issues they discuss.

AI isn't replacing radiologists

Radiology combines digital images, clear benchmarks, and repeatable tasks. But demand for human radiologists is at an all-time high.

The Works in Progress Newsletter
@david_chisnall @davidgerard
You don't want to learn how "normal" ranges of values for blood tests are determined. (I found out when I was misdiagnosed as "normal" but I was actually very not normal)
@david_chisnall @davidgerard "Normal" ranges for blood tests are typically determined by taking the mean ± 2 standard deviations of all results ... so if your value is "normal", that just means it falls where 95+% of the sampled population (whether sick or healthy) falls. And it turns out that some tests aren't very accurate, as I found out the hard way - fortunately one doctor ordered a different test that showed a severe vitamin B12 deficiency, at which point I went to the medical library and found many unpleasant surprises in the research literature.
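A minimal sketch of that convention, using made-up B12-like numbers (not real clinical values): because the reference range is built from whoever happened to be sampled, sick and healthy alike, a deficient result can still land comfortably inside "normal".

```python
import statistics

def reference_range(results, k=2.0):
    """Compute a lab 'reference range' the conventional way:
    mean ± k standard deviations of results from the sampled
    population, with no regard to who in it was actually healthy."""
    m = statistics.mean(results)
    s = statistics.stdev(results)
    return m - k * s, m + k * s

# Hypothetical serum values, purely for illustration. If deficiency is
# common in the sampled population, the range stretches to include it.
population = [180, 220, 350, 400, 410, 450, 500, 520, 600, 700]
low, high = reference_range(population)

# 180 would be a worryingly low value, yet it sits inside the range
# and would be reported as "normal".
print(low < 180 < high)  # True
```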
@david_chisnall @davidgerard I don't know where I read that, but in one case they actually ended up accidentally training an image contrast recognizer: the contrast for the positive and negative outcomes in the training set were very different. So of course it fell apart when used in the real world.
@edwintorok @david_chisnall @davidgerard there was another where they trained a ruler-recogniser; "suspicious-looking" moles were more likely to have a ruler in the photo (because it was a photo taken to document the mole) whereas normal healthy mole photos were just taken casually with nothing else in the photo. https://www.sciencedirect.com/science/article/pii/S0022202X18322930
@david_chisnall @davidgerard Do we have a word for the inverse of survivorship bias? Berkson’s fallacy?
@david_chisnall @davidgerard [abstract, non-professional, long-term thoughts]
I agree with your points and it's in theory better to have baselines and standards.
I just don't know how people/professionals can or should handle it.
I like to think of work/value in abstraction layers. LLMs add another layer, like radiology. Then maybe a long-term baseline from gadgets is another. And now we have a team, and team coordination adds another layer.
I don't see it as manageable for an ordinary doctor.
@david_chisnall @davidgerard this was already too long and I did not make my point clearly enough. So I'm happy to reply, but will not write an ego monologue :)
@david_chisnall @davidgerard Reality behind the hype. A lot of investors in AI are going to lose a lot of money given so many mediocre outcomes.
@david_chisnall Fascinating article, thank you for sharing it!
@david_chisnall @davidgerard This would have a lot to do with it being basically predictive text and not intelligence. 

@david_chisnall @davidgerard

This is a fantastic, easy to understand explanation. Thank you

@david_chisnall @davidgerard
Brain scans are morphed into one standard template to compare them and do research.

Until recently, the standard brain was the scan of a single person. It took so long for people to even realise that this was wrong ...

@david_chisnall @davidgerard
One of the frustrations with the ML boom is that 'Data Scientists' seemingly have little understanding of basic statistical concepts.

Machine Learning is just computational statistics. The same laws about sampling, bias, reliability, etc apply. But very few data scientists seem to have much understanding of them (judging by the work I've seen).

@cian @david_chisnall @davidgerard
Rather than a lack of understanding, I suspect it is more the result of competitive pressures - i.e. people who publish flawed results will get ahead of people who do not.

Even if 90% of data scientists understand statistics, the remaining 10% will still be overwhelmingly represented in the set of “successful” researchers.

@jhominal @cian @david_chisnall the other problem is that the discussion happens in preprints and blog posts - and a lot of that is trash and marketing.

@cian @david_chisnall @davidgerard

IME, most(*) of the people working at the coal face understand the limitations of what they've built - the information never seems to make it to the public or the sales people.

(* - CS in particular seems to have an unnaturally high percentage of people who refuse to admit they're ever wrong - even when confronted with evidence. Adjust accordingly.)

@cian my impression of “Data scientists” has so far been that they have as much in common with science as flat earthers.

@david_chisnall @davidgerard

@lffontenelle The data-bias part, with over-representation of affected people (who are usually white and have better socioeconomic conditions), however, hampers any modelling - with or without ML.