Machine learning can easily produce false positives when the test set is used incorrectly. Just et al. in
@NatureHumBehav suggested that ML can identify suicidal ideation extremely well from fMRI, and we were skeptical. Today the retraction, along with our analysis of what went wrong, came out.

Here is the retracted paper: https://nature.com/articles/s41562-017-0234-y and here is our refutation: https://nature.com/articles/s41562-023-01560-6. Had it held up, the paper's approach could have revolutionized psychiatric approaches to suicide.

So what went wrong? The authors apparently used the test data to select features. An obvious mistake. A reminder for everyone working in ML: never use the test set for *anything* but testing. The only practical way to enforce this in medicine? Lock away the test set until the algorithm is registered.
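
To make the leakage concrete, here is a minimal sketch in Python with scikit-learn (a toy setup of my own, not the authors' actual Matlab pipeline): on pure noise, picking the most "discriminative" features from the full data before cross-validating yields impressive-looking accuracy, while doing the selection inside each training fold stays at chance.

```python
# Toy demonstration of test-set leakage via feature selection.
# Data are pure noise, so honest accuracy should hover near 50%.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5000))   # 40 "subjects", 5000 noise "voxels"
y = rng.integers(0, 2, 40)            # random labels

# WRONG: feature selection sees all the data, test folds included.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(SVC(), X_leaky, y, cv=5).mean()

# RIGHT: selection is refit inside each training fold of the pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20), SVC())
clean = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy: {leaky:.2f}")   # far above chance on noise
print(f"clean CV accuracy: {clean:.2f}")   # near 0.50
```

The only difference between the two numbers is where the feature selection sits relative to the cross-validation split; that alone is enough to manufacture a "classifier" on random data.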

Side note: it took 3 years to go through the process of demonstrating that the paper was wrong. Journals need procedures to accelerate this. Also, all the credit for the good parts of this work goes to
@tdverstynen

@kordinglab This is a very interesting topic. Curious question: over these 3 years, what took more time? Replicating the original authors' result, demonstrating it was biased, or communicating with the journal and the original authors of the paper...? Did you have easy access to the original research data and code?

@vbuendiar @kordinglab Honestly, what took the most time was the journal process itself. The re-analysis took about a week.

As for data access, that was tricky. The authors didn't release the raw dataset, nor did they release the full pipeline. What they released were Matlab files of voxelwise responses from a subset of regions, essentially the final stage of the analysis. You can check out the data and code we were given (as well as my re-analysis) here: https://github.com/CoAxLab/Reappraisal-of-Neuromarkers-of-Suicide

@kordinglab @tdverstynen
Here's what I tweeted about this back in 2021
@deevybee @kordinglab I fully agree with this concern. Although I will say that I know the imaging center where the replication study is being done has put in a lot of safeguards for handling potential triggering from the study, in coordination with the lead psychiatrist on the study. However, this concern does significantly limit the potential clinical utility of this approach (if it had worked).
@tdverstynen @kordinglab
I was concerned enough to contact the journal, which reassured me there was ethics approval. I doubt this would get past the Oxford ethics committee, though.
@kordinglab @tdverstynen Awesome, I don't suppose you have an open version link at hand?
@debivort @kordinglab Not one I can *publicly* share. DM me?
@tdverstynen @kordinglab your website's not loading for me, but I'm debivort [at] oeb.harvard.edu — would you mind sending it my way?
@tdverstynen @kordinglab Cheers for messaging me the arXiv version
@kordinglab @tdverstynen Thanks, I know this is not the main problem with the paper, but I didn't realize leave-one-out as a cross-validation strategy had these issues. I guess the idea is that it is more important to leave out larger chunks of correlated input samples, though I don't fully understand how randomly selecting 20% of the data achieves this. I need to read more about it in the linked paper from Varoquaux et al.: https://doi.org/10.1016/j.neuroimage.2016.10.038
@chrisXrodgers @kordinglab We cite the paper showing the high false positive rates with LOOCV in our piece. Though I don’t think that was the main problem here, it could have amplified the error.
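
For anyone who wants to poke at this, here is a hedged toy sketch (scikit-learn on simulated data of my own; nothing here comes from the paper) of the comparison Varoquaux et al. make: judge each cross-validation scheme by how far its estimate lands from the "true" accuracy measured on a large held-out pool. In their analyses, leave-one-out is the noisier estimator at small n; how big the gap looks depends on the simulation.

```python
# Compare CV estimates against "true" accuracy from a large held-out pool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, ShuffleSplit, cross_val_score

err_loo, err_ss = [], []
for seed in range(30):                          # 30 simulated small studies
    X, y = make_classification(n_samples=1034, n_features=50,
                               n_informative=5, random_state=seed)
    X_cv, y_cv = X[:34], y[:34]                 # small sample, fMRI-sized
    X_pool, y_pool = X[34:], y[34:]             # big pool approximates truth
    clf = LogisticRegression(max_iter=1000)

    truth = clf.fit(X_cv, y_cv).score(X_pool, y_pool)
    loo = cross_val_score(clf, X_cv, y_cv, cv=LeaveOneOut()).mean()
    ss = cross_val_score(clf, X_cv, y_cv,
                         cv=ShuffleSplit(50, test_size=0.2,
                                         random_state=seed)).mean()
    err_loo.append(abs(loo - truth))
    err_ss.append(abs(ss - truth))

print(f"mean |error|, leave-one-out:    {np.mean(err_loo):.3f}")
print(f"mean |error|, 50x 20% held out: {np.mean(err_ss):.3f}")
```

This also surfaces the trade-off: leave-one-out trains on almost all the data (low bias) but each test fold is a single point (high variance), whereas repeated 20% splits average many larger test folds at the cost of slightly pessimistic training sets.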

@kordinglab @tdverstynen

Hat tip for such a display of scientific skepticism!

Reminded me of:

• "Attack Mannequins: AI as Propaganda" by @dwaynemonroe

https://monroelab.net/attack-mannequins-ai-as-propaganda

• "New Laws of Robotics: Defending Human Expertise in the Age of AI" by @FrankPasquale

https://www.sydney.edu.au/engage/events-sponsorships/sydney-ideas/2022/frank-pasquale-how-ai-is-changing-medical-practice.html

• "I’m an ER doctor: Here’s what I found when I asked ChatGPT to diagnose my patients"
by Dr. Josh Tamayo-Sarver

https://inflecthealth.medium.com/im-an-er-doctor-here-s-what-i-found-when-i-asked-chatgpt-to-diagnose-my-patients-7829c375a9da

@kordinglab @tdverstynen

Dear Konrad Kording,

thank you very much for the very interesting reference and link; I took a look at this research.

@kordinglab @tdverstynen

Yes, it's true that the topic of "brain reading" is currently very hyped here in Germany as well. Prof. Dr. John-Dylan Haynes, whom you may know, from the Center for Advanced Neuroimaging at the Charité in Berlin is researching this topic very intensively. He is reportedly now able to tell from subjects' fMRI whether they are currently thinking about their grandmother or grandfather.

@kordinglab @tdverstynen
I don't know the exact test design or data material for this experiment either ;-).

A relatively well-documented study that you may also be aware of is "AI re-creates what people see by reading their brain scans" (7 Mar 2023): https://www.science.org/content/article/ai-re-creates-what-people-see-reading-their-brain-scans.

@kordinglab @tdverstynen

But here only one subject was scanned over the course of a year, generating a huge amount of data. Okay, the results of the image comparison are not bad, but they are still far from anything like "real brain reading": only a structural correlation between perceptual input and neuronal activity patterns has been established. That is miles away from detecting "suicidal thoughts" in a random person.

@kordinglab @tdverstynen
I would also not know which general neuronal activity pattern would correspond to "suicidal thoughts" in the first place.

Thank you for your interesting input and
best regards

Philo Sophies