What they’re saying, as far as I can tell, is that after training the model on 85% of the dataset, the model predicted whether a participant had an ASD diagnosis (as a binary choice) 100% correctly for the remaining 15%. I don’t think this is unheard of, but I’ll agree that a replication would be nice to rule out systematic errors. If the images from the ASD and TD sets were taken with different cameras, for instance, that could introduce an invisible difference between the two sets that the model could latch onto instead of anything related to the diagnosis. I would expect them to control for stuff like that, though.
You need to report two numbers for a classifier, though. I can create a classifier that catches all cases of autism just by saying that everybody has autism. You also need a false positive rate.
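To make that concrete, here is a minimal sketch with made-up labels (not data from the study). The helper name `rates` is just for illustration. A "classifier" that labels everyone as positive gets perfect sensitivity, but its false positive rate is also 100%.

```python
def rates(y_true, y_pred):
    """Return (sensitivity, false positive rate) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)          # share of true cases caught
    false_positive_rate = fp / (fp + tn)  # share of non-cases flagged
    return sensitivity, false_positive_rate


# Hypothetical ground truth: 3 positives, 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0]

# The trivial classifier: say "positive" for everybody.
y_pred = [1] * len(y_true)

sens, fpr = rates(y_true, y_pred)
print(sens, fpr)  # 1.0 1.0
```

Sensitivity alone looks perfect here, which is why the second number matters.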
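True, but as far as I can tell the AUROC measure they refer to incorporates both: it summarizes the trade-off between the true positive rate and the false positive rate across all decision thresholds.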
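A rough sketch of why, again with made-up numbers rather than anything from the study: AUROC can be computed as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting as half. The function name `auroc` and the scores below are assumptions for illustration. Under that definition the "everybody has it" classifier scores only 0.5, the same as guessing.

```python
def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count a win when the positive outranks the negative, half for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth: 3 positives, 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0]

# Trivial classifier: the same score for everybody, so every pair is a tie.
constant_scores = [1.0] * len(y_true)

# A classifier whose scores separate the two groups completely.
separating_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1]

auc_constant = auroc(y_true, constant_scores)
auc_separating = auroc(y_true, separating_scores)
print(auc_constant, auc_separating)  # 0.5 1.0
```

So a high AUROC can't be reached just by flagging everyone, since false positives pull it down.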
Yup, you’re right, good catch 🙂