We were recently discussing cross-validation estimates of model performance with colleagues, and I dug into the literature a bit to better understand where things stand.

This is not my topic of expertise, but here are a few tidbits I'd like to share.

1) cross-validation has been the topic of much discussion for many decades. Stone (1974) https://www.jstor.org/stable/2984809 gives a good overview of the work that preceded it.

⬇️

#machineLearning #statistics #crossValidation

⬆️

Reading the discussion of the paper by other statisticians is enlightening: the tone of scientific discourse has mercifully changed in 50 years.

Also, "The term 'assessment' is preferred to 'validation' which has a ring of excessive confidence about it."

⬇️

#machineLearning #statistics #crossValidation

⬆️

2) (not a surprise, but worth remembering): cross-validation error bars can be very large when sample sizes are small, since the standard error of the estimate scales as \( \frac{1}{\sqrt{n}} \).

This is discussed, for example, by Braga-Neto & Dougherty (2004) https://doi.org/10.1093/bioinformatics/btg419 for microarray studies, and by @GaelVaroquaux (2018) https://doi.org/10.1016/j.neuroimage.2017.06.061 for brain image analysis.
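
To get a feel for the scale of the problem, here is a small simulation sketch (my own toy, not taken from either paper; the data generator, classifier, and sample sizes are arbitrary choices) that redraws datasets of increasing size and looks at how much the 5-fold CV accuracy fluctuates from one dataset to the next:

```python
# A toy simulation, not taken from either paper: the data generator,
# classifier, and sample sizes below are arbitrary choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

for n in (50, 200, 800):
    cv_estimates = []
    for rep in range(200):
        # Redraw a dataset of size n and compute its 5-fold CV accuracy.
        X, y = make_classification(n_samples=n, n_features=20,
                                   n_informative=5, random_state=rep)
        cv_estimates.append(
            cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
        )
    print(f"n={n:4d}  mean CV accuracy={np.mean(cv_estimates):.3f}  "
          f"std across datasets={np.std(cv_estimates):.3f}")
```

The spread across datasets should shrink roughly by half each time n is quadrupled, in line with the \( \frac{1}{\sqrt{n}} \) scaling.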

⬇️

#machineLearning #statistics #crossValidation

⬆️

3) cross-validation estimators are better estimators of the *expected test error* (averaged over all possible training sets) than of the *generalization error* of the particular model trained on the data at hand.

This has been known for a while and even appears in The Elements of Statistical Learning, so I should have known about it much earlier. Bates et al. (2023) https://doi.org/10.1080/01621459.2023.2197686 show why this is the case for linear models.
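
Here is a rough way to see the distinction in code (a sketch of mine, not the Bates et al. experiments; Ridge regression and the Gaussian data generator are arbitrary choices): for many training sets, compare the 5-fold CV estimate with the generalization error of the model actually fitted on that training set, measured on a large held-out test set.

```python
# A toy comparison, not the Bates et al. experiments: Ridge regression and
# the Gaussian data generator are arbitrary choices.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
p, n, n_test, n_reps = 20, 100, 50_000, 300
beta = rng.randn(p)

def draw(n_samples):
    """Draw a dataset from a fixed linear model with Gaussian noise."""
    X = rng.randn(n_samples, p)
    return X, X @ beta + rng.randn(n_samples)

X_test, y_test = draw(n_test)  # large test set to measure "true" error

cv_estimates, gen_errors = [], []
for _ in range(n_reps):
    X, y = draw(n)
    model = Ridge(alpha=1.0)
    # 5-fold CV estimate of the prediction MSE from this training set alone.
    cv_estimates.append(-cross_val_score(model, X, y, cv=5,
                                         scoring="neg_mean_squared_error").mean())
    # Generalization error of the model actually fitted on this training set.
    gen_errors.append(np.mean((model.fit(X, y).predict(X_test) - y_test) ** 2))

print("corr(CV estimate, per-training-set generalization error):",
      round(np.corrcoef(cv_estimates, gen_errors)[0, 1], 3))
print("mean CV estimate:", round(np.mean(cv_estimates), 3),
      "/ expected test error:", round(np.mean(gen_errors), 3))
```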

⬇️

#machineLearning #statistics #crossValidation

⬆️

4) in any case, the usual error bars are wrong: Bengio & Grandvalet (2004) https://dl.acm.org/doi/10.5555/1005332.1044695 showed that there is no unbiased estimator of the variance of the K-fold cross-validation estimate based on the fold scores of a single cross-validation run.
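
A small simulation makes the gap visible (again my own sketch, with arbitrary data and model choices): compare the "naive" standard error computed from the spread of the K fold scores, as if they were independent, with the actual standard deviation of the CV estimate measured by redrawing the dataset many times.

```python
# A toy comparison with arbitrary data and model choices: the naive standard
# error treats the K fold scores as independent; the reference is the actual
# standard deviation of the CV estimate across freshly drawn datasets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

K, n, n_reps = 5, 100, 500
cv_means, naive_ses = [], []
for rep in range(n_reps):
    X, y = make_classification(n_samples=n, n_features=20, n_informative=5,
                               random_state=rep)
    fold_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=K)
    cv_means.append(fold_scores.mean())
    # Naive standard error, pretending the K fold scores are i.i.d.
    naive_ses.append(fold_scores.std(ddof=1) / np.sqrt(K))

print("std of the CV estimate across datasets:", round(float(np.std(cv_means)), 4))
print("average naive standard error:          ", round(float(np.mean(naive_ses)), 4))
```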

⬇️

#machineLearning #statistics #crossValidation

⬆️

5) Bates et al. (2023) https://doi.org/10.1080/01621459.2023.2197686 propose a nested cross-validation estimator of generalization error that is unbiased and comes with an unbiased estimator of its mean squared error. It's computationally quite intensive. I played a bit with it, and in my high-dimensional setups (large p, small n) I got error bars that did have good coverage of the generalization error, but also covered most of the [0, 1] interval, which is less helpful.
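
For intuition only, here is a very stripped-down sketch of the nested-CV idea, explicitly not a faithful implementation of the Bates et al. estimator: each outer fold is held out, an inner CV run on the remaining data plays the role of "the CV estimate", and the discrepancy between the two gauges how far a CV estimate can sit from the error it is supposed to track.

```python
# A very stripped-down sketch of the nested-CV idea only, NOT a faithful
# implementation of the Bates et al. estimator. Data and model are arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Large p, small n, as in the setting mentioned above.
X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)
model = LogisticRegression(max_iter=1000)

discrepancies = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Inner CV on the outer-training data plays the role of "the CV estimate".
    inner_cv_err = 1 - cross_val_score(model, X_tr, y_tr, cv=4).mean()
    # The held-out outer fold plays the role of the error it is meant to track.
    outer_err = 1 - model.fit(X_tr, y_tr).score(X[test_idx], y[test_idx])
    discrepancies.append(inner_cv_err - outer_err)

print("typical |inner CV - outer holdout| discrepancy:",
      round(float(np.mean(np.abs(discrepancies))), 3))
```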

⬇️

#machineLearning #statistics #crossValidation

⬆️

6) thankfully, Wager (2020) https://doi.org/10.1080/01621459.2020.1727235 shows that cross-validation is asymptotically consistent for model selection, so while what we're doing gives us poor estimates of generalization error and bad error bars, at least it's valid for model selection.
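
As a sanity check of the model-selection use, one last toy simulation (mine, not Wager's setting; the weak-signal data generator and the two candidate models are arbitrary choices): pick between a logistic regression and a majority-class baseline by 5-fold CV and count how often the genuinely better model wins as n grows.

```python
# A toy check of CV for model selection, not Wager's setting: data generator
# and candidate models are arbitrary choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

candidates = {"logistic": LogisticRegression(max_iter=1000),
              "majority": DummyClassifier(strategy="most_frequent")}

for n in (30, 100, 300):
    wins = 0
    for rep in range(200):
        # Weak signal so that the two candidates are not trivially far apart.
        X, y = make_classification(n_samples=n, n_features=20, n_informative=5,
                                   class_sep=0.3, flip_y=0.2, random_state=rep)
        scores = {name: cross_val_score(est, X, y, cv=5).mean()
                  for name, est in candidates.items()}
        wins += scores["logistic"] > scores["majority"]
    print(f"n={n:3d}  CV picks the better model in {wins / 200:.0%} of datasets")
```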

#machineLearning #statistics #crossValidation

@cazencott thank you for this thread!
@HydrePrever sure thing! I wish we were more aware of these things in the ML community – but then how could we keep on publishing so many papers with insignificant results?