What is Cross-Validation in Machine Learning? An A-Z Guide

Cross-validation is a key technique in machine learning for checking a model's performance and how well it generalizes to new data. It helps the model avoid overfitting and behave more consistently. This article explores cross-validation in detail: what it is, why it matters, and the most common validation methods used today.
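As a minimal pure-Python sketch of the idea — hold out one fold at a time, fit on the remaining folds, and score on the held-out part. The mean-predictor "model" and all names here are illustrative, not from the article:

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_scores(ys, k=5):
    """For each fold, 'train' a mean-predictor on the other folds and
    report its mean squared error on the held-out fold."""
    folds = k_fold_indices(len(ys), k)
    scores = []
    for i in range(k):
        train = [j for m in range(k) if m != i for j in folds[m]]
        mu = statistics.mean(ys[j] for j in train)  # the "model": predict the mean
        scores.append(statistics.mean((ys[j] - mu) ** 2 for j in folds[i]))
    return scores

ys = [float(v % 7) for v in range(35)]
scores = cross_val_scores(ys, k=5)
print(round(statistics.mean(scores), 3))
```

Averaging the per-fold scores gives one performance estimate that uses every observation for both fitting and evaluation, just never at the same time.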

Read the full article here: https://interdata.vn/blog/cross-validation-la-gi/

#interdata #crossvalidation

Guide to Cross-Validation in Machine Learning - NeuralRow - Medium

The basic idea of cross-validation is to split the data first into training and test parts. Then, the training part is further divided into subtrain and validation parts, cycling through these…
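The split described above can be sketched as follows. This is a hedged pure-Python illustration; the function name and the test fraction are assumptions of mine, not taken from the Medium post:

```python
import random

def three_way_splits(n, test_frac=0.2, k=4, seed=0):
    """Carve off a final test set first, then cycle k subtrain/validation
    splits within the remaining training indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    folds = [train[i::k] for i in range(k)]  # deal training indices into k folds
    splits = []
    for i in range(k):
        valid = folds[i]  # one fold validates...
        subtrain = [j for m in range(k) if m != i for j in folds[m]]  # ...the rest subtrain
        splits.append((subtrain, valid))
    return splits, test

splits, test = three_way_splits(100)
print(len(splits), len(test))  # → 4 20
```

The test set is never touched while cycling through the subtrain/validation pairs; it is reserved for the final evaluation.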

Release 1.0.0 · nf-core/drugresponseeval

What's Changed
- Important! Template update for nf-core/tools v3.0.1 by @nf-core-bot in #10
- Merge branch 'dev' of github.com:nf-core/drugresponseeval into dev by @JudithBernett in #11
- Global checkpo...


Let's discuss how we can innovate beyond traditional methods to ensure our models truly generalize.

#MachineLearning #DataScience #CrossValidation #TimeSeries #SpatialData

-

#statstab #103 On the marginal likelihood and cross-validation

Thoughts: Can't say I can follow much of this, so I'll open it up to the #bayesian community for input. Seems important though.

#stats #bayes #likelihood #evidence #crossvalidation

https://doi.org/10.1093/biomet/asz077

On the marginal likelihood and cross-validation

Summary. In Bayesian statistics, the marginal likelihood, also known as the evidence, is used to evaluate model fit as it quantifies the joint probability

OUP Academic

⬆️

6) thankfully, Wager (2020) https://doi.org/10.1080/01621459.2020.1727235 shows that cross-validation is asymptotically consistent for model selection, so while what we're doing gives us poor estimates of generalization error and bad error bars, at least it's valid for model selection.
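For illustration, a toy sketch of CV used for model selection — the setting Wager's consistency result concerns. The two candidate fits, the data, and all names are made up for the example:

```python
import random
import statistics

def fit_mean(xs, ys):
    """Candidate A: ignore x, predict the training mean."""
    m = statistics.mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate B: ordinary least squares for y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x, a=my - b * mx, b=b: a + b * x

def cv_mse(fit, xs, ys, k=5, seed=0):
    """Plain k-fold CV estimate of mean squared error for one candidate."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        tr = [j for m in range(k) if m != i for j in folds[m]]
        model = fit([xs[j] for j in tr], [ys[j] for j in tr])
        errs.append(statistics.mean((ys[j] - model(xs[j])) ** 2 for j in folds[i]))
    return statistics.mean(errs)

rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]
scores = {"mean": cv_mse(fit_mean, xs, ys), "line": cv_mse(fit_line, xs, ys)}
best = min(scores, key=scores.get)
print(best)  # CV picks the linear model on this linear data
```

Even though each CV score is a noisy estimate of generalization error, ranking candidates by those scores still identifies the right model family here — which is the selection-consistency point of the post.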

#machineLearning #statistics #crossValidation

⬆️

5) Bates et al. (2023) https://doi.org/10.1080/01621459.2023.2197686 propose a nested cross-validation estimator of generalization error that's unbiased and has an unbiased mean squared error estimator. It's computationally quite intensive. I played a bit with it, and in my high-dimensional setups (large p, small n) I got error bars that did indeed have good coverage of the generalization error, but that also covered most of the [0, 1] interval, which is less helpful.
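For intuition about why this is computationally heavy, here is a generic nested cross-validation sketch — note this is *not* the Bates et al. estimator, just the standard inner-loop-selects / outer-loop-evaluates pattern, with made-up candidate fitters:

```python
import random
import statistics

def mse(model, xs, ys, idx):
    return statistics.mean((ys[j] - model(xs[j])) ** 2 for j in idx)

def fit_mean(xs, ys, idx):
    """Candidate A: predict the subtrain mean."""
    m = statistics.mean(ys[j] for j in idx)
    return lambda x: m

def fit_line(xs, ys, idx):
    """Candidate B: least-squares line on the subtrain indices."""
    mx = statistics.mean(xs[j] for j in idx)
    my = statistics.mean(ys[j] for j in idx)
    b = sum((xs[j] - mx) * (ys[j] - my) for j in idx) / sum((xs[j] - mx) ** 2 for j in idx)
    return lambda x, a=my - b * mx, b=b: a + b * x

def nested_cv(xs, ys, fitters, k_outer=5, k_inner=4, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    outer = [idx[i::k_outer] for i in range(k_outer)]
    outer_errs = []
    for i in range(k_outer):
        train = [j for m in range(k_outer) if m != i for j in outer[m]]
        inner = [train[a::k_inner] for a in range(k_inner)]
        # inner loop: pick the fitter with the best inner-CV error
        def inner_err(fit):
            errs = []
            for a in range(k_inner):
                sub = [j for m in range(k_inner) if m != a for j in inner[m]]
                errs.append(mse(fit(xs, ys, sub), xs, ys, inner[a]))
            return statistics.mean(errs)
        best = min(fitters, key=inner_err)
        # outer loop: refit the winner on all training data, score on the held-out fold
        outer_errs.append(mse(best(xs, ys, train), xs, ys, outer[i]))
    return statistics.mean(outer_errs)

rng = random.Random(2)
xs = [i / 5 for i in range(60)]
ys = [3.0 * x - 1.0 + rng.gauss(0, 0.2) for x in xs]
err = nested_cv(xs, ys, [fit_mean, fit_line])
print(round(err, 3))
```

The cost multiplies: every model is refit k_outer × k_inner times before a single error estimate comes out, which is the "computationally quite intensive" part.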

⬇️

#machineLearning #statistics #crossValidation

⬆️

4) in any case, error bars are wrong, because it's impossible to get an unbiased estimator of the variance of an estimator that's based on a single run of cross-validation, as shown by Bengio & Grandvalet (2004) https://dl.acm.org/doi/10.5555/1005332.1044695
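To make the point concrete, here is the naive error-bar computation in question, as a toy pure-Python illustration: treating the k fold scores as independent draws, which they are not, because the folds share training data:

```python
import random
import statistics

random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(100)]

k = 5
idx = list(range(len(ys)))
random.shuffle(idx)
folds = [idx[i::k] for i in range(k)]

# per-fold error of a mean-predictor trained on the other folds
fold_scores = []
for i in range(k):
    train = [j for m in range(k) if m != i for j in folds[m]]
    mu = statistics.mean(ys[j] for j in train)
    fold_scores.append(statistics.mean((ys[j] - mu) ** 2 for j in folds[i]))

cv_estimate = statistics.mean(fold_scores)
# naive standard error: pretends the k fold scores are independent draws.
# They are correlated (folds share training data), and Bengio & Grandvalet
# (2004) show no unbiased estimator of this variance exists from one CV run.
naive_se = statistics.stdev(fold_scores) / k ** 0.5
print(round(cv_estimate, 3), round(naive_se, 3))
```

The code runs fine; the problem is statistical — the resulting error bar is exactly the quantity the cited paper shows cannot be estimated without bias.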

⬇️

#machineLearning #statistics #crossValidation

No Unbiased Estimator of the Variance of K-Fold Cross-Validation | The Journal of Machine Learning Research

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing ...


⬆️

3) cross-validation estimators are better estimators of *expected test error* (averaged across all possible training sets) than of the *generalization error* of the particular model you trained.

This has been known for a while and even appears in The Elements of Statistical Learning, so I should have known about this much earlier. Bates et al. (2023) https://doi.org/10.1080/01621459.2023.2197686 show why this is the case for linear models.

⬇️

#machineLearning #statistics #crossValidation