Olivier Grisel

@ogrisel
6 Followers
682 Following
131 Posts
Software Engineer at Inria, scikit-learn developer supported by http://scikit-learn.fondation-inria.fr. Mostly posts about #Python, #Pydata, #MachineLearning & #DeepLearning.
github: https://github.com/ogrisel
twitter: https://twitter.com/ogrisel
scikit-learn: https://scikit-learn.org

[ANN]: To avoid typo-squatters on PyPI, we uploaded a magical "sklearn" package years ago, with a single dependency on the actual "scikit-learn" package.

However, many people are confused by this magic package alias and its fixed 0.0 version number.

Next month, this alias will progressively "brown out": "pip install sklearn" will fail at predictable times with an informative error message telling the user to run "pip install scikit-learn" instead.

https://github.com/scikit-learn/sklearn-pypi-package#goal

#sklearn #pydata #scipy
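A quick way to check which distribution actually ended up in an environment, a small sketch using only the standard library (the `dist_version` helper name is mine, not part of any package):

```python
import importlib.metadata


def dist_version(name):
    """Return the installed version of a PyPI distribution, or None."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


# The "sklearn" alias on PyPI is pinned at version 0.0, so finding that
# version installed means the wrong package name was used:
if dist_version("sklearn") == "0.0":
    print('Run "pip install scikit-learn" instead of "pip install sklearn".')
```

Note that the import name (`import sklearn`) is unaffected: it is only the PyPI distribution name that must be "scikit-learn".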


They show that the inductive bias of the NGD solution is favorable when the training data has significant label noise or, arguably equivalently, when the problem is misspecified (e.g. fitting a non-linear target with a linear least squares model).

Conversely, GD generalizes better than NGD when there is little label noise or non-linearity.

This behavior seems to transfer empirically to neural network training with label noise.

(2/2)

📜 Interesting #machinelearning paper:

When Does Preconditioning Help or Hurt Generalization?
by Amari et al.

https://arxiv.org/abs/2006.10732

They study the inductive bias of gradient descent on the over-parametrized linear least squares problem.

While traditional gradient descent converges to the minimum Euclidean norm solution, Natural Gradient Descent converges to the minimum Mahalanobis norm solution induced by the inverse Fisher information matrix.

👇 (1/2)
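Both implicit-bias claims are easy to check numerically on a toy over-parametrized least squares problem. A minimal numpy sketch, with an arbitrary SPD matrix P standing in for the inverse Fisher information matrix that NGD would use:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 50                        # over-parametrized: d >> n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain gradient descent on 0.5 * ||X w - y||^2 from a zero init ...
w = np.zeros(d)
for _ in range(20_000):
    w -= 0.01 * X.T @ (X @ w - y)

# ... converges to the minimum Euclidean norm interpolator:
w_min_l2 = np.linalg.pinv(X) @ y
print(np.allclose(w, w_min_l2, atol=1e-6))    # True

# Preconditioned GD with an arbitrary SPD matrix P (a stand-in for the
# inverse Fisher information matrix of natural gradient descent) ...
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
P = Q @ np.diag(rng.uniform(0.5, 2.0, d)) @ Q.T
wp = np.zeros(d)
for _ in range(20_000):
    wp -= 0.005 * P @ X.T @ (X @ wp - y)

# ... converges to the minimum Mahalanobis norm (w^T P^{-1} w) interpolator:
w_min_P = P @ X.T @ np.linalg.solve(X @ P @ X.T, y)
print(np.allclose(wp, w_min_P, atol=1e-6))    # True
```

Both limits interpolate the training data exactly; which one generalizes better is precisely the question the paper analyzes.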

When Does Preconditioning Help or Hurt Generalization?

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question. This work presents a more nuanced view on how the implicit bias of first- and second-order methods affects the comparison of generalization properties. We provide an exact asymptotic bias-variance decomposition of the generalization error of overparameterized ridgeless regression under a general class of preconditioner P, and consider the inverse population Fisher information matrix (used in NGD) as a particular example. We determine the optimal P for both the bias and variance, and find that the relative generalization performance of different optimizers depends on the label noise and the "shape" of the signal (true parameters): when the labels are noisy, the model is misspecified, or the signal is misaligned with the features, NGD can achieve lower risk; conversely, GD generalizes better than NGD under clean labels, a well-specified model, or aligned signal. Based on this analysis, we discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD. We then extend our analysis to regression in the reproducing kernel Hilbert space and demonstrate that preconditioned GD can decrease the population risk faster than GD. Lastly, we empirically compare the generalization error of first- and second-order optimizers in neural network experiments, and observe robust trends matching our theoretical analysis.


scikit-learn 1.2 will come with a new exact Newton/IRLS solver for binary logistic regression and Poisson/Gamma/Tweedie regression.

It can be much faster than LBFGS on datasets with n_samples >> n_features, in particular with sparse features (e.g. one-hot encoded categorical variables whose categories occur at very different frequencies).

Full details in the following two PRs:

- https://github.com/scikit-learn/scikit-learn/pull/24767

- https://github.com/scikit-learn/scikit-learn/pull/24637

#pydata #scipy #ml #sklearn

ENH add newton-cholesky solver to LogisticRegression by lorentzenchr · Pull Request #24767 · scikit-learn/scikit-learn


The Kaggle 2022 #datascience and #machinelearning survey results are out:

https://www.kaggle.com/kaggle-survey-2022

It's nice to see #Python and scikit-learn so strong! I would never have expected to have such an impact when I joined the project 12 years ago.