`in this work we demonstrate, both theoretically and empirically, how to regularize a variational deep network implicitly via the optimization procedure, just as for standard deep learning. We fully characterize the inductive bias of (stochastic) gradient descent in the case of an overparametrized linear model as generalized variational inference and demonstrate the importance of the choice of parametrization.`

https://arxiv.org/abs/2505.20235

#ML #MachineLearning #OptimizationTheory #theory #math

Variational Deep Learning via Implicit Regularization

Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters and optimization procedure. However, deploying deep learning models out-of-distribution, in sequential decision-making tasks, or in safety-critical domains, necessitates reliable uncertainty quantification, not just a point estimate. The machinery of modern approximate inference -- Bayesian deep learning -- should answer the need for uncertainty quantification, but its effectiveness has been challenged by our inability to define useful explicit inductive biases through priors, as well as the associated computational burden. Instead, in this work we demonstrate, both theoretically and empirically, how to regularize a variational deep network implicitly via the optimization procedure, just as for standard deep learning. We fully characterize the inductive bias of (stochastic) gradient descent in the case of an overparametrized linear model as generalized variational inference and demonstrate the importance of the choice of parametrization. Finally, we show empirically that our approach achieves strong in- and out-of-distribution performance without tuning of additional hyperparameters and with minimal time and memory overhead over standard deep learning.

arXiv.org
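
A toy sketch of the setting the abstract describes (my own illustration, not the paper's construction): an overparametrized linear model with a factorized Gaussian over the weights, trained by plain SGD on the reparametrized data-fit loss with no explicit KL or prior term, so any regularization of the variational posterior has to come from the optimizer, the initialization, and the chosen parametrization.

```python
import torch

torch.manual_seed(0)
n, d = 20, 100                       # fewer observations than parameters
X = torch.randn(n, d)
w_true = torch.randn(d)
y = X @ w_true

mu = torch.zeros(d, requires_grad=True)         # variational mean
log_sigma = torch.zeros(d, requires_grad=True)  # variational log-std (one common choice)
opt = torch.optim.SGD([mu, log_sigma], lr=1e-2)

for step in range(5000):
    eps = torch.randn(d)
    w = mu + log_sigma.exp() * eps              # reparametrized weight sample
    loss = ((X @ w - y) ** 2).mean()            # plain data-fit objective, no KL term
    opt.zero_grad()
    loss.backward()
    opt.step()

# Many (mu, sigma) make the expected loss zero here (the problem is underdetermined);
# which one SGD selects is decided implicitly by the optimizer and the parametrization,
# which is the phenomenon the paper characterizes.
print(((X @ mu - y) ** 2).mean().item(), log_sigma.exp().mean().item())
```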

`We show that the Weierstrass method, like the well known #Newton method, is not generally convergent: there are open sets of #polynomials p of every degree d≥3 such that the dynamics of the Weierstrass method applied to p exhibits attracting periodic orbits.`

https://arxiv.org/abs/2004.04777

#rootFinding #optimizationTheory #optimization #computation #computing
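
For reference, a minimal sketch of the Weierstrass (Durand–Kerner) iteration itself, which updates all root estimates of a monic polynomial simultaneously; the function names and the random initialization below are my own choices, and, as the paper shows, for some degree-3-or-higher polynomials the iteration can be attracted to a periodic cycle instead of converging.

```python
import numpy as np

def weierstrass_step(coeffs, z):
    """One simultaneous Weierstrass (Durand-Kerner) update of all root estimates z.

    coeffs: coefficients of a *monic* polynomial, highest degree first.
    """
    p = np.polyval(coeffs, z)
    # W_i = p(z_i) / prod_{j != i} (z_i - z_j)
    denom = np.array([np.prod(z[i] - np.delete(z, i)) for i in range(len(z))])
    return z - p / denom

def weierstrass(coeffs, iters=200, seed=0):
    """Iterate from a random complex starting configuration (may not converge)."""
    d = len(coeffs) - 1                               # polynomial degree
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    for _ in range(iters):
        z = weierstrass_step(coeffs, z)
    return z

# example: x^3 - 1, whose roots are the three cube roots of unity
print(np.sort_complex(weierstrass([1, 0, 0, -1])))
```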

`As in #biological #bee colonies, a small number of scouts keeps exploring the solution space looking for new regions of high fitness (global search). The global #search procedure re-initialises the last ns-nb #flower patches with randomly generated solutions.`

https://en.wikipedia.org/wiki/Bees_algorithm

#optimization #optimizationTheory #globalOptimization #algorithm #algorithms #searchAlgorithm #algorithmicSearch

Bees algorithm - Wikipedia
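
A minimal sketch of the global-search step quoted above, assuming a maximization problem: the nb fittest flower patches are kept for local search and the remaining ns − nb patches are re-initialized with uniformly random solutions (the function name and the bounds arguments are mine, not from the article).

```python
import numpy as np

def bees_global_search(patches, scores, ns, nb, low, high, rng):
    """Keep the nb best patches, re-initialise the last ns - nb with random solutions.

    patches : (ns, dim) array of current flower-patch centres
    scores  : (ns,) array of fitness values, higher is better (assumed here)
    """
    order = np.argsort(scores)[::-1]          # indices sorted best-first
    worst = order[nb:]                        # the ns - nb least fit patches
    patches = patches.copy()
    patches[worst] = rng.uniform(low, high, size=(ns - nb, patches.shape[1]))
    return patches

# tiny usage example on the 2-D search box [-5, 5]^2
rng = np.random.default_rng(0)
ns, nb = 10, 3
patches = rng.uniform(-5, 5, size=(ns, 2))
scores = -np.sum(patches ** 2, axis=1)        # fitness: closeness to the origin
patches = bees_global_search(patches, scores, ns, nb, -5.0, 5.0, rng)
```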

`According to George Dantzig, the duality theorem for linear optimization was conjectured by John von Neumann immediately after Dantzig presented the linear programming problem. Von Neumann noted that he was using information from his game theory, and conjectured that the two-person zero-sum matrix game was equivalent to linear programming. Rigorous proofs were first published in 1948 by Albert W. Tucker and his group.`

https://en.wikipedia.org/wiki/Duality_(optimization)

#duality #optimization #optimizationTheory #ML #stats

Duality (optimization) - Wikipedia
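
The game/LP equivalence von Neumann had in mind can be made concrete with the standard textbook construction (my sketch, not taken from the article): the row player's optimal mixed strategy for a payoff matrix A is the solution of a small linear program, solved below with scipy's linprog.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum_game(A):
    """Optimal mixed strategy and value for the row player of payoff matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # variables: (x_1, ..., x_m, v); objective: maximise v  ->  minimise -v
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # for every opponent column j:  v - sum_i x_i A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # the probabilities x must sum to one
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# matching pennies: value 0, uniform strategy
x, v = solve_zero_sum_game([[1, -1], [-1, 1]])
print(x, v)   # approximately [0.5 0.5] and 0.0
```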