Philipp Hennig

644 Followers
80 Following
36 Posts
Professor for the Methods of Machine Learning at the University of Tübingen. Co-Director of the ELLIS Program on Theory, Algorithms, Computations.
Book: https://www.probabilistic-numerics.org/textbooks/
Website: https://uni-tuebingen.de/en/160189
YouTube: https://youtube.com/@TubingenML
Birdsite: @philipphennig5

@emtiyaz This course, and its Open Educational Resources, brought to you by
Nathanael Bosch, Julia Grosse, Agustinus Kristiadi, Marvin Pförtner, Jonathan Schmidt, Frank Schneider, Lukas Tatzel, Jonathan Wenger, and yours truly.

-- The End (of Term) --

@emtiyaz At the end, I sneak on stage one final time to summarize: https://youtu.be/t2CSqdfmKGA

Long story short:

Computation is inference. Whether information is loaded from disk or prepared directly on the GPU is not a fundamental distinction. If we quantify uncertainty everywhere, we can actively control, guide, manage and monitor the use of empirical as well as computational data.

Numerics of ML 14 -- Conclusion -- Philipp Hennig


People like @emtiyaz, Alex Immer, Erik Daxberger, Matthias Bauer, Runa Eschenhagen and Agustinus himself have built a beautifully comprehensive framework to turn deep models approximately into Gaussian processes, and thus transfer all the clean mechanisms associated with GPs to deep learning:

Calibrated, learnable uncertainty; out-of-distribution robustness; architecture-optimization by evidence maximization; multi-task, life-long learning, and the list goes on.

This doesn’t mean curvature isn’t interesting, though. Agustinus Kristiadi, in lecture 13 (impersonated imperfectly by myself), explains the beautiful connection between curvature and uncertainty and introduces linearized Laplace approximations.

https://youtu.be/LssjrrOMlIg

Numerics of ML 13 -- Uncertainty in Deep Learning -- Agustinus Kristiadi

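To make the Laplace idea concrete, here is a minimal numpy sketch, my own toy example rather than code from the lecture. For logistic regression the model is linear in the weights, so the "linearization" is exact: the posterior covariance is the inverse Hessian of the regularized loss at the MAP estimate, and the predictive class probability uses the standard probit (MacKay) approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary classification data; bias handled via feature augmentation
X = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])
Phi = np.stack([X, np.ones_like(X)], axis=1)  # (N, 2) features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# MAP estimate of the weights (Gaussian prior with precision tau)
tau = 1.0
w = np.zeros(2)
for _ in range(500):
    p = sigmoid(Phi @ w)
    grad = Phi.T @ (p - y) + tau * w
    w -= 0.1 * grad / len(y)

# Laplace approximation: posterior covariance = inverse Hessian at the MAP
p = sigmoid(Phi @ w)
H = Phi.T @ (Phi * (p * (1 - p))[:, None]) + tau * np.eye(2)
Sigma = np.linalg.inv(H)

# Predictive uncertainty on a grid: variance of the logit under the posterior
xs = np.linspace(-6, 6, 5)
Phi_star = np.stack([xs, np.ones_like(xs)], axis=1)
mean_logit = Phi_star @ w
var_logit = np.einsum("ij,jk,ik->i", Phi_star, Sigma, Phi_star)

# Probit approximation to the predictive class probability
p_star = sigmoid(mean_logit / np.sqrt(1 + np.pi / 8 * var_logit))
print(p_star)
```

The same recipe carries over to deep networks once the network is linearized around the MAP weights; the logit variance grows away from the data, which is exactly the calibrated "I don't know" behaviour one wants out of distribution.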
Why do we even need these? Lukas Tatzel takes over to make the connection to classic convex optimization. Of course, second-order and superlinear methods (like BFGS) are great. But their advantages are diminished severely in the strongly stochastic setting of most deep learning.
More generally, Frank says, for something done by human engineers, for months on end, the deep training toolchain is surprisingly simplistic. We need a richer software engineering stack for deep models. And he proposes cockpit https://github.com/f-dangel/cockpit by Felix and himself, as inspiration for what such “deepbuggers” for differentiable, array-centric programs might look like.
GitHub - f-dangel/cockpit: Cockpit: A Practical Debugging Tool for Training Deep Neural Networks

With tools like `backpack.pt` by Felix Dangel and Frederik Kunstner, quantities beyond the mean gradient are readily available, both in the data dimension (batch variances, individual gradients) and in the weight dimension (various curvature estimates). Why not make use of them in training?
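For illustration, here is a plain-numpy sketch (not BackPACK's actual API) of the kinds of quantities such a library exposes, computed by hand for a least-squares model: per-example gradients, their element-wise batch variance, and a Gauss-Newton curvature proxy.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny least-squares model: loss_n(w) = 0.5 * (x_n . w - y_n)^2
N, D = 32, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)

w = np.zeros(D)
residual = X @ w - y

# Individual (per-example) gradients, shape (N, D): the data dimension
grad_batch = residual[:, None] * X

# Mean gradient and element-wise batch variance of the gradients
grad_mean = grad_batch.mean(axis=0)
grad_var = grad_batch.var(axis=0)

# A curvature estimate in the weight dimension: for this quadratic loss
# the (generalized) Gauss-Newton matrix is simply X^T X / N
ggn = X.T @ X / N

print(grad_mean, grad_var, ggn)
```

In a deep-learning framework these quantities come out of a single backward pass; the point is that they are cheap by-products of backprop, yet most training loops throw them away.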

Frank Schneider started the round with a lecture on deep learning training. He argues that deep learning is only superficially an optimization problem in the strict sense, and that it remains a poorly understood problem despite its enormous emerging economic value.

https://youtu.be/PBcVZ5jEE5k

Thankfully, there are many leads to a new kind of training; and they are fundamentally probabilistic in nature. The strong stochasticity of mini-batch training should be embraced rather than ignored.

Numerics of ML 11 -- Optimization for Deep Learning -- Frank Schneider


Just finished uploading “Numerics of Machine Learning”. This final batch is on optimization and deep learning, and more of a list of grievances with the status quo than a collection of solutions.

https://www.probabilistic-numerics.org/teaching/2022_Numerics_of_Machine_Learning/

Quick thread below.

Probabilistic Numerics | Numerics of Machine Learning

Quantifying Uncertainty in Computation.

Yesterday we had a traditional #Fasching / #Karneval parade in Tübingen. Supposedly there were 40 jesters' guilds there. They throw candy and other treats to the kids and prank the bystanders.