I drafted an implementation of Cyclical SGLD using Blackjax and Optax.

As you can see 👇 Cyclical SGLD, which alternates between exploration and sampling phases, does much better on multi-modal targets than vanilla SGLD. Next step: CIFAR-10 with a Bayesian ResNet-18.

https://www.thetypicalset.com/blog/cyclical_sgld.html

Cyclical SGLD in Blackjax
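For anyone curious, the core of cyclical SGLD fits in a few lines. This is my own minimal numpy sketch on a toy two-mode target, not the Blackjax/Optax code from the post; the cosine step-size schedule follows Zhang et al. (2020).

```python
import numpy as np

def grad_logdensity(x):
    """Gradient of log p for an equal mixture of N(+2, I) and N(-2, I)."""
    a = -0.5 * np.sum((x - 2.0) ** 2)
    b = -0.5 * np.sum((x + 2.0) ** 2)
    wa = 1.0 / (1.0 + np.exp(b - a))  # responsibility of the first mode
    return wa * (2.0 - x) + (1.0 - wa) * (-2.0 - x)

def cyclical_step_size(step, steps_per_cycle, max_step_size):
    """Cosine schedule: each cycle starts large (exploration)
    and decays towards zero (sampling)."""
    r = (step % steps_per_cycle) / steps_per_cycle
    return 0.5 * max_step_size * (np.cos(np.pi * r) + 1.0)

def sgld_step(rng, x, step_size):
    """One Langevin step: follow the gradient, inject Gaussian noise."""
    noise = rng.standard_normal(x.shape)
    return x + step_size * grad_logdensity(x) + np.sqrt(2.0 * step_size) * noise

rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for step in range(2000):
    eps = cyclical_step_size(step, steps_per_cycle=500, max_step_size=0.05)
    x = sgld_step(rng, x, eps)
    samples.append(x.copy())
```

The restarts at the top of each cycle are what let the chain hop between modes that a single small-step SGLD run would rarely cross.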

@remilouf I haven't used SGLD before. It's attractive that it targets the posterior, but I wonder how that's used in practice. e.g. training a neural net, would one save the weights at regular intervals and then evaluate an ensemble of nets on an input to get a sample of outputs?

@sethaxen That’s my understanding of how people do it. We have an example classifier that we train on MNIST in the Blackjax documentation: https://blackjax-devs.github.io/blackjax/examples/SGMCMC.html

(currently broken because of a timeout in CI, need to fix it)

MNIST Digit Recognition With a 3-Layer Perceptron — Blackjax
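To make the "ensemble of snapshots" idea concrete: a hedged sketch, not the notebook's code. `predict` stands in for any network's forward pass, and the random matrices stand in for weights saved every k SGLD steps.

```python
import numpy as np

def predict(weights, x):
    """Toy 'network': a single linear layer followed by softmax."""
    logits = x @ weights
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(snapshots, x):
    """Bayesian model average: mean of the per-snapshot predictive
    distributions, one snapshot per saved SGLD iterate."""
    probs = np.stack([predict(w, x) for w in snapshots])
    return probs.mean(axis=0)

rng = np.random.default_rng(0)
snapshots = [rng.normal(size=(3, 2)) for _ in range(10)]  # saved every k steps
x = rng.normal(size=(5, 3))
p = ensemble_predict(snapshots, x)  # rows are proper probability vectors
```

Averaging the probabilities (rather than the weights) is the Monte Carlo estimate of the posterior predictive, which is exactly what the "sample of outputs" in the question gives you.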

@sethaxen One day someone asked me why you needed a Bayesian net to estimate classification uncertainty, so I took some time to think about it: https://github.com/rlouf/ama/discussions/4
Why do I need a bayesian neural net to estimate classification uncertainty? · Discussion #4 · rlouf/ama

I’ve a neural net and want to measure its uncertainty on classification. Currently I just use the probability of the top class as a proxy; how would a Bayesian neural net change that?

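One concrete way to see what the ensemble buys you over the top-class probability: it lets you separate "every member agrees on a soft label" from "the members disagree". A small illustrative example with made-up numbers (five ensemble members, three classes):

```python
import numpy as np

confident = np.array([[0.9, 0.05, 0.05]] * 5)   # members agree
conflicted = np.array([[0.9, 0.05, 0.05],
                       [0.05, 0.9, 0.05],
                       [0.05, 0.05, 0.9],
                       [0.9, 0.05, 0.05],
                       [0.05, 0.9, 0.05]])      # members disagree

def top_class_prob(member_probs):
    """Single-number proxy: max probability of the averaged prediction."""
    return member_probs.mean(axis=0).max()

def mutual_information(member_probs):
    """Epistemic uncertainty: entropy of the mean prediction minus the
    mean entropy of the members. Zero when all members agree, positive
    when they disagree."""
    mean = member_probs.mean(axis=0)
    h_mean = -(mean * np.log(mean)).sum()
    h_members = -(member_probs * np.log(member_probs)).sum(axis=1).mean()
    return h_mean - h_members
```

A single softmax collapses both cases into one probability vector; the mutual information only exists once you have posterior samples to disagree with each other.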

@sethaxen @larryshamalama Unfortunately, in Bayesian NNs uncertainty increases as you move further away from the decision boundary, not from the training data distribution, which is usually the type of uncertainty people are expecting.

@twiecki @remilouf @sethaxen @larryshamalama under what conditions does this happen? Interested to learn about this if you can point me at papers?
@Sdatkinson @remilouf @sethaxen @larryshamalama This happens under all conditions; it's a property of the model. It only knows about the parameters that describe the hyperplane, not the data-generating process. You can see the effect clearly here: https://twiecki.io/blog/2016/06/01/bayesian-deep-learning/#Uncertainty-in-predicted-value Why is there no uncertainty near the edges?
Bayesian Deep Learning — While My MCMC Gently Samples
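The effect being discussed can be reproduced with a toy 1-D Bayesian logistic regression. This is my own illustration, with a made-up posterior over slope and bias, not the model from the linked post:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(loc=2.0, scale=0.5, size=1000)  # hypothetical posterior: slope
b = rng.normal(loc=0.0, scale=0.5, size=1000)  # hypothetical posterior: bias

def predictive(x):
    """Mean and std of the class probability over posterior samples."""
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return p.mean(), p.std()

# Near the boundary (x ~ 0) the posterior samples disagree, so the
# predictive std is large. At x = 10, far outside any plausible data,
# every sample saturates at 1, so the model looks (misleadingly) certain.
```

Because every posterior sample of the sigmoid saturates far from the hyperplane, predictive spread shrinks with distance from the boundary regardless of how far the query point is from the training data.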

@twiecki @remilouf @sethaxen @larryshamalama Thanks! I'll check this out.

@twiecki @remilouf @sethaxen @larryshamalama

Hmm, perhaps a typo? Uncertainty seems _highest_ at the decision boundary in the example and _decreases_ as you move away from it.