Things are coming together for #ArviZ's InferenceData (https://github.com/arviz-devs/InferenceObjects.jl) to be a supported output type for #Turing and #JuliaLang's #Stan interface, similar to how it is for #PyMC.

For details, see https://github.com/TuringLang/MCMCChains.jl/issues/381 and https://github.com/StanJulia/StanSample.jl/issues/60

#statistics #mcmc_stan #bayesian

This has some really nice benefits for #Bayesian folks in #JuliaLang.

First, InferenceData is just more useful than MCMCChains.Chains because it contains more data, preserves the array structure of the draws, and integrates better with the ecosystem thanks to DimensionalData. It also follows a multi-language spec, so it's great for long-term storage and communication of inference results.
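To make the "preserves the array structure" point concrete, here is a minimal sketch of building an InferenceData from raw draws. It assumes InferenceObjects.jl is installed and uses its `from_namedtuple` constructor; my understanding of the package's layout convention is parameter dimensions first, then `(draw, chain)` last — check the package docs if this errors.

```julia
# Minimal sketch (assumes InferenceObjects.jl; layout convention is my
# understanding of the package: param dims first, then draw, then chain).
using InferenceObjects

posterior = (
    mu = randn(100, 4),        # scalar parameter: (draw, chain)
    theta = randn(3, 100, 4),  # length-3 vector parameter: (dim, draw, chain)
)
idata = from_namedtuple(posterior)
idata.posterior  # a DimensionalData-backed Dataset; theta keeps its vector shape
```

Note how `theta` stays a labeled 3-dimensional array instead of being flattened into `theta[1]`, `theta[2]`, `theta[3]` columns the way MCMCChains.Chains would store it.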

@sethaxen honestly this is the big benefit for me -- MCMCChains throws out so much information when you make the chain and it angers me. Of course, it's largely my fault, but I'm glad you're doing something about it.
@cameron One downside to InferenceData is that everything still needs to become a numeric array at some point (whether for plotting, diagnostics, statistics, or serialization), so even once Turing supports sampling Cholesky factors, they may not fit nicely into an InferenceData-based workflow without some loss of structure. But right now this is the exception, not the norm, and we have ideas we can try, so either way this is progress.
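To illustrate the kind of "loss of structure" being discussed: a structured draw like a Cholesky factor can be round-tripped through a flat numeric vector, but only via an explicit mapping. This is a plain-LinearAlgebra sketch; `flatten_chol` and `unflatten_chol` are hypothetical helper names, not an API of any of the packages mentioned.

```julia
using LinearAlgebra

# Hypothetical helpers: store only the lower triangle of a Cholesky factor
# as a flat vector (column by column), and rebuild the factor on demand.
flatten_chol(L::LowerTriangular) = [L[i, j] for j in axes(L, 2) for i in j:size(L, 1)]

function unflatten_chol(v, n)
    L = zeros(n, n)
    k = 1
    for j in 1:n, i in j:n
        L[i, j] = v[k]
        k += 1
    end
    LowerTriangular(L)
end
```

The flat vector is what fits into a numeric-array container like InferenceData; the triangular structure itself lives only in the mapping functions, which every downstream consumer then has to know about.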
@sethaxen We were dealing with this problem too -- the numeric type thing is a massive pain in the ass. I think we discussed using tuples or vectors-of-vectors, but never really got around to implementing it.
@cameron I think what the TransformVariables family of packages and @cscherrer's SampleChains return (either a NamedTuple of vectors, a vector of NamedTuples, or something that behaves like one of those) is the right format for Julia types and for downstream tasks like conditioning or resampling. But these are not the best formats for other analyses like computing ESS or plotting, where you need real marginals. So I suspect both formats are needed, or ways to trivially interconvert.
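For concreteness, here is a plain-Julia sketch of the two layouts and the trivial interconversion (the helper names `to_columns`/`to_rows` are made up for illustration):

```julia
# Vector of NamedTuples: one entry per draw -- natural for conditioning
# and resampling, since each element is a complete parameter set.
# NamedTuple of vectors: one column per parameter -- what ESS computations
# and marginal plots want.

# vector of NamedTuples -> NamedTuple of vectors
function to_columns(draws)
    ks = keys(first(draws))
    NamedTuple{ks}(map(k -> [getfield(d, k) for d in draws], ks))
end

# NamedTuple of vectors -> vector of NamedTuples
function to_rows(cols)
    ks = keys(cols)
    [NamedTuple{ks}(map(k -> cols[k][i], ks)) for i in eachindex(first(values(cols)))]
end
```

The conversion is cheap in both directions, which is what makes "both formats, trivially interconverted" a plausible resolution.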

@sethaxen @cameron I still don't understand these points.
- computing ESS
- plotting
- "real marginals"

I don't see obstacles to the first two, and I'm not sure what you mean by the third. Is there an example somewhere of something that's hard to do?

@cscherrer I think it's hard to do in a "dumb" way because you have to be aware of the underlying data structure or (generally) provide some mapping function. You're right that it's not difficult, but it is an additional layer of complexity that can be difficult to do well generally.

@cameron Sorry, I still don't get it. Any variable in the posterior is just a value. It's passed around as a value, and called in a very natural way. How does that make anything hard?

Storing everything as a flattened array seems a lot harder to me.

@cameron The one case I *can* see as being weird is if you have functions that require an array. But this is exactly what comes up in HMC, and TransformVariables seems to solve the problem just fine.