Preprint alert: Simulation-based validation of Bayes Factor computation with @paul_buerkner and Sebastian Stroppel. We bring lessons learned in improving SBC to validation of Bayes factors. The main idea is still the same: simulate data from the models, fit those and see if the inferences are calibrated. 1/
https://arxiv.org/abs/2508.11814
#Bayesian #stats #rstats #SBC
First, we highlight a simple, previously unused check: if the posterior model probability is p, the prediction should be correct p of the time. This can be checked with known tools for binary model prediction calibration. We also discuss the data-averaged posterior (DAP) check and adapting SBC for the task. 2/
In theory, SBC should be able to discover all possible problems, while binary calibration and the data-averaged posterior check may (and do) miss some. In practice, both alternative methods are sometimes more sensitive than SBC for a given simulation budget. We recommend combining all of them. 3/
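A minimal sketch of the binary calibration idea, not the code from the preprint. It assumes a toy pair of models where the Bayes factor is available in closed form (M0: y_i ~ N(0, 1); M1: mu ~ N(0, tau^2), y_i ~ N(mu, 1)), simulates only the sufficient statistic ybar, and uses plain binning instead of a proper calibration tool. All function names here are hypothetical.

```python
import math
import random


def normal_logpdf(x, sd):
    """Log density of N(0, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * (x / sd) ** 2


def post_prob_m1(ybar, n, tau, prior_m1=0.5):
    """Exact posterior probability of M1 given the sample mean ybar."""
    # Marginal of ybar: N(0, 1/n) under M0, N(0, tau^2 + 1/n) under M1.
    log_bf10 = (normal_logpdf(ybar, math.sqrt(tau ** 2 + 1.0 / n))
                - normal_logpdf(ybar, math.sqrt(1.0 / n)))
    log_odds = log_bf10 + math.log(prior_m1 / (1.0 - prior_m1))
    return 1.0 / (1.0 + math.exp(-log_odds))


def simulate(n_sims, n=10, tau=1.0, prior_m1=0.5, seed=1):
    """Draw the true model, simulate data from it, record (P(M1 | y), truth)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        is_m1 = rng.random() < prior_m1
        mu = rng.gauss(0.0, tau) if is_m1 else 0.0
        ybar = rng.gauss(mu, math.sqrt(1.0 / n))  # sufficient statistic
        results.append((post_prob_m1(ybar, n, tau, prior_m1), is_m1))
    return results


def calibration_table(results, n_bins=5):
    """Per non-empty bin: (count, mean predicted P(M1), observed M1 frequency)."""
    bins = [[] for _ in range(n_bins)]
    for p, is_m1 in results:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, is_m1))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(m for _, m in b) / len(b)
            table.append((len(b), mean_p, freq))
    return table


if __name__ == "__main__":
    for count, mean_p, freq in calibration_table(simulate(40000)):
        print(f"n={count:6d}  predicted={mean_p:.3f}  observed={freq:.3f}")
```

If the Bayes factor computation is correct, the predicted and observed columns should roughly agree in every bin; a systematic gap would point to a problem in either the computation or the simulator.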

We also show how to leverage posterior SBC to check computation when priors are improper and we thus cannot simulate (we take the BayesFactor package as an example).

We tried hard to move beyond toy examples and do our evaluations on problems that mimic actual bugs in Bayes Factor computation (omitted normalization constant, improper use of Savage-Dickey density ratio, mismatch between simulation and models). 4/
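To make one of these bug types concrete, here is a hedged toy sketch (not an example from the preprint) of an omitted normalization constant and how a data-averaged posterior check can flag it. It assumes the same kind of closed-form setup as a textbook normal-mean test (M0: mu = 0; M1: mu ~ N(0, tau^2); unit observation variance), and the `drop_constant` flag and function names are hypothetical.

```python
import math
import random


def log_bf10(ybar, n, tau, drop_constant=False):
    """Log Bayes factor of M1 vs. M0 based on the sample mean ybar."""
    var0 = 1.0 / n             # variance of ybar under M0
    var1 = tau ** 2 + 1.0 / n  # marginal variance of ybar under M1
    kernel = 0.5 * ybar ** 2 * (1.0 / var0 - 1.0 / var1)
    constant = 0.5 * math.log(var0 / var1)
    # The deliberate "bug": forget the ratio of normalization constants.
    return kernel if drop_constant else kernel + constant


def dap_check(n_sims=20000, n=10, tau=1.0, prior_m1=0.5,
              drop_constant=False, seed=1):
    """Return (mean posterior P(M1), z-score of its gap to the prior)."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_sims):
        is_m1 = rng.random() < prior_m1
        mu = rng.gauss(0.0, tau) if is_m1 else 0.0
        ybar = rng.gauss(mu, math.sqrt(1.0 / n))
        log_odds = (log_bf10(ybar, n, tau, drop_constant)
                    + math.log(prior_m1 / (1.0 - prior_m1)))
        probs.append(1.0 / (1.0 + math.exp(-log_odds)))
    mean_p = sum(probs) / n_sims
    var_p = sum((p - mean_p) ** 2 for p in probs) / (n_sims - 1)
    z = (mean_p - prior_m1) / math.sqrt(var_p / n_sims)
    return mean_p, z


if __name__ == "__main__":
    print("correct:", dap_check())
    print("buggy:  ", dap_check(drop_constant=True))
```

With a correct computation the average posterior model probability should stay close to the prior probability (0.5 here); with the constant dropped, the log Bayes factor can never be negative, so the average is pushed well above the prior.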

One general lesson is that, most often, apparent problems were in the simulation code and not in the actual BF computation (I in fact e-mailed the bridgesampling developers after consistently finding problems in their computation, but it later proved to be a bug in my sims). 5/
You may have heard of the previous work by Daniel J. Schad and @ShravanVasishth on the topic (https://arxiv.org/pdf/2103.08744, https://psycnet.apa.org/fulltext/2024-47778-001.html), where they do only the data-averaged posterior check. In a recent preprint we collaborated to bring the lessons we learned to the models they find of interest https://arxiv.org/pdf/2406.08022 6/

We do throw some shade on the Good check, but a tweaked version was just published (https://arxiv.org/pdf/2602.19838) which appears to address some of the major problems we report. We didn't manage to include it in our current simulations but we definitely plan to do a head-to-head comparison at some point.

We thank Nikola Sekulovski and @EJWagenmakers for kindly providing feedback on an early version. They also tipped us off about the further developments for the Good check. 7/8

My older thread on SBC in general is at https://fediscience.org/@modrak_m/109301406944300548

And there is an R package implementing all of the techniques we discuss: https://github.com/hyunjimoon/SBC/

Martin Modrák (@modrak_m@fediscience.org)

The basic idea is that you implement your model twice: beyond a probabilistic program (e.g. in #Stan, #jags, ...) + a sampling algorithm you also need a simulator drawing from the prior distribution - this tends to be easy to implement. You then simulate multiple datasets, fit those with your probabilistic program and compute ranks of the prior parameter values within the posterior. If you did everything correctly, the ranks are uniform. Non-uniformity then signals a problem. 3/
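The rank computation described here can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SBC package's implementation: the "fit" is replaced by exact draws from a conjugate normal posterior (prior mu ~ N(0, 1), observations y_i ~ N(mu, 1)) instead of MCMC, uniformity is summarized with a plain chi-square statistic, and the `buggy` flag (using the posterior variance where the standard deviation is needed) is a hypothetical bug added for contrast.

```python
import math
import random


def sbc_ranks(n_sims=2000, n_obs=5, n_draws=19, buggy=False, seed=1):
    """Simulate from the prior, 'fit', and rank the true value among draws."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu_true = rng.gauss(0.0, 1.0)  # parameter drawn from the prior
        # Sample mean of n_obs observations from N(mu_true, 1).
        ybar = rng.gauss(mu_true, math.sqrt(1.0 / n_obs))
        post_mean = n_obs * ybar / (n_obs + 1)
        post_var = 1.0 / (n_obs + 1)
        # Bug for illustration: pass the variance where the sd is expected.
        post_sd = post_var if buggy else math.sqrt(post_var)
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]
        ranks.append(sum(d < mu_true for d in draws))  # rank in 0..n_draws
    return ranks


def chi_square_uniform(ranks, n_draws=19):
    """Chi-square statistic against a uniform distribution over the ranks."""
    n_bins = n_draws + 1
    counts = [0] * n_bins
    for r in ranks:
        counts[r] += 1
    expected = len(ranks) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    print("correct:", chi_square_uniform(sbc_ranks()))
    print("buggy:  ", chi_square_uniform(sbc_ranks(buggy=True)))
```

With 19 draws there are 20 possible ranks, so the statistic has 19 degrees of freedom under uniformity (the 95% critical value is about 30.1). The too-narrow buggy posterior piles ranks up at both extremes and should produce a far larger value.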

FediScience.org
@modrak_m @paul_buerkner I was really confused by arXiv id being 2508, and thought this is old news
@avehtari @paul_buerkner It is old news, but I forgot to make an announcement when the first version was up :-D
@modrak_m @avehtari @paul_buerkner This looks very cool!
@modrak_m @avehtari @paul_buerkner I think I caught a bit of shade there for the transparency of the BayesFactor package in the discussion :)
@richarddmorey @avehtari @paul_buerkner The package is still very good though! One thing that made the process a bit challenging was that the posterior draws for coefficients are not the betas themselves but are transformed, but it is not perfectly clear how. So that added another moving part when I tried to make my sims match the package output. But IME this sort of paper vs. package reporting things differently just happens very often with math/stats papers.
@modrak_m @avehtari @paul_buerkner Yeah, there are functions that do the transformations but I assumed no one would be interested in the untransformed ones -
@modrak_m @avehtari @paul_buerkner - and that they might even confuse most people, who wouldn’t know what to do with them