ICYMI, this paper really is a must-read: https://doi.org/10.1111/ele.14033

Comparing models, picking the best one based on AIC etc., and then interpreting its coefficients causally (what's the effect of X on Y?) is flawed, yet so common.

We must first lay out our causal assumptions (draw a DAG).

"Model selection is not a valid method for inferring causal relationships. It's appropriate for predictive inference (which model best predicts Y?), which is fundamentally distinct from causal inference (what is the effect of X on Y?)"

1/

Imagine we want to assess the effect of 'Forestry' on 'Species Y', but we know other things may also affect Y.

We could put all these variables in a regression model (what R. McElreath calls a causal salad), or build models w/ different subsets of predictors and compare them.

That will lead to biased estimates. The best model by AIC and BIC includes more predictors and gives a biased estimate of the Forestry effect on Y.

The causal model (based on the DAG) has a much larger AIC but gives the correct estimate.
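To make the AIC point concrete, here's a toy simulation in Python (numpy only). It is NOT the paper's Forestry example: the DAG is an assumption for illustration, with Forestry → Y plus a variable Z that is caused by both Forestry and Y (a collider). The helper `fit_ols`, the effect size and the noise levels are all made up. The model implied by the DAG regresses Y on Forestry alone; the "salad" model also throws in Z, which happens to predict Y very well.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
TRUE_EFFECT = 0.5  # assumed causal effect of Forestry on Y in this toy world

# Hypothetical DAG (an assumption for illustration, not the paper's):
#   Forestry -> Y,  Forestry -> Z <- Y   (Z is a common effect, i.e. a collider)
forestry = rng.normal(size=n)
y = TRUE_EFFECT * forestry + rng.normal(size=n)
z = forestry + y + rng.normal(scale=0.5, size=n)


def fit_ols(y, *predictors):
    """Least-squares fit with an intercept; returns (coefficients, Gaussian AIC)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1  # regression coefficients plus the residual variance
    # Maximised Gaussian log-likelihood with sigma^2 estimated as RSS / n
    log_lik = -0.5 * len(y) * (np.log(2 * np.pi) + np.log(rss / len(y)) + 1)
    return beta, 2 * k - 2 * log_lik


beta_causal, aic_causal = fit_ols(y, forestry)    # model implied by the DAG
beta_salad, aic_salad = fit_ols(y, forestry, z)   # "causal salad": add Z as well

print(f"DAG model:   Forestry = {beta_causal[1]:+.2f}, AIC = {aic_causal:.0f}")
print(f"Salad model: Forestry = {beta_salad[1]:+.2f}, AIC = {aic_salad:.0f}")
```

Under these assumed parameters the salad model should win on AIC by a wide margin, yet its Forestry coefficient lands near -0.7 (the value you get by working the regression out analytically for this DAG), while the DAG-based model recovers roughly +0.5. Conditioning on Z, a common effect of Forestry and Y, induces a spurious association between them, so the better-predicting model gives the worse causal answer.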

This applies to machine learning too (random forests etc.). High variable importance does not mean a predictor is important from a causal point of view, only that it is useful for getting good predictions.
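Same idea for variable importance, as a rough sketch: permutation importance (shuffle one predictor, see how much the prediction error goes up) under the same assumed collider DAG as an illustration. A plain linear fit stands in for a random forest here, since permutation importance works with any fitted model; importance is also computed in-sample for brevity, whereas in practice you'd use held-out data. Names like `importance` and `features` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Assumed toy DAG: Forestry -> Y, Forestry -> Z <- Y.
# Z has NO causal effect on Y: intervening on Z would leave Y unchanged.
forestry = rng.normal(size=n)
y = 0.5 * forestry + rng.normal(size=n)
z = forestry + y + rng.normal(scale=0.5, size=n)
features = {"forestry": forestry, "z": z}

# A linear model stands in for a random forest; permutation importance
# only needs predictions, so the logic is the same for any fitted model.
X = np.column_stack([np.ones(n), forestry, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
baseline_mse = np.mean((y - X @ beta) ** 2)

importance = {}
for j, name in enumerate(features, start=1):  # column 0 is the intercept
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break this column's link to Y
    importance[name] = np.mean((y - X_perm @ beta) ** 2) - baseline_mse

for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:>8}: MSE increase when permuted = {value:.2f}")
```

Z should come out as by far the most "important" predictor, even though by construction it has no causal effect on Y at all (Y causes Z, not the other way round).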

Causal inference is rarely taught, yet it seems so important. Many papers aim not to predict but to make inferences about how important different variables are. It seems we're too often using the wrong approach.

I'm trying to learn more about this. Next on my reading list: https://doi.org/10.1002/ecm.1554

@frod_san Oh, what a great paper, and a nice companion to this @bbolker piece, an opinion piece on discretization and multimodel averaging in ecological statistics: https://github.com/bbolker/discretization/blob/master/outputs/discrete.pdf
@noamross @frod_san FWIW I may actually get this submitted/posted to a preprint server in the near(ish) future ...
@bbolker 🥳🙏
@noamross Your wish is my command: https://ecoevorxiv.org/repository/view/5722/ ("Multimodel approaches are not the best way to understand multifactorial systems"). Hoping to submit to *Methods in Eco/Evo* very soon, unless someone tells me that Wiley/MEE are on the Naughty List now... What's *not* in here is a bunch of my own simulations to evaluate MMA (multimodel averaging) coverage in different scenarios; I relied on the large number of existing studies that look at this (interestingly, Burnham and Anderson are the *only* ones I can find who report good coverage from MMA CIs...)