Mastodawn

Collin Edwards Jul 26, 2023

#rstats question for mgcv

If I want to test whether there is support for a smooth x factor effect, I can just include the smooth and the smooth x factor as separate terms to create nested models, right? As in

library(mgcv)
set.seed(0)
dat <- gamSim(5, n = 200, scale = 2)
## make explicit factor
dat$fac <- factor(letters[dat$x0])

m = gam(y ~ s(x1) + fac + s(x1, by = fac), data = dat)
m1 = gam(y ~ s(x1) + fac, data = dat)
anova(m, m1, test = "F")

@noamross @gavinsimpson

Collin Edwards Jul 26, 2023

@noamross @gavinsimpson

It feels like the `by = factor` term is redundant there, and should lead to issues, but I suspect that just means I'm not thinking about shrinkage and wiggliness appropriately. Certainly the code runs that way.

Gavin Simpson Jul 27, 2023

@collinedwards @noamross I think in this instance you'd be better off numerically using

y ~ s(x1) + s(x1, fac, bs = "sz")

& compare that with

y ~ s(x1) + fac

as the `sz` basis will set things up so that those difference smooths are orthogonal to the main f(x1) term. This `sz` basis also includes the group means, hence no separate `fac` term.

Fit both models with `method = "ML"` if you are going to attempt a GLRT (generalized likelihood ratio test), but do read `?anova.gam` about the multi-model feature of that function.
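
For anyone following along, the suggestion above could be sketched roughly as below. This is only a minimal sketch: it assumes the simulated data and `fac` from the first post, an mgcv release recent enough to provide the `sz` basis, and the object names `m_sz`, `m_main` and `cmp` are made up for illustration.

```r
library(mgcv)  # the "sz" basis is only available in recent mgcv releases

set.seed(0)
dat <- gamSim(5, n = 200, scale = 2, verbose = FALSE)
dat$fac <- factor(letters[dat$x0])  # same explicit factor as in the question

# Model with a main smooth plus per-level difference smooths ("sz" basis).
# No separate `fac` term, following the formula given in the thread.
m_sz <- gam(y ~ s(x1) + s(x1, fac, bs = "sz"), data = dat, method = "ML")

# Simpler model: one shared smooth plus factor-level means.
m_main <- gam(y ~ s(x1) + fac, data = dat, method = "ML")

# Multi-model comparison; the resulting p-value is approximate.
cmp <- anova(m_main, m_sz, test = "F")
print(cmp)
```

Both fits use `method = "ML"` so that the likelihoods are comparable across models with different fixed-effect structure; `?anova.gam` spells out the caveats on the p-value.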

Collin Edwards

@gavinsimpson @noamross This is awesome!! Thanks for all the help!

We're already doing comparisons with AIC, and the differences are quite large. I was just thinking that providing a reasonable P value might help with the review process, so I'm not super worried that the P values are biased towards being low.
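
The AIC comparison mentioned here might look something like the sketch below. It assumes the same simulated data as the first post and the two formulas suggested in the thread; the names `m_sz`, `m_main` and `aic_tab` are hypothetical, and the real analysis presumably uses different data and models.

```r
library(mgcv)  # the "sz" basis is only available in recent mgcv releases

set.seed(0)
dat <- gamSim(5, n = 200, scale = 2, verbose = FALSE)
dat$fac <- factor(letters[dat$x0])

# Candidate models, both fitted by ML
m_sz   <- gam(y ~ s(x1) + s(x1, fac, bs = "sz"), data = dat, method = "ML")
m_main <- gam(y ~ s(x1) + fac, data = dat, method = "ML")

# AIC() accepts several fitted models and returns one row per model;
# lower AIC indicates the preferred model.
aic_tab <- AIC(m_main, m_sz)
print(aic_tab)
```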