made some updates last week to my GAM blog: adaptive smoothing, now with plots of the smoothing parameter function
big thanks to Philip Dixon who asked an interesting question!
📈 Yes you can do that in mgcv update
big thanks to Zachary Susswein for spotting that my code was out of date in my neighbourhood cross-validation examples: https://calgary.converged.yt/articles/ncv.html https://calgary.converged.yt/articles/ncv_timeseries.html
They are now up-to-date, as is the helper package mgcvUtils: https://github.com/dill/mgcvUtils
new (out for a while but sitting in my browser from before Christmas) paper in Biometrika from Benjamin Säfken, Thomas Kneib and Simon Wood on smoothing parameter degrees of freedom
Green OA @ Edinburgh https://www.pure.ed.ac.uk/ws/portalfiles/portal/475921820/asae052.pdf
#mgcv mini-lifehack:
(assuming you have multithreading enabled) you can get a rough idea of what's happening when fitting a big model by watching your CPU usage. If only 1 core is busy, the model is still "building" (assembling the design/penalty matrices); once all cores kick in, you're actually fitting the model. With a very big model, that construction phase alone can take a long time, which means the fit itself will probably take a very, very long time. So buckle up.
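For the record, here's roughly how you'd turn the threading on in the first place (toy data via gamSim() standing in for a genuinely big model, where the two phases are actually visible):

```r
library(mgcv)

# simulated example data; a real "big model" is where you'd see the effect
set.seed(1)
dat <- gamSim(1, n = 2000, verbose = FALSE)

# bam() takes nthreads directly. During the single-threaded "building" phase
# you'll see one core busy (design/penalty matrix assembly); once fitting
# iterations start, usage spreads across the requested threads.
b <- bam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, nthreads = 4)
```

(For gam() rather than bam(), threading goes via control = gam.control(nthreads = ...).)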
oh, hey, I reviewed this! gratia is an excellent tool for mgcv users! Thanks @gavinsimpson!
Just published in JOSS: 'gratia: An R package for exploring generalized additive models' https://doi.org/10.21105/joss.06962
Generalized additive models (GAMs) are a commonly used, flexible framework applied to many problems in statistical ecology. GAMs are often considered to be a purely frequentist framework ("generalized linear models with wiggly bits"); however, links between frequentist and Bayesian approaches to these models were highlighted early on in the literature. Bayesian thinking underlies many parts of the implementation in the popular R package mgcv, as well as GAM theory more generally. This article aims to highlight useful links (and differences) between Bayesian and frequentist approaches to smoothing, and their practical applications in ecology (with an mgcv-centric viewpoint). Here I give some background for these results, then move on to two important topics for quantitative ecologists: term/model selection and uncertainty estimation.
my mgcv Wrapped 2024
top 5 basis functions:
1. thin-plate regression splines
2. B-splines
3. soap film smoother
4. cubic cyclic splines
5. random effects (psych!)
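For anyone playing along at home, those top 5 map onto s() bs codes like so (placeholder data below, just to make the call runnable; soap films additionally need a boundary via the xt argument, so they're left in the comments):

```r
library(mgcv)

# placeholder data, purely to illustrate the bs codes
set.seed(2)
d <- data.frame(x    = runif(300),
                doy  = sample(365, 300, replace = TRUE),
                site = factor(sample(letters[1:5], 300, replace = TRUE)))
d$y <- sin(pi * d$x) + cos(2 * pi * d$doy / 365) + rnorm(300, sd = 0.3)

# "tp" thin-plate regression spline (the default), "bs" B-spline,
# "so" soap film (needs boundary/knots via xt, not shown),
# "cc" cyclic cubic spline, "re" random effect
b <- gam(y ~ s(x, bs = "tp") + s(doy, bs = "cc", k = 10) + s(site, bs = "re"),
         data = d)
```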
spending some more time thinking about neighbourhood cross-validation in #mgcv (see original post here: https://calgary.converged.yt/articles/ncv.html), but for time series.
Pretty nice to be able to get back to a yearly trend here without needing to specify an autoregressive structure. We just need to specify a cross-validation scheme and the autocorrelation is "dealt with" during fitting.
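A minimal sketch of what that looks like for a time series: drop a window of observations around each point so short-range autocorrelation can't leak into the held-out prediction. The nei component names here are my reading of ?NCV; double-check them against your installed mgcv version.

```r
library(mgcv)  # needs a recent mgcv with method = "NCV"

# smooth yearly-ish trend plus AR(1) noise -- the autocorrelation that
# ordinary GCV/REML would otherwise soak up as extra wiggliness
set.seed(3)
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + 0.3 * as.numeric(arima.sim(list(ar = 0.6), n))

# each point's neighbourhood: itself plus the w observations either side
w  <- 5
nb <- lapply(seq_len(n), function(j) max(1, j - w):min(n, j + w))
nei <- list(k  = unlist(nb),          # concatenated neighbour indices
            m  = cumsum(lengths(nb)), # where each neighbourhood ends in k
            i  = seq_len(n),          # points predicted for each neighbourhood
            mi = seq_len(n))          # where each prediction set ends in i

b <- gam(y ~ s(x, k = 20), method = "NCV", nei = nei)
```

No AR structure specified anywhere; the CV scheme does the work.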
I've been writing up some bits on un/under-documented parts of mgcv. Here's a bit of chat about the new "neighbourhood cross-validation" method that was uploaded to arXiv a wee while ago: https://calgary.converged.yt/articles/ncv.html
More to come on this, including some details on how to set up neighbourhoods in practice.
(Please @ me with errors/typos etc)
Preprint from Simon Wood on the new cross-validation smoothness estimation in #mgcv: https://arxiv.org/abs/2404.16490. It's a neat performant + data-efficient way to estimate GAMs based on complex CV splits (like spatial/temporal/phylo ones).
See ?NCV in latest {mgcv} for examples (https://cran.r-universe.dev/mgcv/doc/manual.html#NCV)
I might write a helper to convert {rsample}/{spatialsample} objects into mgcv's funny CV indexing structure.
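Something like this, maybe (rset_to_nei is a hypothetical name, not a real function anywhere; it treats each {rsample} assessment set as both the dropped neighbourhood and the points predicted for it, and the nei component names should be checked against ?NCV):

```r
# hypothetical helper: {rsample} rset -> mgcv's nei list for method = "NCV"
rset_to_nei <- function(rs) {
  # held-out (assessment) row indices for each split
  drop <- lapply(rs$splits, rsample::complement)
  list(k  = unlist(drop),           # dropped neighbourhoods, concatenated
       m  = cumsum(lengths(drop)),  # end of each neighbourhood in k
       i  = unlist(drop),           # points predicted per neighbourhood
       mi = cumsum(lengths(drop)))  # end of each prediction set in i
}

# usage sketch (my_data is a placeholder):
# folds <- rsample::vfold_cv(my_data, v = 10)
# b <- mgcv::gam(y ~ s(x), data = my_data, method = "NCV",
#                nei = rset_to_nei(folds))
```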
#rstats #ml #tidymodels #mgcvchat @MikeMahoney218 @gavinsimpson @ericJpedersen @millerdl
Many varieties of cross validation would be statistically appealing for the estimation of smoothing and other penalized regression hyperparameters, were it not for the high cost of evaluating such criteria. Here it is shown how to efficiently and accurately compute and optimize a broad variety of cross validation criteria for a wide range of models estimated by minimizing a quadratically penalized loss. The leading order computational cost of hyperparameter estimation is made comparable to the cost of a single model fit given hyperparameters. In many cases this represents an $O(n)$ computational saving when modelling $n$ data. This development makes it feasible, for the first time, to use leave-out-neighbourhood cross validation to deal with the widespread problem of un-modelled short range autocorrelation which otherwise leads to underestimation of smoothing parameters. It is also shown how to accurately quantify uncertainty in this case, despite the un-modelled autocorrelation. Practical examples are provided including smooth quantile regression, generalized additive models for location scale and shape, and focussing particularly on dealing with un-modelled autocorrelation.