Let's try this again outside of the heat wave and weekend:

I'm looking for help with #rstats, specifically larger #bam / #gam in #mgcv with NCV.

It's running, but it has been about 60h by now, minimal CPU usage, and steadily climbing RAM usage.

Any pointers on how to speed things up are welcome!

The #bam in question:

```
library(mgcv)

# Tweedie GAM of cumulative time in the ROI: a global time smooth,
# sum-to-zero ("sz") factor-smooth interactions, and per-ID random smooths ("fs").
gam_roi_treatment_TW <- bam(
  cumulative_time_in_roi ~
    s(time_point_s, bs = "tp", k = 10) +
    s(time_point_s, treatment, bs = "sz", k = 10) +
    s(time_point_s, sex, bs = "sz", k = 10) +
    s(time_point_s, strain, bs = "sz", k = 10) +
    s(time_point_s, treatment, sex, bs = "sz", k = 10) +
    s(time_point_s, treatment, strain, bs = "sz", k = 10) +
    s(time_point_s, sex, strain, bs = "sz", k = 10) +
    s(time_point_s, treatment, sex, strain, bs = "sz", k = 10) +
    s(age_baseline, bs = "tp", k = 10) +
    s(time_point_s, ID, bs = "fs", k = 10),
  data = roi_no_talad,
  family = tw(),
  select = FALSE,
  method = "NCV",
  nei = roi_notalad_nei,  # neighbourhood list defining the NCV scheme
  control = gam.control(ncv.threads = N_THREADS)  # threads for the NCV computations
)
```

#mgcvchat

https://fediscience.org/@volephd/116821580992514861

Thorbjörn Sievert (@volephd@fediscience.org)

Question for the #stats folks working with #mgcv and #gam / #bam: I'm dealing with a dataset with high temporal autocorrelation, so after a lengthy discussion with the author of `gratia` (my current PI), we concluded that I should move away from AR(1) models and rather work with #NCV. No problems with the setup, but the run time is forever. My first model has been running for 20h by now, with no indication of how long it will take. I have already switched over to #OpenBLAS, but it seems weird that my CPU load is constantly at only 15-25%. The only indication that something is happening is that RAM usage is slowly but steadily increasing. I'm dealing with ~40k data points across ~300 time series, and an NCV window of 7 data points, so nothing crazy. The model includes two- and three-way interactions, and random smooths, but all of those things are biologically relevant. I'd happily take any suggestions on how to speed things up, how to get proper CPU usage, or at least get an estimate of how far along the calculations are. I might eventually need to switch over to #twlss from the current `tw()`, and that will make things worse, as that is not supported by `bam()`. I'm currently on Windows. I might be able to run things on an #HPC if needed, but I'm not sure if I can easily fiddle with #BLAS there.


made some updates last week to my GAM blog: adaptive smoothing, now with plots of the smoothing parameter function

https://calgary.converged.yt/articles/adaptive_smoothing.html#bonus-plotting-the-smoothing-parameter-over-the-covariate

big thanks to Philip Dixon who asked an interesting question!

#rstats #mgcvchat

Adaptive smoothing in mgcv – Yes! You can do that in `mgcv`!

📈 Yes you can do that in mgcv update

big thanks to Zachary Susswein for spotting that my code was out of date in my neighbourhood cross-validation examples: https://calgary.converged.yt/articles/ncv.html https://calgary.converged.yt/articles/ncv_timeseries.html

They are now up-to-date, as is the helper package mgcvUtils: https://github.com/dill/mgcvUtils

#mgcvchat #mgcv

Neighbourhood cross-validation – Yes! You can do that in mgcv!

new (out for a while but sitting in my browser from before Christmas) paper in Biometrika from Benjamin Säfken, Thomas Kneib and Simon Wood on smoothing parameter degrees of freedom

Green OA @ Edinburgh https://www.pure.ed.ac.uk/ws/portalfiles/portal/475921820/asae052.pdf

#mgcvchat #mgcv

#mgcv mini-lifehack:

(assuming you have multithreading enabled) you can get a rough idea of what's happening when fitting a big model by looking at your CPU usage. If only 1 core is being used, the model is still "building" (assembling the design/penalty matrices); once usage switches to all cores, you're actually fitting the model. With a very big model, that construction phase alone can take a long time, and if it does, the fit itself will probably take a very, very long time. So buckle up.
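
A rough way to see those two phases separately on a small model is to split set-up from fitting. This is only a sketch on simulated data from `gamSim()`, using `gam()` rather than `bam()`; the model and sample size are made up for illustration, and the timings on a toy problem say little about a big NCV fit.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 2000, verbose = FALSE)  # simulated example data (illustrative only)

# Phase 1: "building" only. fit = FALSE returns the pre-fit object
# (design and penalty matrices etc.) without estimating anything.
setup_time <- system.time(
  G <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, fit = FALSE)
)

# Phase 2: the actual fit, reusing the pre-built object.
fit_time <- system.time(
  m <- gam(G = G)
)

# Elapsed seconds for each phase
print(c(setup = setup_time[["elapsed"]], fit = fit_time[["elapsed"]]))
```

If the set-up step alone is slow, that is a hint the fit will be slower still.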

#mgcvchat

oh, hey, I reviewed this! gratia is an excellent tool for mgcv users! Thanks @gavinsimpson!

https://fosstodon.org/@joss/113691283592026369

#mgcvchat

JOSS (@joss@fosstodon.org)

Just published in JOSS: 'gratia: An R package for exploring generalized additive models' https://doi.org/10.21105/joss.06962

Bayesian views of generalized additive modelling

Generalized additive models (GAMs) are a commonly used, flexible framework applied to many problems in statistical ecology. GAMs are often considered to be a purely frequentist framework ("generalized linear models with wiggly bits"), however links between frequentist and Bayesian approaches to these models were highlighted early on in the literature. Bayesian thinking underlies many parts of the implementation in the popular R package `mgcv` as well as in GAM theory more generally. This article aims to highlight useful links (and differences) between Bayesian and frequentist approaches to smoothing, and their practical applications in ecology (with an `mgcv`-centric viewpoint). Here I give some background for these results then move on to two important topics for quantitative ecologists: term/model selection and uncertainty estimation.

arXiv.org

my mgcv Wrapped 2024

top 5 basis functions:

1. thin-plate regression splines
2. B-splines
3. soap film smoother
4. cubic cyclic splines
5. random effects (psych!)

#mgcvchat

spending some more time thinking about neighbourhood cross-validation in #mgcv (see original post here: https://calgary.converged.yt/articles/ncv.html), but for time series.

Pretty nice to be able to get back to a yearly trend here without needing to specify an autoregressive structure. We just need to specify a cross-validation scheme and the autocorrelation is "dealt with" during fitting.
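
A minimal sketch of what "specifying a cross-validation scheme" could look like for one regularly ordered series: each point is predicted after dropping everything within `h` steps of it. The helper `make_ts_nei()` is hypothetical, and the element names (`d`, `md`, `a`, `ma`) are an assumption based on recent mgcv releases; older releases used different names, so check `?NCV` for your installed version before passing this as `nei`.

```r
# Hypothetical helper: leave-neighbourhood-out scheme for a single time series
# of length n, with observations assumed to be sorted in time order.
make_ts_nei <- function(n, h) {
  # For point i, drop the window i-h, ..., i+h (clipped to 1..n)
  drop <- lapply(seq_len(n), function(i) max(1, i - h):min(n, i + h))
  list(
    d  = unlist(drop),           # indices dropped, neighbourhood by neighbourhood
    md = cumsum(lengths(drop)),  # end position of each neighbourhood within d
    a  = seq_len(n),             # index predicted for each neighbourhood
    ma = seq_len(n)              # end position of each prediction set within a
  )
}

# h = 3 gives a window of up to 7 points around each observation
nei <- make_ts_nei(n = 10, h = 3)
str(nei)
```

With several series in one data frame, the same idea would need the windows built within each series and the indices offset to rows of the full data.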

Full post on this soon. #mgcvchat #rstats


I've been writing up some bits on un/under-documented parts of mgcv. Here's a bit of chat about the new "neighbourhood cross-validation" method that was uploaded to arXiv a wee while ago: https://calgary.converged.yt/articles/ncv.html

More to come on this, including some details on how to set up neighbourhoods in practice.

(Please @ me with errors/typos etc)

#mgcvchat
