Let's try this again outside of the heat wave and weekend:

I'm looking for help with #rstats, specifically larger #bam / #gam in #mgcv with NCV.

It's running, but it has been going for about 60h now, with minimal CPU usage and steadily climbing RAM usage.

Any pointers on how to speed things up are welcome!

The #bam in question:

```
gam_roi_treatment_TW <- bam(
  cumulative_time_in_roi ~
    s(time_point_s, bs = "tp", k = 10) +
    s(time_point_s, treatment, bs = "sz", k = 10) +
    s(time_point_s, sex, bs = "sz", k = 10) +
    s(time_point_s, strain, bs = "sz", k = 10) +
    s(time_point_s, treatment, sex, bs = "sz", k = 10) +
    s(time_point_s, treatment, strain, bs = "sz", k = 10) +
    s(time_point_s, sex, strain, bs = "sz", k = 10) +
    s(time_point_s, treatment, sex, strain, bs = "sz", k = 10) +
    s(age_baseline, bs = "tp", k = 10) +
    s(time_point_s, ID, bs = "fs", k = 10),
  data = roi_no_talad,
  family = tw(),
  select = FALSE,
  method = "NCV",
  nei = roi_notalad_nei,
  control = gam.control(ncv.threads = N_THREADS)
)
```

#mgcvchat

https://fediscience.org/@volephd/116821580992514861

Thorbjörn Sievert (@volephd@fediscience.org)

Question for the #stats folks working with #mgcv and #gam / #bam:
I'm dealing with a dataset with high temporal autocorrelation, so after a lengthy discussion with the author of `gratia` (my current PI), we concluded that I should move away from AR(1) models and rather work with #NCV.
No problems with the setup, but the run time is forever.
My first model has been running for 20h by now, with no indication of how long it will take.
I have already switched over to #OpenBLAS, but it seems weird that my CPU load is constantly at only 15-25%. The only indication that something is happening is that RAM usage is slowly but steadily increasing.
I'm dealing with ~40k data points across ~300 time series, and an NCV window of 7 data points, so nothing crazy. The model includes two- and three-way interactions, and random smooths, but all of those things are biologically relevant.

I'd happily take any suggestions on how to speed things up, how to get proper CPU usage, or at least get an estimate of how far along the calculations are.

I might eventually need to switch over to #twlss, from the current `tw()`, and that will make things worse as that is not supported by `bam()`.

I'm currently on Windows. I might be able to run things on an #HPC if needed, but I'm not sure if I can easily fiddle with #BLAS there.
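
For readers following along, the neighbourhood structure described in this post (a window of 7 data points within each time series) could be built along these lines. This is a minimal base-R sketch, not the code behind `roi_notalad_nei`: the helper `make_window_nei()` and its arguments are hypothetical, and the element names `a`, `ma`, `d` and `md` are assumed to match the `nei` description in recent mgcv releases (earlier releases used different names, so check `?NCV` for the installed version).

```r
# Hypothetical helper: for every observation, drop the observations within
# `half_width` positions of it in the same series, and predict only that
# observation. half_width = 3 gives a window of up to 7 points.
make_window_nei <- function(series_id, time, half_width = 3L) {
  n <- length(series_id)
  stopifnot(length(time) == n)
  dropped <- vector("list", n)

  # Row numbers belonging to each series
  rows_by_series <- split(seq_len(n), series_id)

  for (rows in rows_by_series) {
    rows <- rows[order(time[rows])]  # put the series in time order
    m <- length(rows)
    for (pos in seq_len(m)) {
      lo <- max(1L, pos - half_width)
      hi <- min(m, pos + half_width)
      dropped[[rows[pos]]] <- rows[lo:hi]  # window is truncated at series ends
    }
  }

  # Element names are an assumption about the installed mgcv version
  list(
    a  = unlist(dropped, use.names = FALSE),  # indices dropped, all neighbourhoods
    ma = cumsum(lengths(dropped)),            # end position of each neighbourhood in `a`
    d  = seq_len(n),                          # index predicted by each neighbourhood
    md = seq_len(n)                           # end position of each neighbourhood in `d`
  )
}

# Toy data: two short series, window of 3 points (half_width = 1)
toy <- make_window_nei(
  series_id = c("a", "a", "a", "b", "b"),
  time = c(1, 2, 3, 1, 2),
  half_width = 1L
)
str(toy)
```

With the data from the `bam()` call above, this would presumably be called as `make_window_nei(roi_no_talad$ID, roi_no_talad$time_point_s)`, assuming those two columns identify the series and the time order.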

#Day28 | Uncertainties – Modeling | #30DayChartChallenge | Barro Colorado Island — Tree Species Richness Estimation. Built with #RStats using #ggplot2, #patchwork, #MASS, #mgcv, #scales, #vegan, #gridExtra and #grid.

The hottest ticket in R will be @gavinsimpson's live stream on What's New in Generalized Additive Models in R

2026-03-06 (17:00–19:00 CET) at https://youtube.com/live/A9U8e1KdlU4?feature=share

• what GAMs are and how they work
• recent {mgcv} updates (incl. Hierarchical GAMs)
• new features in {gratia}
• deeper inference with {marginaleffects}

Post questions at https://github.com/gavinsimpson/gratia/discussions/categories/q-a?discussions_q=is%3Aopen+category%3AQ%26A+label%3Alivestream

#RStats #mgcv #gratia #statistics #GAMs

📈 Yes you can do that in mgcv update

big thanks to Zachary Susswein for spotting that my code was out of date in my neighbourhood cross-validation examples: https://calgary.converged.yt/articles/ncv.html https://calgary.converged.yt/articles/ncv_timeseries.html

They are now up-to-date, as is the helper package mgcvUtils: https://github.com/dill/mgcvUtils

#mgcvchat #mgcv

Anyone got anything on using #mgcv with #mrf and #sf objects in #rstats? The package seems to want its own format for polygon regions and (can) compute its own adjacency list etc. But I haz sf objects...

#quarto #rstats friends who use github action to publish articles:

it's currently taking github actions ~30 mins to publish my little #mgcv help site (https://calgary.converged.yt/). This seems to be because it's installing a lot of R packages from source.

What's the current state-of-the-art to get these things to render quickly? (And using minimal power.)

(I'd like to not use github but I would also like to encourage PRs etc from folks without a huge overhead from them, so let's stick to github-based solutions for now.)

new (out for a while but sitting in my browser from before Christmas) paper in Biometrika from Benjamin Säfken, Thomas Kneib and Simon Wood on smoothing parameter degrees of freedom

Green OA @ Edinburgh https://www.pure.ed.ac.uk/ws/portalfiles/portal/475921820/asae052.pdf

#mgcvchat #mgcv

#mgcv mini-lifehack:

(assuming you have multithreading enabled) you can get a rough idea of what's happening when fitting a big model by looking at your CPU usage. If only 1 core is being used, the model is still "building" (assembling the design/penalty matrices); once usage switches to all cores, you're actually fitting the model. If that first construction phase takes a long time (as it can with a very big model), the fit itself will probably take a very, very long time. So buckle up.

#mgcvchat
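
As a rough illustration of the "multithreading enabled" assumption in this tip, threads for `gam()` can be requested through `gam.control(nthreads = ...)`. The sketch below uses simulated data from `gamSim()` and an arbitrary "physical cores minus one" rule, both chosen only for illustration. Whether extra threads actually help depends on the model and on how R and mgcv were built.

```r
library(mgcv)

# Pick a thread count; detectCores() can return NA on some platforms
n_cores <- parallel::detectCores(logical = FALSE)
n_threads <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# Request that many threads for the fitting computations
ctrl <- gam.control(nthreads = n_threads)

# Small simulated example so the fit finishes quickly
set.seed(1)
dat <- gamSim(1, n = 2000, verbose = FALSE)

# Time the fit; watch CPU usage in a system monitor while it runs
timing <- system.time(
  fit <- gam(
    y ~ s(x0) + s(x1) + s(x2) + s(x3),
    data = dat,
    method = "REML",
    control = ctrl
  )
)
print(timing)
```

On a model this small the single-core "building" phase and the multi-core fitting phase both pass too quickly to see; the pattern described in the tip only becomes visible with much larger models.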

#Poisson regression with #mgcv and #glmmTMB in #rstats just rocks