Michael DeWitt

@Medewitt
197 Followers
378 Following
372 Posts
Infectious disease researcher and applied statistician. Doing Bayesian stuff with #rstats, #stan, and #julia. (He/him)
Website: https://michaeldewittjr.com

Every time I hear about HPV vaccines & cervical cancer I want to shout from the rooftops - it's incredible to live in a time where we can prevent this devastating disease!

But as Kate Cuschieri highlights, we have to tackle this globally & reduce inequity in access! #ESCV2024

My paper with A. Christen (@cimatoficial):

"Dynamic survival analysis: modelling the hazard function via ordinary differential equations"

has been accepted for publication in Statistical Methods in Medical Research.

https://doi.org/10.48550/arXiv.2308.05205

GitHub: https://github.com/FJRubio67/ODESurv

Dynamic survival analysis: modelling the hazard function via ordinary differential equations

The hazard function represents one of the main quantities of interest in the analysis of survival data. We propose a general approach for parametrically modelling the dynamics of the hazard function using systems of autonomous ordinary differential equations (ODEs). This modelling approach can be used to provide qualitative and quantitative analyses of the evolution of the hazard function over time. Our proposal capitalises on the extensive literature on ODEs which, in particular, allows for establishing basic rules or laws on the dynamics of the hazard function via the use of autonomous ODEs. We show how to implement the proposed modelling framework in cases where there is an analytic solution to the system of ODEs or where an ODE solver is required to obtain a numerical solution. We focus on the use of a Bayesian modelling approach, but the proposed methodology can also be coupled with maximum likelihood estimation. A simulation study is presented to illustrate the performance of these models and the interplay of sample size and censoring. Two case studies using real data are presented to illustrate the use of the proposed approach and to highlight the interpretability of the corresponding models. We conclude with a discussion on potential extensions of our work and strategies to include covariates into our framework. Although we focus on examples in medical statistics, the proposed framework is applicable in any context where the interest lies in estimating and interpreting the dynamics of the hazard function.

arXiv.org
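A toy sketch (mine, not from the paper) of the core idea: pick an autonomous ODE for the hazard, integrate it numerically, and accumulate the cumulative hazard to get survival. Here the ODE h'(t) = b·h(t) has the Gompertz hazard a·exp(bt) as its analytic solution, so the Euler result can be checked against the closed form.

```python
import math

def hazard_ode_survival(a, b, t_max, dt=1e-4):
    """Euler-integrate the autonomous ODE h'(t) = b*h(t), h(0) = a
    (whose solution is the Gompertz hazard a*exp(b*t)), accumulating
    the cumulative hazard H(t) so that S(t) = exp(-H(t))."""
    h, H, t = a, 0.0, 0.0
    while t < t_max:
        H += h * dt          # left-Riemann step for the cumulative hazard
        h += b * h * dt      # Euler step for the hazard ODE itself
        t += dt
    return math.exp(-H)

def gompertz_survival(a, b, t):
    """Analytic Gompertz survival, for comparison."""
    return math.exp(-a / b * (math.exp(b * t) - 1.0))
```

With a = 0.1, b = 0.5, t = 2 the two agree to well under 1%; in the paper's setting the same numeric route works for ODEs with no closed-form solution.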

My ideal manuscript layout:

[Intro]
This is what I want to know.
Here is what we know so far.
Here is what we don't know yet.
This is what I'm going to do to fill this gap.

[Methods]
Here is what I did.

[Results]
Here is what I found.

[Discussion]
How does this relate to what we know.
How does this resolve what we didn't know.

[Conclusion]
Here is my answer to what I wanted to know.

(you can use the same template for research proposals just swapping out "did" with "will do")

Years ago, I spent a lot of time working on expectation propagation (EP), and I'm still delighted to see others keep improving it. The paper "Fearless Stochasticity in Expectation Propagation" by Jonathan So and Richard Turner is excellent! https://arxiv.org/abs/2406.01801
Fearless Stochasticity in Expectation Propagation

Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments -- expectations of certain functions -- which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation. They remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on the use of debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.

arXiv.org
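For intuition on the "moments estimated from MC samples" part: a minimal sketch (my own, not the paper's algorithm) of estimating the moments of a tilted distribution by self-normalised importance sampling, using a Gaussian cavity times a Gaussian likelihood factor so the answer is known analytically (here the tilted distribution is N(0.5, 0.5)).

```python
import math
import random

def tilted_moments_mc(cavity_mean, cavity_var, obs, obs_var, n=200_000, seed=0):
    """MC estimate of the mean/variance of the tilted distribution
    q(x) * N(obs | x, obs_var): draw from the Gaussian cavity q and
    weight each draw by the (unnormalised) likelihood factor."""
    rng = random.Random(seed)
    s = math.sqrt(cavity_var)
    w_sum = m1 = m2 = 0.0
    for _ in range(n):
        x = rng.gauss(cavity_mean, s)
        w = math.exp(-0.5 * (obs - x) ** 2 / obs_var)  # likelihood weight
        w_sum += w
        m1 += w * x
        m2 += w * x * x
    mean = m1 / w_sum
    var = m2 / w_sum - mean ** 2
    return mean, var
```

The paper's point is precisely that plugging such noisy estimates into naive EP updates is unstable, and that natural-gradient-style updates tolerate the noise, down to single-sample estimates.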

webr-rs-js 🤝 WebR

Running R in the browser via Rust

Rust 👈🏼😎👉🏼 browser
#webr #rust #rstats

PSA: All #rstats packages on #CRAN will get an official DOI!

This will facilitate bibliometrics and give credit to package authors.

Registering all 20,000+ packages will still take a few more days. But the first couple of thousand are already live. Example:

Preprint from Simon Wood on the new cross-validation smoothness estimation in #mgcv: https://arxiv.org/abs/2404.16490. It's a neat performant + data-efficient way to estimate GAMs based on complex CV splits (like spatial/temporal/phylo ones).

See ?NCV in latest {mgcv} for examples (https://cran.r-universe.dev/mgcv/doc/manual.html#NCV)

I might write a helper to convert {rsample}/{spatialsample} objects into mgcv's funny CV indexing structure.

#rstats #ml #tidymodels #mgcvchat @MikeMahoney218 @gavinsimpson @ericJpedersen @millerdl

On Neighbourhood Cross Validation

Many varieties of cross validation would be statistically appealing for the estimation of smoothing and other penalized regression hyperparameters, were it not for the high cost of evaluating such criteria. Here it is shown how to efficiently and accurately compute and optimize a broad variety of cross validation criteria for a wide range of models estimated by minimizing a quadratically penalized loss. The leading order computational cost of hyperparameter estimation is made comparable to the cost of a single model fit given hyperparameters. In many cases this represents an $O(n)$ computational saving when modelling $n$ data. This development makes it feasible, for the first time, to use leave-out-neighbourhood cross validation to deal with the widespread problem of un-modelled short range autocorrelation, which otherwise leads to underestimation of smoothing parameters. It is also shown how to accurately quantify uncertainty in this case, despite the un-modelled autocorrelation. Practical examples are provided, including smooth quantile regression and generalized additive models for location, scale and shape, focussing particularly on dealing with un-modelled autocorrelation.

arXiv.org
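To see what the criterion is doing (this is a brute-force toy of mine, not Wood's efficient computation, and the box-kernel smoother is just a stand-in): for each point, drop its whole neighbourhood rather than only the point itself before predicting it, so short-range autocorrelation can't leak into the score.

```python
def loo_neighbourhood_cv(y, x, bandwidth, radius):
    """Leave-out-neighbourhood CV score for a box-kernel running-mean
    smoother: predict each point after dropping every observation
    within `radius` of it (in x), then average squared errors.
    radius=0 recovers ordinary leave-one-out CV."""
    n = len(y)
    score = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if abs(x[j] - x[i]) <= radius:     # drop the neighbourhood, not just i
                continue
            if abs(x[j] - x[i]) <= bandwidth:  # box kernel: unweighted local mean
                num += y[j]
                den += 1.0
        if den == 0.0:
            continue  # no usable neighbours at this bandwidth; skip the point
        score += (y[i] - num / den) ** 2
    return score / n
```

On correlated data the radius > 0 score penalises over-wiggly fits that plain LOO would reward; the paper's contribution is computing this class of criteria at roughly the cost of one fit.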

I need to estimate a delay distribution (e.g. incubation period, reporting delay, etc.) for an infectious disease. What should I do?

This question is addressed in new work from #epinowcast community member @kcharniga: https://www.epinowcast.org/posts/2024-05-17-best-practices-delays/

Epinowcast - Best practices for estimating and reporting epidemiological delay distributions of infectious diseases using public health surveillance and healthcare data

Epinowcast community site

Epinowcast
Suggestions for best practice in scaling analysis

Hi everyone, I'm exploring whether there are good examples around of using Julia at scale that people can link? At the moment I'm helping develop a package for model-based epidemiological inference Rt-without-renewal/EpiAware at main · CDCgov/Rt-without-renewal · GitHub. We're at the stage where we want to collect inference results across a number of different scenarios to answer some interesting questions about effective epi modelling. Looking at the space of handy workflow packages in juli...

Julia Programming Language

Following on from our last work (https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1) Kelly Charniga has led a piece looking at best practices for estimating and reporting epidemiological delay distributions.

https://hal.science/hal-04572940v1

The aim here is to provide a checklist for both those producing and using epidemiological delay distributions.

Estimating epidemiological delay distributions for infectious diseases

Understanding and accurately estimating epidemiological delay distributions is important for public health policy. These estimates directly influence epidemic situational awareness, control strategies, and resource allocation. In this study, we explore challenges in estimating these distributions, including truncation, interval censoring, and dynamical biases. Despite their importance, these issues are frequently overlooked in the current literature, often resulting in biased conclusions. This study aims to shed light on these challenges, providing valuable insights for epidemiologists and infectious disease modellers. Our work motivates comprehensive approaches for accounting for these issues based on the underlying theoretical concepts. We also discuss simpler methods that are widely used, which do not fully account for known biases. We evaluate the statistical performance of these methods using simulated exponential growth and epidemic scenarios informed by data from the 2014-2016 Sierra Leone Ebola virus disease epidemic. Our findings highlight that using simpler methods can lead to biased estimates of vital epidemiological parameters. An approximate-latent-variable method emerges as the best overall performer, while an efficient, widely implemented interval-reduced-censoring-and-truncation method was only slightly worse. Other methods, such as a joint-primary-incidence-and-delay method and a dynamic-correction method, demonstrated good performance under certain conditions, although they have inherent limitations and may not be the best choice for more complex problems. Despite presenting a range of methods that performed well in the contexts we evaluated, residual biases persisted, predominantly due to the simplifying assumption that the distribution of event time within the censoring interval follows a uniform distribution; instead, this distribution should depend on epidemic dynamics. 
However, in realistic scenarios with daily censoring, these biases appeared minimal. This study underscores the need for caution when estimating epidemiological delay distributions in real-time, provides an overview of the theory that practitioners need to keep in mind when doing so with useful tools to avoid common methodological errors, and points towards areas for future research. All code used in the present study is available at https://github.com/parksw3/epidist-paper

medRxiv
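A quick toy simulation of my own (not from the paper) showing the right-truncation bias they warn about: during exponential growth, most primary events are recent, so long delays haven't finished by the observation cutoff, and the naive mean of fully observed delays lands well below the true mean.

```python
import math
import random

def naive_truncated_delay_mean(true_mean=5.0, growth_rate=0.15,
                               window=60.0, n=50_000, seed=1):
    """Simulate primary events with exponentially growing incidence on
    [0, window], attach exponential delays with mean `true_mean`, and
    keep only delays whose secondary event falls before the cutoff
    (right truncation, as in real-time surveillance). Returns the
    naive (biased) mean of the observed delays."""
    rng = random.Random(seed)
    observed = []
    scale = math.exp(growth_rate * window) - 1.0
    for _ in range(n):
        # inverse-CDF draw from density proportional to exp(growth_rate * t)
        t0 = math.log(1.0 + rng.random() * scale) / growth_rate
        delay = rng.expovariate(1.0 / true_mean)
        if t0 + delay <= window:   # only fully observed pairs survive the cutoff
            observed.append(delay)
    return sum(observed) / len(observed)
```

With a true mean of 5 days and 15%/day growth, the naive estimate comes out near 3 days, which is the kind of underestimate the truncation-aware methods in the paper are built to correct.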