New preprint with the alternative title "What I wish I had known about delay distribution estimation in January 2020"

https://samabbott.co.uk/posts/2024-01-15-estimating-epidemiological-delay-distributions-for-infectious-diseases/

Featuring Poppy the puppy as a guest contributor!

Great collab w/ @sangwoopark, Andrei Akhmetzhanov, Kelly Charniga, Anne Cori, Nick Davies, @jd_mathbio, @sbfnk, Katie Gostic, Brian Grenfell, @nlinton_epi, @mlipsitch, Adrian Lison, Chris Overton, and Thomas Ward

Sam Abbott: Estimating epidemiological delay distributions for infectious diseases

I summarise our research on refining delay distribution estimates in epidemic modeling, a journey prompted by my general confusion. We explore and compare various approaches, drawing insights from simulations and Ebola virus disease outbreak data, and suggest paths for future improvements. Also featuring Poppy, the puppy, who is currently gnawing on my hand and occasionally archiving my emails.

Sam Abbott
Estimating epidemiological delay distributions for infectious diseases

Understanding and accurately estimating epidemiological delay distributions is important for public health policy. These estimates directly influence epidemic situational awareness, control strategies, and resource allocation. In this study, we explore challenges in estimating these distributions, including truncation, interval censoring, and dynamical biases. Despite their importance, these issues are frequently overlooked in the current literature, often resulting in biased conclusions. This study aims to shed light on these challenges, providing valuable insights for epidemiologists and infectious disease modellers. Our work motivates comprehensive approaches for accounting for these issues based on the underlying theoretical concepts. We also discuss simpler methods that are widely used, which do not fully account for known biases. We evaluate the statistical performance of these methods using simulated exponential growth and epidemic scenarios informed by data from the 2014-2016 Sierra Leone Ebola virus disease epidemic. Our findings highlight that using simpler methods can lead to biased estimates of vital epidemiological parameters. An approximate-latent-variable method emerges as the best overall performer, while an efficient, widely implemented interval-reduced-censoring-and-truncation method was only slightly worse. Other methods, such as a joint-primary-incidence-and-delay method and a dynamic-correction method, demonstrated good performance under certain conditions, although they have inherent limitations and may not be the best choice for more complex problems. Despite presenting a range of methods that performed well in the contexts we evaluated, residual biases persisted, predominantly due to the simplifying assumption that the distribution of event time within the censoring interval follows a uniform distribution; instead, this distribution should depend on epidemic dynamics. 
However, in realistic scenarios with daily censoring, these biases appeared minimal. This study underscores the need for caution when estimating epidemiological delay distributions in real time, provides an overview of the theory that practitioners need to keep in mind when doing so, along with useful tools to avoid common methodological errors, and points towards areas for future research.

Competing Interest Statement: The authors have declared no competing interest. Funding Statement: SF was supported by Wellcome Trust (210758/Z/18/Z). Code availability: All code used in the present study is available at <https://github.com/parksw3/epidist-paper>
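
To make the right-truncation problem mentioned in the abstract concrete, here is a minimal simulation sketch in Python. It is not taken from the paper or its code repository: the function name, the lognormal delay, the growth rate of 0.2 per day, and the 30-day observation window are all assumptions chosen for illustration. It draws primary events whose incidence grows exponentially, keeps only the pairs whose secondary event has happened by the observation time, and compares the naive mean of those observed delays with the true mean.

```python
import math
import random


def simulate_truncated_delays(n=20000, growth_rate=0.2, obs_time=30.0,
                              meanlog=1.5, sdlog=0.5, seed=1):
    """Simulate delays that are visible by `obs_time` during exponential growth.

    All parameter values are illustrative assumptions, not estimates
    from the paper.
    """
    rng = random.Random(seed)
    all_delays, observed_delays = [], []
    for _ in range(n):
        # Primary event time with density proportional to exp(r * t) on
        # [0, obs_time], drawn by inverting the cumulative distribution.
        u = rng.random()
        primary = math.log1p(u * math.expm1(growth_rate * obs_time)) / growth_rate
        delay = rng.lognormvariate(meanlog, sdlog)
        all_delays.append(delay)
        # Right truncation: a pair is only recorded if the secondary
        # event has already occurred by the observation time.
        if primary + delay <= obs_time:
            observed_delays.append(delay)
    return all_delays, observed_delays


def mean(values):
    return sum(values) / len(values)


all_delays, observed_delays = simulate_truncated_delays()
true_mean = math.exp(1.5 + 0.5 ** 2 / 2)  # analytic mean of the lognormal
naive_mean = mean(observed_delays)

print(f"analytic mean delay:         {true_mean:.2f}")
print(f"mean of all simulated:       {mean(all_delays):.2f}")
print(f"naive mean (truncated data): {naive_mean:.2f}")
print(f"share of pairs observed:     {len(observed_delays) / len(all_delays):.2f}")
```

Because recent primary events dominate during growth and only their short delays are visible yet, the naive mean comes out below the true mean. The methods compared in the paper adjust the likelihood for this (and for interval censoring) rather than summarising the observed delays directly.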

medRxiv

@sangwoopark @jd_mathbio @sbfnk @nlinton_epi @mlipsitch

(if anyone has thoughts on a nice non-profit open-access journal that would take something like this please reach out!)

Also, for trendy #rstats people: this work was all built as an extension of {brms}, which made our lives slightly harder but means that all models can have arbitrary strata and time-varying components. How neat!

@seabbs I wish I could recommend such a journal. Let me know if you find a good one! 🤞
@rob_models I think we're expecting not to find one because of the length, so sadly this will probably be an eternal preprint
@seabbs Yeah, I've read a number of great preprints that never ended up in a journal. It's disappointing, given how academic research and researchers are typically measured. There *should* be journals that would want to publish this. And length shouldn't matter, it's not like they're printing and shipping physical copies anymore.