Tim Morris

@Tim_P_Morris
316 Followers
105 Following
85 Posts
Principal research fellow at the MRC Clinical Trials Unit, UCL, working on statistical methods: which ones work well when? 
Missing data, simulation studies, estimands, covariate adjustment, meta-analysis. Stata user who carves spoons.
Google Scholar profile: https://scholar.google.com/citations?user=l6OPzY0AAAAJ&hl=en
UCL page: https://iris.ucl.ac.uk/iris/browse/profile?upi=TNMOR17

I am starting to see more and more cases where people have used tools like ChatGPT and Copilot to generate code, the code doesn't work, and they then go to StackOverflow or elsewhere to ask why. I am not sure how I feel about this. On the one hand, such tools can be useful; on the other, at some point one has to make the effort to figure out how things actually work. Otherwise, even if the code runs, you have no idea whether it is doing what you want it to do.

#ChatGPT #Copilot #RStats

This is *exactly* how I feel about generalised linear models. #statistics

Just appeared in Biometrical Journal:

"Phases of methodological research in biostatistics—Building the evidence base for new methods"

by @GeorgHeinze with M. Kammer @Tim_P_Morris and I. White

https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202200222

@davepmiller I've never argued once there is zero value, current or future. My pushback is against obvious hype...are we not allowed to call it out because one day someone will do something to maybe warrant it? Lol, not enough hype about AI in medicine...get real.

Looking forward to giving a seminar to UCL at 1pm today - thanks for the invite @Tim_P_Morris
I will be speaking on:

"Stability of clinical prediction models developed using statistical or machine learning methods"

- based on the pre-print here https://arxiv.org/abs/2211.01061

Stability of clinical prediction models developed using statistical or machine learning methods

Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors. A developed model is a consequence of the development dataset and the chosen model building strategy, including the sample size, number of predictors and analysis method (e.g., regression or machine learning).

Here, we raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks, moving from the overall mean to the individual level. Then, through simulation and case studies of statistical and machine learning approaches, we show instability in a model's estimated risks is often considerable, and ultimately manifests itself as miscalibration of predictions in new data.

Therefore, we recommend researchers should always examine instability at the model development stage and propose instability plots and measures to do so. This entails repeating the model building steps (those used in the development of the original prediction model) in each of multiple (e.g., 1000) bootstrap samples, to produce multiple bootstrap models, and then deriving (i) a prediction instability plot of bootstrap model predictions (y-axis) versus original model predictions (x-axis); (ii) a calibration instability plot showing calibration curves for the bootstrap models in the original sample; and (iii) the instability index, which is the mean absolute difference between individuals' original and bootstrap model predictions.

A case study is used to illustrate how these instability assessments help reassure (or not) whether model predictions are likely to be reliable (or not), whilst also informing a model's critical appraisal (risk of bias rating), fairness assessment and further validation requirements.
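A minimal sketch (in Python, not from the paper) of the instability index described in the abstract, for the simplest toy case: an intercept-only model, where every individual's estimated risk is the overall event proportion. The data, sample sizes and variable names here are my own illustrative assumptions; the paper's actual examples use full regression and machine learning models, where the same bootstrap-repeat logic applies to each individual's prediction.

```python
import random
import statistics

random.seed(42)

# Toy development dataset: binary outcomes (1 = event), n = 100,
# with a true event probability of 0.3.
y = [1 if random.random() < 0.3 else 0 for _ in range(100)]

# "Model building" step for an intercept-only model: the estimated
# risk for every individual is simply the overall event proportion.
original_pred = statistics.mean(y)

# Repeat the model building step in each of B bootstrap samples,
# producing B bootstrap-model predictions.
B = 1000
bootstrap_preds = []
for _ in range(B):
    sample = [random.choice(y) for _ in range(len(y))]
    bootstrap_preds.append(statistics.mean(sample))

# Instability index: mean absolute difference between the original
# and bootstrap model predictions (identical across individuals here,
# because this toy model assigns everyone the same risk).
instability_index = statistics.mean(
    abs(p - original_pred) for p in bootstrap_preds
)
print(instability_index)
```

With a richer model, `original_pred` and each bootstrap prediction become per-individual vectors, and plotting them against each other gives the prediction instability plot the abstract describes.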

arXiv.org

A few related thoughts:

We need more evidence synthesis like this in the context of methodological research and, more generally, Phase III and IV studies; see
https://arxiv.org/abs/2209.13358 by @GeorgHeinze @Tim_P_Morris et al.

See also the works by @ppgardne in computational biology, e.g.:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6322486/

2/4

Phases of methodological research in biostatistics - building the evidence base for new methods

Although the biostatistical scientific literature publishes new methods at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similarly to the well-known phases of clinical research in drug development, we define four phases of methodological research. These four phases cover (I) providing logical reasoning and proofs, (II) providing empirical evidence, first in a narrow target setting, then (III) in an extended range of settings and for various outcomes, accompanied by appropriate application examples, and (IV) investigations that establish a method as sufficiently well-understood to know when it is preferred over others and when it is not.

We provide basic definitions of the four phases but acknowledge that more work is needed to facilitate unambiguous classification of studies into phases. Methodological developments that have undergone all four proposed phases are still rare, but we give two examples with references. Our concept rebalances the emphasis towards studies in phases III and IV, i.e., carefully planned methods comparison studies and studies that explore the empirical properties of existing methods in a wider range of problems.

arXiv.org

@HeidiSeibold @statsepi

Mary Shelley's Frankenstein is pretty good, and has inspired lots of imitators, but I don't think the science at the heart of the story has ever been replicated.

Still an awesome tune, IMHO...

https://youtu.be/LLs-JP5FGAg

Skunk Anansie - Hedonism

YouTube

I'm not a real statistician, and you can be one too.

https://statsepi.substack.com/p/im-not-a-real-statistician-and-you

I’m not a real statistician, and you can be one too

Cara and I were hand-in-hand, slowly strolling around Cork on a bright, summer evening.

Life is pain, especially your data

There iS no greater indicTment of academia than hOw we've sheePishly normalIzed The abuse of acronyms.

STOP IT