Frank Harrell

Professor of Biostatistics, Vanderbilt University School of Medicine
Expert Biostatistics Advisor, FDA Center for Drug Evaluation and Research
Member, R Foundation
Associate Editor, Statistics in Medicine
Blog: https://fharrell.com
Web Site: https://hbiostat.org
Discussion Board: http://datamethods.org
The #rstats Hmisc package has another major update. One of the biggest changes is new output options for describe() including interactive sparklines for spike histograms. http://hbiostat.org/R/Hmisc
@VUMCbiostat @datavisFriendly @datascience

R Workflow e-book is starting to take advantage of new Quarto code annotation capability: https://hbiostat.org/rflow/long.html #rstat #stats #statistics
R Workflow - 13 Manipulation of Longitudinal Data

Regression Modeling Strategies 4-day virtual course May 16-19, taught by Drew Levy and me, is now open for registration at https://hbiostat.org/doc/rms/4day.html #rmscourse @VUMCbiostat #StatThink #statistics #biostatistics

Major new version of R rms package is now on CRAN with many improvements, the most important of them being the generalization of spline functions to work in any regression model, rms or otherwise. https://hbiostat.org/R/rms #rstats @VUMCbiostat

#statistics thought of the day: The Wilcoxon statistic when scaled to [0,1] (i.e., concordance probability) is at its null value of 0.5 (complete overlap of samples in 2 groups) if and only if the maximum likelihood estimate of the group effect is zero in the proportional odds model: https://www.fharrell.com/post/powilcoxon #biostatistics @VUMCbiostat #rct
Statistical Thinking - Equivalence of Wilcoxon Statistic and Proportional Odds Model

In this article I provide much more extensive simulations showing the near-perfect agreement between the odds ratio (OR) from a proportional odds (PO) model and the Wilcoxon two-sample test statistic. The agreement is studied by degree of violation of the PO assumption and by the sample size. A refinement in the conversion formula between the OR and the Wilcoxon statistic scaled to 0-1 (concordance probability) is provided.
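
For readers unfamiliar with the scaling mentioned in the post, here is a minimal Python sketch of the Wilcoxon statistic rescaled to [0,1]. This is the concordance probability: the proportion of cross-group pairs in which the second group's value is larger, with ties counted as one half. The function name and the toy data are illustrative assumptions, not taken from the article, and the sketch does not fit a proportional odds model or reproduce the article's OR conversion formula.

```python
def concordance_probability(group_a, group_b):
    """Wilcoxon-Mann-Whitney statistic scaled to [0, 1].

    Estimates P(B > A) + 0.5 * P(B == A) over all cross-group pairs.
    Hypothetical helper written for illustration only.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if b > a:
                wins += 1.0   # pair is concordant with B being larger
            elif b == a:
                wins += 0.5   # ties contribute one half
    return wins / (len(group_a) * len(group_b))


# Toy data (assumed): complete overlap versus complete separation.
overlap = concordance_probability([1, 2, 3], [1, 2, 3])
separated = concordance_probability([1, 2, 3], [4, 5, 6])
print(overlap, separated)
```

With identical samples the statistic sits at its null value of 0.5, and it reaches 1.0 when every value in the second group exceeds every value in the first.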

Significant new material added to R Workflow e-book #rstats @VUMCbiostat https://hbiostat.org/rflow

Excel remains a disastrous choice for data management: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008984 and it even kills people
Gene name errors: Lessons not learned

Author summary: Autocorrection is a feature of modern softwares including messaging apps, word processors and spreadsheets. These are designed to avoid data entry errors but “autocorrect fails” can lead to information being distorted in undesired and sometimes humorous ways. What is not funny though is having genomics spreadsheets suffer from auto-conversion of gene names like SEPT8, DEC1 and MARCH3 into dates, a problem first characterised in 2004. A 2016 article on this topic led the Human Gene Name Consortium to change many of these gene names to be less susceptible to autocorrect. Despite this, our work here shows that gene name autocorrect errors continue to accumulate in supplementary genomics spreadsheet files at a rapid pace. To avoid this and other reproducibility problems with spreadsheets, big changes are required in the way genomics scientists analyse and share data. We provide several practical steps researchers can take to avoid gene name errors and reiterate that big genomics data analysis is better suited to Python/R notebooks rather than spreadsheets.
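
As a rough illustration of the problem the summary describes, the following Python sketch flags gene symbols that look like a month name or abbreviation followed by a day number, which is the shape of SEPT8, DEC1 and MARCH3. The regular expression is an assumed heuristic written for this example. It is not Excel's actual conversion logic and not the detection method used in the paper.

```python
import re

# Assumed heuristic: a month name or abbreviation, an optional hyphen,
# then one or two digits. This is NOT Excel's real rule set.
DATE_LIKE = re.compile(
    r"(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}",
    re.IGNORECASE,
)


def looks_date_like(symbol):
    """Return True if the whole symbol matches the month-plus-number shape."""
    return DATE_LIKE.fullmatch(symbol.strip()) is not None


# Gene symbols named in the summary, plus one that should be safe.
symbols = ["SEPT8", "DEC1", "MARCH3", "TP53"]
flagged = [s for s in symbols if looks_date_like(s)]
print(flagged)
```

A check like this could be run on a gene list before it ever touches a spreadsheet, so that at-risk symbols are known in advance.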

New issue of Statistical Thinking News is now out with pointers to several interesting articles: https://paper.li/stn #StatThink #statistics #biostatistics #clinical #sci
On improving the efficiency of trials via linear adjustment for a prognostic score

I’ve recently had the opportunity to spend a little time looking at an interesting approach for improving the efficiency of estimated treatment effects in clinical trials which exploits histo…
