Mastodawn

Frank Harrell May 9, 2023

The #rstats Hmisc package has another major update. One of the biggest changes is new output options for describe() including interactive sparklines for spike histograms. http://hbiostat.org/R/Hmisc
@VUMCbiostat @datavisFriendly @datascience

Hmisc

Frank Harrell Mar 29, 2023

R Workflow e-book is starting to take advantage of new Quarto code annotation capability: https://hbiostat.org/rflow/long.html #rstat #stats #statistics

R Workflow - 13 Manipulation of Longitudinal Data

Frank Harrell Mar 1, 2023

Regression Modeling Strategies 4-day virtual course May 16-19, taught by Drew Levy and me, is now open for registration at https://hbiostat.org/doc/rms/4day.html #rmscourse @VUMCbiostat #StatThink #statistics #biostatistics

RMS

Frank Harrell Jan 18, 2023

Major new version of R rms package is now on CRAN with many improvements, the most important of them being the generalization of spline functions to work in any regression model, rms or otherwise. https://hbiostat.org/R/rms #rstats @VUMCbiostat

RMS

Frank Harrell Dec 20, 2022

#statistics thought of the day: The Wilcoxon statistic when scaled to [0,1] (i.e., concordance probability) is at its null value of 0.5 (complete overlap of samples in 2 groups) if and only if the maximum likelihood estimate of the group effect is zero in the proportional odds model: https://www.fharrell.com/post/powilcoxon #biostatistics @VUMCbiostat #rct

Statistical Thinking - Equivalence of Wilcoxon Statistic and Proportional Odds Model

In this article I provide much more extensive simulations showing the near-perfect agreement between the odds ratio (OR) from a proportional odds (PO) model and the Wilcoxon two-sample test statistic. The agreement is studied by degree of violation of the PO assumption and by the sample size. A refinement in the conversion formula between the OR and the Wilcoxon statistic scaled to 0-1 (concordance probability) is provided.
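As a rough illustration of the "Wilcoxon statistic scaled to 0-1" mentioned above, the sketch below computes the concordance probability directly from all cross-group pairs, counting ties as one half. It is plain Python rather than code from the article; the function name and the toy data are assumptions made for illustration, and the proportional odds model fit itself is not shown.

```python
from itertools import product


def concordance_probability(x, y):
    """Estimate P(Y > X) + 0.5 * P(Y = X) over all cross-group pairs.

    This equals the Wilcoxon-Mann-Whitney U statistic for group y
    divided by len(x) * len(y), so it always lies in [0, 1].
    """
    score = 0.0
    for xi, yj in product(x, y):
        if yj > xi:
            score += 1.0   # concordant pair: y exceeds x
        elif yj == xi:
            score += 0.5   # tied pair counts as one half
    return score / (len(x) * len(y))


# Hypothetical toy data: identical samples overlap completely.
c_overlap = concordance_probability([1, 2, 3, 4], [1, 2, 3, 4])

# Hypothetical toy data: every value in the second group is larger.
c_separated = concordance_probability([1, 2, 3, 4], [5, 6, 7, 8])

print(c_overlap, c_separated)
```

With identical samples the value is 0.5, the null value described in the post; with complete separation it reaches 1.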

Frank Harrell Dec 4, 2022

Significant new material added to R Workflow e-book #rstats @VUMCbiostat https://hbiostat.org/rflow

R Workflow

Frank Harrell Nov 23, 2022

Excel remains a disastrous choice for data management: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008984 and it even kills people

Gene name errors: Lessons not learned

Author summary: Autocorrection is a feature of modern software, including messaging apps, word processors and spreadsheets. These are designed to avoid data entry errors but “autocorrect fails” can lead to information being distorted in undesired and sometimes humorous ways. What is not funny though is having genomics spreadsheets suffer from auto-conversion of gene names like SEPT8, DEC1 and MARCH3 into dates, a problem first characterised in 2004. A 2016 article on this topic led the Human Gene Name Consortium to change many of these gene names to be less susceptible to autocorrect. Despite this, our work here shows that gene name autocorrect errors continue to accumulate in supplementary genomics spreadsheet files at a rapid pace. To avoid this and other reproducibility problems with spreadsheets, big changes are required in the way genomics scientists analyse and share data. We provide several practical steps researchers can take to avoid gene name errors and reiterate that big genomics data analysis is better suited to Python/R notebooks rather than spreadsheets.
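One way to act on this in a notebook is to screen a list of gene symbols for names that look like a month followed by a day number before the data ever touches a spreadsheet. The sketch below is only a heuristic: the regular expression is an assumption about which symbols are at risk, not the detection method used in the paper, and the function name is hypothetical.

```python
import re

# Assumption: symbols made of a month name or abbreviation followed by
# one or two digits (e.g. SEPT8, DEC1, MARCH3) are at risk of being
# auto-converted to dates. This is a rough heuristic, not an exhaustive rule.
_DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)\d{1,2}$",
    re.IGNORECASE,
)


def date_like_symbols(symbols):
    """Return the gene symbols that match the date-like heuristic."""
    return [s for s in symbols if _DATE_LIKE.match(s.strip())]


# Gene names taken from the summary above, plus two ordinary symbols.
flagged = date_like_symbols(["SEPT8", "DEC1", "MARCH3", "TP53", "BRCA1"])
print(flagged)
```

Flagged symbols can then be checked by hand, or kept as text in a plain CSV read by a script rather than opened in a spreadsheet.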

Frank Harrell Nov 19, 2022

New issue of Statistical Thinking News is now out with pointers to several interesting articles: https://paper.li/stn #StatThink #statistics #biostatistics #clinical #sci

On improving the efficiency of trials via linear adjustment for a prognostic score

I’ve recently had the opportunity to spend a little time looking at an interesting approach for improving the efficiency of estimated treatment effects in clinical trials which exploits histo…

paper.li

Blog	https://fharrell.com
Web Site	https://hbiostat.org
Discussion Board	http://datamethods.org