#statstab #546 Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics

Thoughts: Think more about what "assumption checking" means.

#assumptions #tutorial #nhst #epistemology #statistics

https://link.springer.com/article/10.3758/s13428-023-02072-x

Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics - Behavior Research Methods

Statistical methods generally have assumptions (e.g., normality in linear regression models). Violations of these assumptions can cause various issues, like statistical errors and biased estimates, whose impact can range from inconsequential to critical. Accordingly, it is important to check these assumptions, but this is often done in a flawed way. Here, I first present a prevalent but problematic approach to diagnostics: testing assumptions using null hypothesis significance tests (e.g., the Shapiro–Wilk test of normality). Then, I consolidate and illustrate the issues with this approach, primarily using simulations. These issues include statistical errors (i.e., false positives, especially with large samples, and false negatives, especially with small samples), false binarity, limited descriptiveness, misinterpretation (e.g., of p-value as an effect size), and potential testing failure due to unmet test assumptions. Finally, I synthesize the implications of these issues for statistical diagnostics, and provide practical recommendations for improving such diagnostics. Key recommendations include maintaining awareness of the issues with assumption tests (while recognizing they can be useful), using appropriate combinations of diagnostic methods (including visualization and effect sizes) while recognizing their limitations, and distinguishing between testing and checking assumptions. Additional recommendations include judging assumption violations as a complex spectrum (rather than a simplistic binary), using programmatic tools that increase replicability and decrease researcher degrees of freedom, and sharing the material and rationale involved in the diagnostics.
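
A rough sketch of the "false positives with large samples" point, not the paper's own simulation. To stay within the Python standard library it swaps in the Jarque-Bera test for Shapiro–Wilk (an assumption made for convenience; its asymptotic p-value is also unreliable for small n). The data-generating function `mildly_skewed_sample` is hypothetical. The idea: the effect size (skewness) stays roughly the same across sample sizes, while the p-value collapses as n grows.

```python
import math
import random
import statistics


def skew_kurtosis(xs):
    """Return (skewness, kurtosis) computed from population central moments."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(xs):
    """Jarque-Bera normality test: return (statistic, asymptotic p-value)."""
    n = len(xs)
    s, k = skew_kurtosis(xs)
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return jb, math.exp(-jb / 2)


def mildly_skewed_sample(n, rng):
    """Hypothetical data: standard normal noise plus a small exponential part."""
    return [rng.gauss(0, 1) + 0.5 * rng.expovariate(1) for _ in range(n)]


rng = random.Random(1)
for n in (50, 20000):
    sample = mildly_skewed_sample(n, rng)
    skew, _ = skew_kurtosis(sample)
    _, p = jarque_bera(sample)
    # Compare the effect size (skewness) with the p-value at each sample size.
    print(f"n={n:>6}  skewness={skew:+.2f}  p={p:.3g}")
```

Reading the skewness next to the p-value (and ideally a Q-Q plot) is closer to "checking" than relying on the test's verdict alone.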


#statstab #542 Improving the utility of non-significant results for educational research

Thoughts: Another paper that tries to clarify some issue with NHST.

#nhst #stats #pvalues #education #frequentist #misconceptions

https://www.sciencedirect.com/science/article/pii/S1747938X23000830#tbl5

#statstab #521 Non-adjustment for multiple testing in multi-arm trials of distinct treatments: Rationale and justification

Thoughts: Most researchers struggle with multiplicity corrections.

#error #bonferroni #FWER #FDR #falsepositive #nhst #alpha

https://pmc.ncbi.nlm.nih.gov/articles/PMC7534018/


#statstab #518 Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing

Thoughts: An appropriate paper for today.

#pvalues #teaching #history #pedagogy #nhst #fisher

https://www.tandfonline.com/doi/pdf/10.1080/00031305.2018.1537891

Beyond the p Value: Reform Spreads Across the World and Across Disciplines

...as evidenced by this article from Brazil, which I'm delighted to see:


I salute Karen Grimmer, JECP co-editor, for publishing it,

https://thenewstatistics.com/itns/2026/03/04/beyond-the-p-value-reform-spreads-across-the-world-and-across-disciplines/

#NHST #OpenScience #Replication #StatisticalReform

#statstab #489 On the performance of the Neyman Allocation with small pilots

Thoughts: If you know your treatment condition will have larger variance you can optimise your sample size.
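
A minimal sketch of the textbook Neyman allocation rule, assuming the two standard deviations are known: the share of the sample assigned to treatment is sd_treat / (sd_treat + sd_control). The function names and the numbers are made up for illustration. In practice the SDs are estimated, often from a small pilot, which is the situation the linked paper's title refers to.

```python
def neyman_allocation(n_total, sd_treat, sd_control):
    """Split n_total between arms in proportion to each arm's standard deviation."""
    share_treat = sd_treat / (sd_treat + sd_control)
    n_treat = round(n_total * share_treat)
    return n_treat, n_total - n_treat


def diff_in_means_variance(n_treat, n_control, sd_treat, sd_control):
    """Sampling variance of the difference in means for a given allocation."""
    return sd_treat ** 2 / n_treat + sd_control ** 2 / n_control


# Illustrative values only: the treatment arm is twice as variable as control.
n_total, sd_t, sd_c = 300, 2.0, 1.0
n_t, n_c = neyman_allocation(n_total, sd_t, sd_c)

var_neyman = diff_in_means_variance(n_t, n_c, sd_t, sd_c)
var_balanced = diff_in_means_variance(n_total // 2, n_total // 2, sd_t, sd_c)

print(f"Neyman split: {n_t} treated, {n_c} control")
print(f"variance (Neyman)   = {var_neyman:.4f}")
print(f"variance (balanced) = {var_balanced:.4f}")
```

With the same total sample, the unequal split gives a smaller variance than the balanced one, but only if the SDs fed into it are close to the truth.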

#nhst #samplesize #neyman #heterogeneity #welch #variance #pilot #se

https://www.sciencedirect.com/science/article/pii/S0304407624001398

#statstab #487 More than meets the ITT: A guide for anticipating and investigating nonsignificant results in survey experiments

Thoughts: I see a lot of papers that make at least one of the 7 errors for "no effect".

#survey #nhst #nulleffects #nonsignificant #pvalue #power

https://doi.org/10.1017/XPS.2024.1

More than meets the ITT: A guide for anticipating and investigating nonsignificant results in survey experiments | Journal of Experimental Political Science | Cambridge Core

More than meets the ITT: A guide for anticipating and investigating nonsignificant results in survey experiments - Volume 12 Issue 1


#statstab #478 Equivalence Tests {marginaleffects}

Thoughts: Often you want to test "no difference" in more complex models than many packages or software permit.
With a few lines of code you can do that for most models.
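
For intuition, here is a sketch of the two one-sided tests (TOST) logic behind an equivalence test, written in Python with a normal approximation. This is not the {marginaleffects} interface; the function `tost_z`, the estimate, the standard error, and the bounds are all hypothetical. It assumes you already have an estimate and its standard error from some model.

```python
from statistics import NormalDist


def tost_z(estimate, std_error, lower, upper):
    """Return the TOST p-value for H0: effect lies outside (lower, upper).

    Uses a normal approximation, so it suits large samples; a t-based
    version would be needed for small ones.
    """
    z = NormalDist()
    # One-sided test against H0: effect <= lower bound.
    p_lower = 1 - z.cdf((estimate - lower) / std_error)
    # One-sided test against H0: effect >= upper bound.
    p_upper = z.cdf((estimate - upper) / std_error)
    # Equivalence is claimed only if both one-sided tests reject.
    return max(p_lower, p_upper)


# Illustrative values: a small estimated effect with bounds of +/- 0.1.
p_precise = tost_z(estimate=0.02, std_error=0.03, lower=-0.1, upper=0.1)
p_noisy = tost_z(estimate=0.02, std_error=0.10, lower=-0.1, upper=0.1)

print(f"precise estimate: TOST p = {p_precise:.4f}")
print(f"noisy estimate:   TOST p = {p_noisy:.4f}")
```

The same point estimate supports "no meaningful difference" only when it is precise enough for both bounds to be rejected.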

#Equivalence #noeffect #rstats #TOST #EQ #NHST #hypothesistesting

https://marginaleffects.com/chapters/predictions.html#sec-predictions_visualization

5  Predictions – Model to Meaning

#statstab #476 Experimental : causal

Thoughts: Randomized experiments are the gold standard for inference for a reason. But they are hard to design.

#design #r #statistics #methods #experiment #tutorial #pedagogy #education #hypothesis #nhst #causal #ancova

https://book.declaredesign.org/library/experimental-causal.html

18  Experimental : causal – Research Design in the Social Sciences

#statstab #467 Replication, statistical consistency, and publication bias

Thoughts: Is the replication of a finding the "gold standard" for scientific discovery? Maybe not.

#replication #metascience #metapsychology #statistics #bias #QRPs #nhst

https://doi.org/10.1016/j.jmp.2013.02.003