Excellent article on the dangers of dichotomisation of continuous variables

“Cake causes herpes?” - promiscuous dichotomisation induces false positives
https://link.springer.com/article/10.1186/s12874-025-02712-0

#dataanalys #statistics #stats

“Cake causes herpes?” - promiscuous dichotomisation induces false positives - BMC Medical Research Methodology

Background: Continuous biomedical data is often dichotomised into two or more groups for analysis, despite long-standing warnings from statisticians that this constitutes bad practice. Dichotomisation is typically discouraged because it reduces statistical power and may obscure important trends. This paper considers another reason to discourage the practice: dichotomising at an arbitrary yet flexible threshold (which we term "promiscuous dichotomisation") is a powerful researcher degree of freedom, and therefore a powerful tool for manipulating data.
Methods: The motivating question is: how probable is it that, given a set of uniformly distributed data, a threshold can be engineered to produce the illusion of a true effect when none exists? To estimate this, we employed both analytical and Monte-Carlo simulation approaches to quantify the expected number of spurious findings that could arise from manipulating a dichotomous threshold for an arbitrary data set. We also illustrate this with NHANES data, showing how a spurious relationship between blood glucose and herpes status could be engineered.
Results: For even a relatively small sample of $n=100$, a false positive rate of $\approx 38\%$ can be observed, rising to over $66\%$ if low-count scenarios are not excluded. With larger samples, even with low-count exclusion, false positive rates in excess of $66\%$ for $n=1000$ and $83\%$ for $n=10{,}000$ are possible, climbing to in excess of $81\%$ and $89\%$ respectively if low-count scenarios are not excluded. For most configurations, manipulation of thresholds was a highly viable method of crafting a false positive result.
Conclusions: It is likely that manipulating cut-off points in measured variables represents a significant source of data manipulation in published science, and the ease of access to larger health databases means this issue is likely to grow in severity. We discuss the implications of this, and means of identifying potential promiscuous dichotomisation.
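To get a feel for the Monte-Carlo idea in the abstract, here is a minimal Python sketch. It is not the authors' code and is not expected to reproduce their exact figures. Several things are assumptions made for illustration: the outcome is an independent fair coin flip, significance is judged with a Pearson chi-squared test on the 2x2 table (no continuity correction, 1 degree of freedom, alpha = 0.05), and "low-count exclusion" is read as skipping any threshold that leaves a cell with fewer than `min_count` observations. All function and parameter names are hypothetical.

```python
import random

# Critical value of the chi-squared distribution, 1 df, alpha = 0.05.
CHI2_CRIT_05 = 3.841458820694124


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: no association can be tested
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def any_significant_cut(x, y, min_count=5):
    """Return True if some threshold on x gives a 'significant' split against y.

    min_count is an assumed low-count rule: thresholds that leave any cell
    of the 2x2 table below min_count are skipped.
    """
    outcomes = [yi for _, yi in sorted(zip(x, y))]  # outcomes ordered by x
    n = len(outcomes)
    total_pos = sum(outcomes)
    pos_below = 0
    for k in range(1, n):  # cut between the k-th and (k+1)-th sorted values
        pos_below += outcomes[k - 1]
        a, b = pos_below, k - pos_below          # below the cut: positives, negatives
        c = total_pos - pos_below                # above the cut: positives
        d = (n - k) - c                          # above the cut: negatives
        if min(a, b, c, d) < min_count:
            continue
        if chi2_2x2(a, b, c, d) > CHI2_CRIT_05:
            return True
    return False


def false_positive_rate(n=100, trials=2000, min_count=5, seed=0):
    """Share of null data sets for which at least one threshold looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]      # uniform continuous variable
        y = [rng.randint(0, 1) for _ in range(n)] # outcome independent of x
        hits += any_significant_cut(x, y, min_count)
    return hits / trials


if __name__ == "__main__":
    print("with low-count exclusion:   ", false_positive_rate(n=100, min_count=5))
    print("without low-count exclusion:", false_positive_rate(n=100, min_count=0))
```

A single pre-registered threshold would be flagged about 5% of the time under these assumptions. Scanning every possible threshold and keeping the best one is what inflates the rate.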

SpringerLink
How to Build a Website with Bulma

In this tutorial you will learn how to build a website with Bulma, step by step and following good practices.

Programming and Development Blog - Nube Colectiva
Implementing Media Queries with CSS 3

The version before CSS 3, that is, CSS 2, added the @media rule, which makes it possible to specify different style rules for ...

Programming and Development Blog - Nube Colectiva

How to Become a Data Analyst, With or Without a Degree? Career Guide, Skills, Salary, and Job Profile.

See here - https://techchilli.com/how-to/how-to-become-a-data-analyst/

#dataanalys #dataanalystjobs

Check here for everything about data analysts.

Tech Chilli
The joy of getting hold of a really good dataset.
#datajournalistik #ddj #dataanalys
Announcing Python in Excel: Next-Level Data Analysis for All | Anaconda

Now you can write Python code directly in Microsoft Excel’s grid—no Python installation required.

Anaconda