Concentration of measures:
Talagrand's "work illustrates the idea that the interplay of many random events can, counter-intuitively, lead to outcomes that are more predictable, and gives estimates for the extent to which the uncertainty is reined in."

Marianne Freiberger: https://plus.maths.org/content/abel-prize-2024 @data @mathematics

#maths #mathematics #Talagrand #data #probability #magnets #spinGlasses #physics

The Abel Prize 2024: Michel Talagrand

The Abel Prize 2024 has been awarded to Michel Talagrand for groundbreaking contributions to probability theory and functional analysis.

Plus Maths

"Majorizing measures provide bounds for the supremum of stochastic processes. They represent the most general possible form of the chaining argument".

Michel Talagrand, 1996, https://projecteuclid.org/journals/annals-of-probability/volume-24/issue-3/Majorizing-measures-the-generic-chaining/10.1214/aop/1065725175.full

#geometry #theorem #probability #maths #mathematics #Talagrand #data #bigData #chaining #ML #AbelPrize #Abel

Majorizing measures: the generic chaining

Majorizing measures provide bounds for the supremum of stochastic processes. They represent the most general possible form of the chaining argument going back to Kolmogorov. Majorizing measures arose from the theory of Gaussian processes, but they now have applications far beyond this setting. The fundamental question is the construction of these measures. This paper focuses on the tools that have been developed for this purpose and, in particular, the use of geometric ideas. Applications are given to several natural problems where entropy methods are powerless.

Project Euclid
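For context, one standard statement of Talagrand's generic chaining bound for a centered Gaussian process (my own LaTeX paraphrase, not quoted from the paper):

```latex
\[
\mathbb{E}\sup_{t\in T} X_t \;\le\; L\,\gamma_2(T,d),
\qquad
\gamma_2(T,d) \;=\; \inf_{(T_n)} \sup_{t\in T} \sum_{n\ge 0} 2^{n/2}\, d(t,\,T_n),
\]
```

where $d$ is the canonical distance $d(s,t) = (\mathbb{E}(X_s - X_t)^2)^{1/2}$, $L$ is a universal constant, and the infimum runs over admissible sequences of subsets $T_n \subset T$ with $|T_0| = 1$ and $|T_n| \le 2^{2^n}$.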

In 2016, the American Statistical Association #ASA made a formal statement that "a p-value, or statistical significance, does not measure the size of an effect or the importance of a result".

It also stated that "p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone".

#nullHypothesis #probabilities #probability #maths #mathematics #vectors #data #bigData #matrices #ML #distributions #stats #statistics
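The ASA's first point is easy to see numerically: with a large enough sample, a negligible effect yields a tiny p-value. A minimal sketch (the numbers and the one-sample z-test are my own illustration, not from the ASA statement):

```python
import math
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
# true effect is tiny: the mean sits 0.01 standard deviations away from zero
sample = rng.normal(loc=0.01, scale=1.0, size=n)

effect_size = sample.mean()  # Cohen's d of roughly 0.01: practically negligible
z = effect_size / (sample.std(ddof=1) / math.sqrt(n))
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided z-test p-value
# p is astronomically small here, even though the effect is trivial:
# statistical significance is not practical importance
```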

@maugendre P-values are abused far and wide. This has reminded me that I should add "ranting about p-values" to the list of things I rant about to high school maths and physics textbook publishers, teachers, curriculum writers and exam setters.

@level98

😀
There is even a Wikipedia article on the "Misuse of p-values": https://en.wikipedia.org/wiki/Misuse_of_p-values

I am therefore adding to my guidelines: "Instead of telling researchers what they want to know, statisticians should teach researchers which questions they can ask. […]
Before we can improve our statistical inferences, we need to improve our statistical questions."

Excerpt from Daniël Lakens (2021) https://journals.sagepub.com/doi/10.1177/1745691620958012

#quotes #nullHypothesis #probability #math #pValues #maths #AIEthics #ML #statistics

Misuse of p-values - Wikipedia

"In #probability theory, a log-normal (or #lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution."

"It is a convenient and useful model for measurements in exact and engineering sciences, as well as medicine, economics […], energies, concentrations, lengths, prices".

https://en.wikipedia.org/wiki/Log-normal_distribution

#statistics #finance #modeling

Log-normal distribution - Wikipedia
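The defining property quoted above can be checked numerically. A minimal sketch using NumPy's log-normal sampler (the `mean`/`sigma` parameter names are NumPy's, describing the underlying normal):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
# X is log-normal with parameters (mu, sigma) of the underlying normal
x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

# then Y = ln(X) should be normally distributed with mean mu and std sigma
log_x = np.log(x)
# log_x.mean() is close to 1.0 and log_x.std() close to 0.5
```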

Surveys, coincidences, statistical significance 🧵

"What Educated Citizens Should Know About Statistics and Probability"
By Jessica Utts, in 2003: https://ics.uci.edu/~jutts/AmerStat2003.pdf via @hrefna

@edutooters

#nullHypothesis #probability #probabilities #pValues #statistics #stats #education #higherEd #statisticalLiteracy #bias #media #causalInference


"In real life, we weigh the anticipated consequences of the decisions that we are about to make. That approach is much more rational than limiting the percentage of making the error of one kind in an artificial (null hypothesis) setting or using a measure of evidence for each model as the weight."
Longford (2005) http://www.stat.columbia.edu/~gelman/stuff_for_blog/longford.pdf

#modeling #nullHypothesis #probability #probabilities #pValues #statistics #stats #statisticalLiteracy #bias #inference #modelling #regression #linearRegression

Frontiers | Correlation Constraints for Regression Models: Controlling Bias in Brain Age Prediction

In neuroimaging, the difference between chronological age and predicted brain age, also known as brain age delta, has been proposed as a pathology marker lin...

Frontiers

@data @datadon 🧵

How to assess a statistical model?
How to choose between variables?

Pearson's #correlation is inappropriate if you suspect that the relationship is not linear.

If monotonic relationship:
"#Spearman’s rho is particularly useful for small samples where weak correlations are expected, as it can detect subtle monotonic trends." It is "widespread across disciplines where the measurement precision is not guaranteed".
"#Kendall’s Tau-b is less affected [than Spearman’s rho] by outliers in the data, making it a robust option for datasets with extreme values."
Ref: https://statisticseasily.com/kendall-tau-b-vs-spearman/

#normality #normalDistribution #modeling #dataDev #AIDev #ML #modelEvaluation #regression #modelling #dataLearning #featureEngineering #linearRegression #probability #probabilities #statistics #stats #correctionRatio #Pearson #bias #regressionRedress #distributions

Kendall Tau-b vs Spearman: Which Correlation Coefficient Wins?

Discover why Kendall Tau-b vs Spearman Correlation is crucial for your data analysis and which coefficient offers the most reliable results.

LEARN STATISTICS EASILY
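The Pearson-vs-Spearman point above can be demonstrated in a few lines. This sketch (my own toy data; Spearman computed as Pearson on ranks, ties ignored; Kendall's tau-b omitted for brevity) shows a perfectly monotonic but non-linear relationship:

```python
import numpy as np

def pearson(a, b):
    # Pearson's r: covariance normalized by the standard deviations
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def rank(a):
    # simple ranks (0..n-1); assumes no ties in the data
    return np.argsort(np.argsort(a)).astype(float)

# monotonic but strongly non-linear: y grows exponentially with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(x)

r_pearson = pearson(x, y)            # below 1: misses the perfect monotonicity
r_spearman = pearson(rank(x), rank(y))  # exactly 1: ranks line up perfectly
```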

@data @datadon 🧵

Accuracy! To counter regression dilution, one method is to add a constraint to the statistical model.
Regression Redress restrains bias by segregating the residual values.
My article: http://data.yt/kit/regression-redress.html

#bias #modeling #dataDev #AIDev #modelEvaluation #regression #modelling #dataLearning #linearRegression #probability #probabilities #statistics #stats #correctionRatio #ML #distributions #accuracy #RegressionRedress #Python #RStats
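For anyone unfamiliar with regression dilution itself: measurement noise on the predictor attenuates the fitted slope toward zero. A minimal illustration of the effect (my own synthetic data; this demonstrates the problem, not the Regression Redress method):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x_true = rng.normal(0.0, 1.0, n)
y = 2.0 * x_true                          # true slope is exactly 2
x_noisy = x_true + rng.normal(0.0, 1.0, n)  # measurement error on x only

slope_true = np.polyfit(x_true, y, 1)[0]    # recovers 2
slope_noisy = np.polyfit(x_noisy, y, 1)[0]  # attenuated toward 0
# expected attenuation: var(x) / (var(x) + var(noise)) = 0.5, so slope ~ 1
```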

Logistic regression may be used for classification.

To keep the loss convex, a log-loss cost function was designed for logistic regression (squared error composed with a sigmoid is not convex). The log-loss tends to zero when the prediction matches the label (True or False) and grows without bound when the prediction is confidently wrong.

The gradient of the logistic-regression log-loss turns out to have the same form as the gradient of the least-squares error in linear regression.

More: https://www.baeldung.com/cs/gradient-descent-logistic-regression

#optimization #algebra #linearAlgebra #math #maths #mathematics #mathStodon #ML #dataScience #machineLearning #DeepLearning #neuralNetworks #NLP #modeling #modelling #models #dataDev #AIDev #regression #dataLearning #probabilities #logisticRegression #logLoss #sigmoid #classification #differentialCalculus #loss
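The "same form" claim can be verified directly: the log-loss gradient is X^T (prediction − label) / n, exactly like least squares, except the prediction passes through a sigmoid. A self-contained sketch (my own toy data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(X, y, w):
    # mean cross-entropy (log-loss) over the dataset
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_loss_grad(X, y, w):
    # same form as the least-squares gradient: X^T (prediction - label) / n,
    # with the prediction passed through the sigmoid
    return X.T @ (sigmoid(X @ w) - y) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = (rng.random(50) > 0.5).astype(float)
w = rng.normal(size=3)
grad = log_loss_grad(X, y, w)
```

A finite-difference check confirms `grad` matches the numerical gradient of `log_loss`.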