R.A. Fisher wrote that the purpose of statisticians was "constructing a hypothetical infinite population of which the actual data are regarded as constituting a random sample." (p. 311 here). In The Zeroth Problem Colin Mallows wrote "As Fisher pointed out, statisticians earn their living by using two basic tricks-they regard data as being realizations of random variables, and they assume that they know an appropriate specification for these random variables."

Some of the pathological beliefs we attribute to techbros were already present in this view of statistics that started forming over a century ago. Our writing is just data; the real, important object is the “hypothetical infinite population” reflected in a large language model, which at base is a random variable. Stable Diffusion, the image generator, is called that because it is based on latent diffusion models, which are a way of representing complicated distribution functions--the hypothetical infinite populations--of things like digital images. Your art is just data; it’s the latent diffusion model that’s the real deal. The entities that are able to identify the distribution functions (in this case tech companies) are the ones who should be rewarded, not the data generators (you and me).

So much of the dysfunction in today’s machine learning and AI points to how problematic it is to give statistical methods a privileged place that they don’t merit. We really ought to be calling out Fisher for his trickery and seeing it as such.

#AI #GenAI #GenerativeAI #LLM #StableDiffusion #statistics #StatisticalMethods #DiffusionModels #MachineLearning #ML
A #study from the #Oxford Internet Institute analysed 445 #AIbenchmarks, finding that many #oversell #AIperformance and lack scientific rigour. The study highlights issues like #uncleardefinitions, #datareuse, and inadequate #statisticalmethods, calling for more rigorous and transparent benchmark criteria. https://www.nbcnews.com/tech/tech-news/ai-chatgpt-test-smart-capabilities-may-exaggerated-flawed-study-rcna241969?eicker.news #tech #media #news
AI’s capabilities may be exaggerated by flawed tests, study says

Researchers behind a new study say that the methods used to evaluate AI systems’ capabilities routinely oversell AI performance and lack scientific rigor.

NBC News

Weekly Update from the Open Journal of Astrophysics – 13/09/2025

It’s Saturday again, so it’s time for another summary of the week’s new papers at the Open Journal of Astrophysics. Since the last update we have published seven new papers, which brings the number in Volume 8 (2025) up to 134, and the total so far published by OJAp up to 369. We seem to be emerging from the slight late-summer hiatus we have experienced over the last few weeks.

Anyway, the first paper to report this week is “Observing the Sun with the Atacama Large Aperture Submillimeter Telescope (AtLAST): Forecasting Full-disk Observations” by Mats Kirkaune & Sven Wedemeyer (U. Oslo, Norway), Joshiwa van Marrewijk (Leiden U., Netherlands), Tony Mroczkowski (ESO, Garching, Germany) and Thomas W. Morris (Yale, USA). This paper discusses possible strategies and parameters for full-disk observations of the Sun using the proposed Atacama Large Aperture Submillimeter Telescope (AtLAST). It was published on Tuesday 9th September 2025 in the folder Solar and Stellar Astrophysics.

The overlay is here:

 

The officially accepted version of this paper can be found on the arXiv here.

The second paper this week, published on Wednesday 10th September in the folder Cosmology and NonGalactic Astrophysics, is “The exact non-Gaussian weak lensing likelihood: A framework to calculate analytic likelihoods for correlation functions on masked Gaussian random fields” by Veronika Oehl and Tilman Tröster (ETH Zurich, Switzerland).  This paper shows how to calculate likelihoods for the correlation functions of spin-2 Gaussian random fields defined on the sphere in the presence of a mask with applications to weak gravitational lensing.

The overlay is here:

and you can find the final accepted version on arXiv here.

Next one up, the third paper this week, is  “Subspace Approximation to the Focused Transport Equation. II. The Modified Form” by B. Klippenstein and Andreas Shalchi (U. Manitoba, Canada). This was also published on 10th September 2025 in the folder Solar and Stellar Astrophysics. It is about solving the focused transport equation analytically and numerically using the subspace method in two or more dimensions.

You can find the final accepted version on arXiv here.

The fourth paper of this week was also published on Wednesday 10th September. It is “Mass models of galaxy clusters from a non-parametric weak-lensing reconstruction” by Tobias Mistele (Case Western Reserve U., USA), Federico Lelli (INAF, Firenze, Italy), Stacy McGaugh (Case Western), James Schombert (U. Oregon, USA) and Benoit Famaey (Université de Strasbourg, France). Published in the folder Cosmology and NonGalactic Astrophysics, it presents a new, non-parametric deprojection method for weak gravitational lensing applied to a sample of galaxy clusters. The overlay is here:

You can find the officially accepted version on arXiv here.

The fifth paper of the week is “A Swift Fix II: Physical Parameters of Type I Superluminous Supernovae” by Jason T. Hinkle & Benjamin J. Shappee (U. Hawaii, USA) and Michael A. Tucker (Ohio State, USA). This one was published on Thursday 11th September 2025 in the folder High-Energy Astrophysical Phenomena. The paper uses recalibrated Swift photometry to recompute peak luminosities and other properties of a sample of superluminous Type I supernovae. The overlay is here:

You can find the official accepted version on arXiv here.

Paper No. 6 for this week is “Detailed Microwave Continuum Spectra from Bright Protoplanetary Disks in Taurus” by Caleb Painter (Harvard, USA) and 11 others, too numerous to mention by name, based in the USA, Germany, Mexico and Taiwan.  This one was published in the folder marked Solar and Stellar Astrophysics on September 11th 2025. It presents new observations sampling the microwave (4-360 GHz) continuum spectra from eight young stellar systems in the Taurus region. The overlay is here:

 

The final version can be found on arXiv here.

The last paper for this update is “On Soft Clustering For Correlation Estimators” by Edward Berman (Northeastern University, USA) and 13 others based in the USA, France, Denmark and Finland, and Cosmos-Web: The JWST Cosmic Origins Survey. This was published on Friday 12th September 2025 in the folder Instrumentation and Methods for Astrophysics. It presents an algorithm for estimating correlations that clusters objects in a probabilistic fashion, enabling the uncertainty caused by clustering to be quantified simply through model inference. The overlay is here:

You can find the final version on arXiv here.

And that’s all the papers for this week. I’ve noticed a significant recent increase in the number of papers in Solar and Stellar Astrophysics, which means we’re broadening our impact across the community. Which is nice.

P.S. I found out last week that, according to NASA/ADS, papers in OJAp have now accumulated over 5000 citations.

#arXiv230903270v3 #arXiv240708718v2 #arXiv250406174v3 #arXiv250513145v2 #arXiv250613716v2 #arXiv250711801v2 #arXiv250721268v2 #AtacamaLargeApertureSubmillimeterTelescope #AtLAST #CorrelationFunctions #CosmologyAndNonGalacticAstrophysics #DiamondOpenAccess #FocusedTransportEquation #galaxyClusters #InstrumentationAndMethodsForAstrophysics #MicrowaveSpectroscopy #OpenJournalOfAstrophysics #ProtoplanetaryDisk #protoplanetaryDisks #SolarAndStellarAstrophysics #solarObservations #Spin2Fields #StatisticalMethods #strongGravitationalLensing #SuperluminousSupernovae #SWIFT #TheOpenJournalOfAstrophysics #weakGravitationalLensing

I'm in the final stages of preparing for the upcoming online course on Statistical Methods in R, which starts next Monday, September 9.

Check out the upcoming course: https://statisticsglobe.com/online-course-statistical-methods-r

#rstats #statistics #statisticalmethods #datascience #dataviz

Online Course: Statistical Methods in R

The Ultimate Course to Quickly Master Statistical Methods in R - Instructor: Joachim Schork - Statistics Globe

Statistics Globe
#ChildhoodTrauma can lead to lifelong psychological challenges. Using #StatisticalMethods, Giusi Moffa aims to identify which symptoms lead to serious disorders during a person’s lifetime so that psychologists can offer help: https://dmi.unibas.ch/en/news-events/detail/statistik-fuer-die-seele/
Statistics for the soul.

People who experience bullying or sexual abuse during childhood often suffer from psychological problems later in life. Using statistical methods, Giusi Moffa aims to identify which symptoms lead to serious disorders during a person’s lifetime so that psychologists can offer help.

My son and his lab partner are conducting a survey on the "Relationship Between Violent Video Games and Aggression" for his Psychology 319: Research Designs and Intermediate Statistical Methods in Psychology course at University. If you could help him and his assignment partner out, that would be fantastic!

Boosts welcomed and appreciated!

#Psychology #VideoGames #Games #Gaming #Agression #Survey #Study #Research Designs #StatisticalMethods

https://docs.google.com/forms/d/e/1FAIpQLSc6wLI8k2LV8IcGj1fCe6fu9ZNoTbDfSgdJuHVnR2pBqDfcaw/viewform?pli=1

The Relationship Between Violent Video Games and Aggression

This form is for Jaden and Jordan's PSY 319 assignment. We're looking to study the relationship between violent video games and aggression. Please answer the questions truthfully (this form is anonymous). If you have any questions, please contact Bryan Rooney at [email protected]

Google Docs
New in European Science Editing: Nearly 1/2 of health sciences journals in South Africa don't mention statistics in their instructions for authors or make cursory references. Gina Joubert concludes that editors & publishers must give more detail on reporting requirements for statistical methods in quantitative research articles.
https://doi.org/10.3897/ese.2024.e114734
#EuropeanScienceEditing #EASEpublications #HealthStatistics #JournalPublication #ReportingGuidelines #StatisticalMethods #SouthAfrica #Statistics
Reporting and presentation of statistical analyses: instructions for authors of health sciences journals based in South Africa

Background: Statistical analyses are a key component of quantitative research in health sciences. Objectives: To review the instructions for authors on reporting and presentation of statistical methods by all health sciences journals based in South Africa. Methods: Health sciences journals based in South Africa that publish original quantitative research articles were identified using three sources, namely the list of accredited South African journals compiled by the South African Department of Higher Education and Training in 2022, relevant journals covered in Scopus, and web pages of major health sciences publishers in South Africa. The list was cross-checked against the listing of journals in Sabinet, an online database covering South Africa, under the category ‘Collection: Medicine and Health’. The instructions for authors given by the journals were accessed through their websites. The form for recording data was based on items listed in the ‘Statistical Analyses and Methods in the Published Literature’ (SAMPL) guidelines. Results: All except one of the 52 journals could be located online. Of the 51, 13 (25%) made no mention of statistics in their instructions, and 11 (22%) made only a general statement regarding statistical content with no further guidance. The statistical item most frequently mentioned was the P value (45% of journals), whereas the rest of the items appeared in the instructions of 20% or fewer journals. Nine journals (18%) referred to the EQUATOR guidelines, mainly CONSORT (10%). Conclusion: Nearly half of the health sciences journals based in South Africa either did not mention statistics at all in their instructions for authors or made only a cursory reference to statistics. The study thus emphasizes that these journals, in their instructions for authors, need to cover in greater detail the reporting and presentation of statistical methods in articles reporting quantitative research.

European Science Editing

More on "UNCLASSIFIED": there are 36,520 of those sites right now. (Despite knowing better I keep diving in and classifying more of them.)

It's not practical to list all of them. But we can randomly sample. And large-sample statistics start to apply at about n=30, so let's just grab 30 of those sites at random using sort -R | head -30:

1 sfg.io
1 extroverteddeveloper.com
2 letmego.com
1 thestrad.com
2 bombmagazine.org
1 domlaut.com
1 bootstrap.io
1 jumpdriveair.com
2 desmos.com
1 leo32345.com
1 echopen.org
1 schd.ws
1 web3us.com
7 akkartik.name
1 bcardarella.com
1 cancerletter.com
1 platinumgames.com
1 industrytap.com
2 worldoftea.org
1 motion.ai
1 vectorly.io
2 enterprise.google.com
1 lift-heavy.com
1 davidpeter.me
1 panoye.com
3 thestrategybridge.org
2 fontsquirrel.com
1 kettunen.io
1 moogfoundation.org
2 elekslabs.com

That's a few foundations, a few blogs, a corporate site (enterprise.google.com), and something about tea, all with a small number of posts (1--7).

I'm looking at some slightly larger samples (60--100) here on my own system, and can actually make some comparisons across samples (to see how much variance there is) which can give some more information on tuning what I would expect to find under the "UNCLASSIFIED" sites.

Which is one way of using #StatisticalMethods to make estimates where direct measurement or assessment is impractical.
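
For anyone who wants to try the same idea, here is a minimal Python sketch of estimating a category share from small random samples and comparing the estimates across samples. The category labels and their counts are made up for illustration (only the total of 36,520 comes from the post), and the function names are hypothetical; this is not the tooling used for the actual analysis.

```python
import random
from collections import Counter


def sample_proportions(labels, sample_size, rng):
    """Draw a simple random sample (without replacement) and return
    the share of each label within that sample."""
    sample = rng.sample(labels, sample_size)
    counts = Counter(sample)
    return {label: count / sample_size for label, count in counts.items()}


def proportion_standard_error(p, n):
    """Approximate standard error of a sample proportion p from n draws."""
    return (p * (1 - p) / n) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 36,520 sites with invented category labels.
population = ["blog"] * 20000 + ["corporate"] * 10000 + ["foundation"] * 6520

# Five independent samples of 30; the spread between the estimates
# gives a rough feel for how much a single sample can be trusted.
estimates = [
    sample_proportions(population, 30, rng).get("blog", 0.0) for _ in range(5)
]
spread = max(estimates) - min(estimates)

print("estimated blog share per sample:", estimates)
print("spread across samples:", round(spread, 3))
print("rough standard error at p=0.5, n=30:",
      round(proportion_standard_error(0.5, 30), 3))
```

Larger samples (60 to 100, as mentioned above) shrink the standard error roughly with the square root of the sample size, which is why comparing several samples is a cheap sanity check.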

#HackerNewsAnalytics #HackerNews #MediaAnalysis #RandomSampling #Statistics

@jiejie

Let's not contribute to #Ai hysteria (pro or against).

Technology based on #statisticalMethods of #textAnalysis, that generates text (it actually makes predictions based on #statisticalProbability of what should come next in a sentence), has no conscience or feelings.
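
To make "predicting what should come next" concrete, here is a toy Python sketch that counts which word follows which in a tiny made-up corpus and turns the counts into probabilities. This is only an illustration of the statistical idea: real language models use neural networks over tokens rather than word-pair counts, and the corpus and function names here are assumptions.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn follower counts for `word` into relative frequencies.
    Returns an empty dict if the word was never seen with a follower."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


# Made-up corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_counts(corpus)
probs = next_word_probabilities(model, "the")

# "the" is followed by "cat" twice and "mat" once in the corpus.
print(probs)
```

Nothing in this procedure involves understanding or feeling; it is counting and dividing, which is the point being made above.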

What will become of it in the future, I can't say for certain.

Not everything that sounds unfamiliar has to be complicated.

Sometimes we discover (or rediscover) techniques that can help us make decisions.

Markov modelling is one of them. Elegant and easy to implement. It has it all.

This time we focus on redundant equipment or systems in general... but we will also get specific and give examples.
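
As a rough illustration of the kind of calculation involved, here is a minimal Python sketch of the steady-state availability of two identical repairable units in parallel, modelled as a three-state Markov chain (0, 1 or 2 units failed). The failure and repair rates, the repair-crew assumption and all function names are hypothetical choices for this sketch and are not taken from the linked article.

```python
def birth_death_steady_state(up_rates, down_rates):
    """Steady-state probabilities of a birth-death Markov chain.

    up_rates[i]   : transition rate from state i to state i + 1
    down_rates[i] : transition rate from state i + 1 back to state i
    """
    weights = [1.0]
    for up, down in zip(up_rates, down_rates):
        # Detailed balance: pi[i + 1] * down = pi[i] * up
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]


def parallel_pair_availability(failure_rate, repair_rate, repair_crews=1):
    """Availability of a 1-out-of-2 redundant system.

    State k is the number of failed units; the system is down only
    when both units have failed (state 2). repair_crews must be >= 1.
    """
    up_rates = [2 * failure_rate, failure_rate]
    down_rates = [
        min(1, repair_crews) * repair_rate,
        min(2, repair_crews) * repair_rate,
    ]
    pi = birth_death_steady_state(up_rates, down_rates)
    return 1.0 - pi[2]


# Assumed example rates (per hour): one failure per 1000 h, repair in 10 h.
failure_rate, repair_rate = 0.001, 0.1
single_unit = repair_rate / (failure_rate + repair_rate)
one_crew = parallel_pair_availability(failure_rate, repair_rate, repair_crews=1)
two_crews = parallel_pair_availability(failure_rate, repair_rate, repair_crews=2)

print("single unit:", single_unit)
print("redundant pair, one repair crew:", one_crew)
print("redundant pair, two repair crews:", two_crews)
```

With two repair crews the units behave independently, so the result matches the familiar formula 1 - (1 - a)^2, where a is the availability of a single unit; with one crew, repairs queue and the availability comes out slightly lower.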

https://www.pacienciadigital.com/modelo-de-markov/

#Estadística #Matemáticas #Probabilidad #AnálisisDeDatos #ModelosEstadísticos #InferenciaEstadística #EstadísticaAplicada #Cálculo #EstadísticaDescriptiva #MétodosEstadísticos #Statistics #Mathematics #Probability #DataAnalysis #StatisticalModels #StatisticalInference #AppliedStatistics #Calculus #DescriptiveStatistics #StatisticalMethods

Calculating availability with a Markov model. Easy

Discover how a Markov model can help you improve the availability of redundant systems. Learn to predict failures and optimize performance in this fascinating technical post.

Paciencia Digital. Domótica, Estadística y Datos