Daniel Oberski

@daob

Professor of Social & Health Data Science at Utrecht University & University Medical Center Utrecht, The Netherlands.

Statistics; Latent variables; Structural Equation Models; Methodology of data science; Applications of machine learning to research in social and health domains.

Group website @UU: https://hds.sites.uu.nl
Publications: https://daob.nl/publications

Today in our Data Science reading group @utrechtuniversity we talked about this paper by Jessica Hullman et al.: https://arxiv.org/abs/2203.06498v6

It's really great & well-written; as @daob mentioned in the meeting, an incredible amount of work went into Table 1, which compares pitfalls of the scientific process in the social psychology and machine learning fields:

The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning

Recent arguments that machine learning (ML) is facing a reproducibility and replication crisis suggest that some published claims in ML research cannot be taken at face value. These concerns inspire analogies to the replication crisis affecting the social and medical sciences. They also inspire calls for greater integration of statistical approaches to causal inference and predictive modeling. A deeper understanding of what reproducibility concerns in research in supervised ML have in common with the replication crisis in experimental science can put the new concerns in perspective, and help researchers avoid "the worst of both worlds," where ML researchers begin borrowing methodologies from explanatory modeling without understanding their limitations and vice versa. We contribute a comparative analysis of concerns about inductive learning that arise in causal attribution as exemplified in psychology versus predictive modeling as exemplified in ML. We identify themes that re-occur in reform discussions, like overreliance on asymptotic theory and non-credible beliefs about real-world data generating processes. We argue that in both fields, claims from learning are implied to generalize outside the specific environment studied (e.g., the input dataset or subject sample, modeling implementation, etc.) but are often impossible to refute due to forms of underspecification. In particular, many errors being acknowledged in ML expose cracks in long-held beliefs that optimizing predictive accuracy using huge datasets absolves one from having to make assumptions about the underlying data generating process. We discuss risks and opportunities that arise as both fields attempt to resolve concerns about methods.

arXiv.org
Roses are red.
Roses are blue.
Depending on their velocity
relative to you.

Each day new students, researchers and employees within the Dutch educational and research community can join Mastodon with their existing institutional account!

This not only lowers the threshold to explore Mastodon but also supports “group” accounts in addition to personal accounts!

Join social.edu.nl ! See how to register an account or see if your institution is already connected!

https://surf.nl/mastodon-pilot

#mastodon #publicvalues #research #education #surfconext

Mastodon pilot for research and education

SURF and Universiteiten van Nederland are jointly exploring Mastodon as an open-source platform for education and research in the Netherlands.

SURF.nl

Wikipedia has named a prize after me 🙏

It is a great honour that the largest public knowledge-sharing project is doing this. That it is also a prize for the best collaboration makes it even better.

https://www.wikimedia.nl/actueel/blog/wikiuil-vernoemd-naar-casper-albers/

WikiUil named after Casper Albers - Wikimedia Nederland

Who is Casper Albers, after whom the ‘SamenwerkingsUil’ is named?

Wikimedia Nederland

Q. Why do mathematicians confuse Halloween and Christmas?

A. Because 31 Oct = 25 Dec.

Happy Christmas.
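
The joke rests on reading "31 Oct" as the octal number 31 and "25 Dec" as the decimal number 25. A minimal Python sketch to check the arithmetic (the variable names are illustrative and not part of the original post):

```python
# Read "31" in base 8 (octal) and "25" in base 10 (decimal).
oct_value = int("31", 8)    # 3 * 8 + 1
dec_value = int("25", 10)

# The two notations describe the same integer.
same_number = oct_value == dec_value
print(oct_value, dec_value, same_number)
```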

Calling all ECRs who would like to learn more social data science by doing!

The ODISSEI Social Data Science (SoDa) team offers SoDa traineeships for early career social scientists. Successful SoDa trainees will spend between 3 and 8 months working full-time on a social science research project they propose. During this time, they are members of the SoDa team at the Methodology & Statistics department of Utrecht University and are mentored by one of the senior team members.

https://odissei-data.nl/wp-content/uploads/2022/10/soda_traineeship_call-2022.docx-Google-Docs.pdf

We are here 👋

We joined this awesome space today 🚀 100 friends here feels like a 1000 elsewhere 😉

Oh, by the way: we are looking for new colleagues!! #jobs 17 (seventeen!) #phd #postdoc positions. Be our new colleague 🤩

Info, deadline etc:
https://www.algosocvacancies.org

Home | Algosoc Vacancies

Algosoc Vacancies

An interesting phenomenon in asking chatGPT to generate R, Julia, Lean, and Python code: it invents syntax and facilities that *should* exist.

For example, if I ask for a survey SEM analysis in R, it expects there to be an svysem function. This function does not exist (it's called lavaan.survey), but maybe it should!

So ChatGPT can illustrate how users of software might reasonably expect the software to work, potentially helping us design it better.

You might want to log in on the other side and vote ;-)

Check out this released talk: Integrating equation solvers with probabilistic programming through differentiable programming.

From the Computational Abstractions for Probabilistic and Differentiable Programming Workshop.

https://youtu.be/rEwBxCBl92k

Discussion of the pros and cons of the development styles of #julialang Turing.jl with #sciml and #rstats #stan, how that affects ODE solver support and documentation, etc.

Chris Rackauckas Integrating equation solvers with probabilistic programming through differentiabl

YouTube