Daniel Oberski

@daob

Professor of Social & Health Data Science at Utrecht University & University Medical Center Utrecht, The Netherlands.

Statistics; Latent variables; Structural Equation Models; Methodology of data science; Applications of machine learning to research in social and health domains.

Group website @UU: https://hds.sites.uu.nl
Publications: https://daob.nl/publications

Today in our Data Science reading group @utrechtuniversity we talked about this paper by Jessica Hullman et al.: https://arxiv.org/abs/2203.06498v6

It's really great & well-written; as @daob mentioned in the meeting, an incredible amount of work went into Table 1, which compares pitfalls of the scientific process in the social psychology and machine learning fields:

The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning

Recent arguments that machine learning (ML) is facing a reproducibility and replication crisis suggest that some published claims in ML research cannot be taken at face value. These concerns inspire analogies to the replication crisis affecting the social and medical sciences. They also inspire calls for greater integration of statistical approaches to causal inference and predictive modeling. A deeper understanding of what reproducibility concerns in research in supervised ML have in common with the replication crisis in experimental science can put the new concerns in perspective, and help researchers avoid "the worst of both worlds," where ML researchers begin borrowing methodologies from explanatory modeling without understanding their limitations and vice versa. We contribute a comparative analysis of concerns about inductive learning that arise in causal attribution as exemplified in psychology versus predictive modeling as exemplified in ML. We identify themes that re-occur in reform discussions, like overreliance on asymptotic theory and non-credible beliefs about real-world data generating processes. We argue that in both fields, claims from learning are implied to generalize outside the specific environment studied (e.g., the input dataset or subject sample, modeling implementation, etc.) but are often impossible to refute due to forms of underspecification. In particular, many errors being acknowledged in ML expose cracks in long-held beliefs that optimizing predictive accuracy using huge datasets absolves one from having to make assumptions about the underlying data generating process. We discuss risks and opportunities that arise as both fields attempt to resolve concerns about methods.

arXiv.org
Roses are red.
Roses are blue.
Depending on their velocity
relative to you.
@wviechtb I would say it clearly is, but the smallest randomization p-value is 0.5, which is not very helpful..

Each day new students, researchers and employees within the Dutch educational and research community can join Mastodon with their existing institutional account!

This not only lowers the threshold to explore Mastodon but also supports “group” accounts in addition to personal accounts!

Join social.edu.nl ! See how to register an account or see if your institution is already connected!

https://surf.nl/mastodon-pilot

#mastodon #publicvalues #research #education #surfconext

Mastodon pilot for research and education

SURF and Universiteiten van Nederland are jointly exploring Mastodon as an open-source platform for education and research in the Netherlands.

SURF.nl

Wikipedia has named a prize after me 🙏

It is a great honor that the largest public knowledge-sharing project is doing this. That it is also a prize for the best collaboration makes it even better.

https://www.wikimedia.nl/actueel/blog/wikiuil-vernoemd-naar-casper-albers/

WikiUil named after Casper Albers - Wikimedia Nederland

Who is Casper Albers, after whom the ‘SamenwerkingsUil’ is named?

Wikimedia Nederland

Q. Why do mathematicians confuse Halloween and Christmas?

A. Because 31 Oct = 25 Dec.

Happy Christmas.
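
For anyone who wants to check the arithmetic behind the joke: reading "Oct" as octal (base 8) and "Dec" as decimal (base 10), the digits 31 in base 8 stand for 3 × 8 + 1 = 25. A minimal Python sketch of that check (the variable names are only illustrative):

```python
# "Oct" read as octal (base 8), "Dec" read as decimal (base 10).
oct_31 = int("31", 8)    # 3 * 8 + 1
dec_25 = int("25", 10)

# True when both strings denote the same number.
print(oct_31 == dec_25)
```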

Calling all ECRs who would like to learn more social data science by doing!

The ODISSEI Social Data Science (SoDa) team offers SoDa traineeships for early career social scientists. Successful SoDa trainees will spend between 3 and 8 months working full-time on a social science research project they propose. During this time, they are members of the SoDa team at the Methodology & Statistics department of Utrecht University and are mentored by one of the senior team members.

https://odissei-data.nl/wp-content/uploads/2022/10/soda_traineeship_call-2022.docx-Google-Docs.pdf

@[email protected] Codex, the system behind Copilot, does have an API, but it does not know about R (yet). Then there are the familiar ethical issues with using people's code without asking.

It probably won't be part of RStudio in the immediate future, but it could happen later.

@[email protected]

You can already use Copilot for R in VS Code!

If you want it in RStudio, you might like to bump this issue: https://github.com/rstudio/rstudio/issues/10148

Github Copilot integration with RStudio · Issue #10148 · rstudio/rstudio

Hi! Are there any plans to make Github Copilot available in RStudio? RStudio is definitely a great development environment. It's just a pity that Copilot is not available. I've been using Copilot w...

GitHub