I'm at the University of Sheffield today for the Perspectives on Teaching Reproducibility symposium at the Teaching Reproducible Research and Open Science conference.

https://www.sheffield.ac.uk/smi/events/teaching-reproducible-research-and-open-science-conference

I'll add links and notes about the day in this thread #OpenResearch #reproducibility #psyTeachR

Teaching Reproducible Research and Open Science Conference

Organisers: University of Sheffield (Sheffield Methods Institute, Open Research Working Group and University Library) and Project TIER.

The University of Sheffield

First up, Jennifer Buckley on "Opportunities and challenges for teaching reproducibility in the context of UK Higher Education in the Social Sciences – insights from a consultation with teaching staff"

If you aren't familiar with the UK Data Service, it's a fantastic resource for managing social science data for research and teaching.

The survey involved 109 social science lecturers, most of whom teach quantitative methods, plus 16 follow-up interviews. Most agree that teaching reproducibility is important and that demonstrations and examples would be useful.

Most still use SPSS (seems to be more polisci than psych in the dataset)

Almost half of the lecturers surveyed prepare data to make it more usable for students. They often find there is no time to teach data preparation (one of the most important skills we emphasise in the #psyTeachR curriculum)

Next up, Jon Reades on Building Foundations: Pythonic (Geo)Data Science from the Ground Up

https://jreades.github.io/talks/reproducible/#/building-foundations-reproducible-geographic-data-science

Presentations – index

I love this list of benefits of reproducible workflows:

* Abstraction
* Employability
* Learning by seeing
* Learning by breaking
* Workload management

The Docker method of making sure all students have the same packages and resources looks fruitful. I'd be curious to see how easy it is to deploy Docker on students' diverse machines.
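For what that deployment might look like, here's a hypothetical sketch: write a Dockerfile that pins a versioned rocker image so every student builds an identical R environment. The image tag, package, and class name are my assumptions, not from the talk.

```shell
# Hypothetical sketch: pin the whole R environment in an image so every
# student gets identical packages (image tag and names are assumptions).
cat > Dockerfile <<'EOF'
FROM rocker/verse:4.3.1
RUN R -e "install.packages('remotes', repos = 'https://cloud.r-project.org')"
EOF
# Each student then builds and runs the same environment, e.g.:
#   docker build -t methods-class .
#   docker run --rm -p 8787:8787 -e PASSWORD=teach methods-class
```

The catch in practice is usually installing Docker itself on a mix of Windows, macOS, and locked-down lab machines, which this sketch doesn't solve.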

I'm also impressed with teaching git/GitHub; I think version control is so important, but teaching it can be tricky and derail the class with esoteric problems.
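One way to keep that first git lesson from derailing is to stay strictly linear: no branches, no remotes. A minimal sketch of that workflow (project and file names are illustrative):

```shell
# A first git lesson kept linear: init, edit, add, commit, inspect.
git init demo-project
# Setting identity up front avoids the classic first-commit error message.
git -C demo-project config user.name "Student"
git -C demo-project config user.email "student@example.com"
echo "analysis notes" > demo-project/README.md
git -C demo-project add README.md
git -C demo-project commit -m "Add README"
git -C demo-project log --oneline
```

Branching, merging, and GitHub remotes can then wait until the add/commit loop is second nature.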

And yay for more #QuartoPub resources!

Reades makes the excellent point that REF2028 has just been announced and it's clear that they want to promote wider thinking on research environment – now called people and culture – and the contribution of more diverse outputs, which should include open teaching materials.

https://www.ukri.org/news/early-decisions-made-for-ref-2028/

Early decisions made for REF 2028

The UK's higher education funding bodies have made initial decisions on the high-level design of the next Research Excellence Framework (REF).

Next up, Marina Bazhydai on "The good, the bad and the ugly: Teaching first year psychology undergraduates about research integrity and open science" (with Emma Mills, Richard Philpot, Mike Vernon, & @dermotlynott from Lancaster University)

The UG methods course focuses on broad questions of how to do science, in addition to the stats. They are supported by the PROSPR network https://www.lancaster.ac.uk/psychology/research/open-science/

Open Science

It's a really interesting idea to teach undergrads how to use tools like StatCheck and GRIM to detect research errors (or fraud) and then engage with the repliCATS project. Also, this demo is fab!

https://fivethirtyeight.com/features/science-isnt-broken/

Science Isn't Broken

If you follow the headlines, your confidence in science may have taken a hit lately. Peer review? More like self-review. An investigation in November uncovered a scam in which researchers were rubber-stamping their own work, circumventing peer review at five high-profile publishers.

FiveThirtyEight

The first keynote is by Norm Medeiros and Richard Ball from Project TIER – The New (Aspirational) Normal: Saturating Quantitative Methods Instruction with Reproducibility

https://www.projecttier.org/

This talk focusses on integrating computational reproducibility across all curricula as a precondition for other dimensions of research transparency.

Project TIER | Teaching Integrity in Empirical Research

TIER Documentation Protocol provides instructions for assembling files documenting steps of data processing & analysis for a research paper.

**Documentation is the key to Reproducibility**

Essential elements:
- Original data
- Code

Additional elements:
- Output of computational results
- Additional information on data sources
- A read-me file

((I'd argue a README is essential!))
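Concretely, the essential and additional elements above might look like this on disk. This is a hypothetical sketch in the spirit of the TIER protocol; the folder and file names are illustrative, not the protocol's exact specification:

```shell
# Illustrative replication-package layout (names are assumptions).
mkdir -p tier-project/data-original tier-project/scripts tier-project/output
touch tier-project/data-original/survey-raw.csv   # original data, never edited
touch tier-project/scripts/01-process.R           # data processing code
touch tier-project/scripts/02-analyse.R           # analysis code
touch tier-project/output/table1.csv              # computational results
touch tier-project/README.md                      # the (essential!) read-me
ls -R tier-project
```

Keeping the untouched original data separate from scripts and generated output is what lets a reader re-run everything from scratch.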

Very cool that the American Economic Association has a dedicated data editor and great online resources!

https://aeadataeditor.github.io/aea-de-guidance/

Step by step guidance

The following steps outline what you should expect after conditional acceptance of your manuscript, in compliance with the AEA Data and Code Availability Policy (which is compatible with the Data and Code Availability Standard v1.0):

1. Prepare: prepare your data and code replication package (including data citations and provenance information); you can do this at any time, even before submitting to the AEA journals.
2. Upload: provide metadata and upload the replication package; this step simultaneously prepares the materials for the verification process and for subsequent publication.
3. Submit: submit the Data and Code Availability Form together with your manuscript native files as instructed, per the guidelines at your journal (for example, the AER guidelines); verification checks start only once the editorial office has received these materials.

The next steps happen behind the scenes, until you receive the replication report.

Office of the AEA Data Editor

I like the "reproducibility trifecta":

1. Fixed folder structure
2. Explicit management of the working directory
3. Use of relative directory paths in scripts
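The trifecta can be sketched in shell terms (the project and file names here are illustrative, not from the talk):

```shell
# 1. A fixed folder structure, the same in every project.
mkdir -p myproject/data myproject/scripts myproject/output
echo "id,score" > myproject/data/input.csv
# 2. Explicit working directory: always run from the project root
#    (here via a subshell), and
# 3. relative paths only, so the project runs wherever it is copied.
(
  cd myproject
  cut -d, -f1 data/input.csv > output/ids.csv   # relative, not /home/me/...
)
cat myproject/output/ids.csv
```

Because nothing references an absolute path, the whole `myproject` folder can be zipped and re-run on any machine, which is exactly the portability dimension mentioned next.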

And the "key dimensions of reproducibility":

1. Soup-to-nuts reproducibility
2. (Almost) automated reproducibility
3. Portability

@debruine Nice. My version of the reproducibility trifecta is a bit harder to attain...

https://twitter.com/FrederikAust/status/1575079473401462784

@FrederikAust on Twitter

"The holy #rstats reproducibility trifecta: 1. A {targets} pipeline, 2. including a Quarto report of results, 3. wrapped in a Docker container. https://t.co/HEpy6XOxYl"

Twitter

@matti @FrederikAust @debruine I tried getting into targets again just the past week and imho the cons far outweigh any potential pros. GNU make ftw!

@matti @FrederikAust @debruine Would be interested in more detail (I have my own list of pros & cons, curious what yours are; "do you care about polyglot or non-R pipelines?" and "which system are you/people in your area already familiar with?" are the two most important Qs, IMO ...)

@bbolker @FrederikAust @debruine

Here's some quick thoughts in context of typical psychology data analysis projects:

Pros:
- promises easy scaling to clusters (though it looks complicated...)
- just R

Cons:
- will move the probability that collaborators understand my code from 25% to 0.0%
- further abstraction makes it more difficult for others to build on my code
- separates interactive development from "production" ready code
- just R

@bbolker @FrederikAust @debruine

Targets forces me to make everything a function, whereas Make allows anything as long as it can be run from CLI. Functions separate interactive analyses from "production" runs and make troubleshooting & understanding harder. Not in general, but for typical analyses where each function would only be used once I don't see the benefit.
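For contrast, here is a minimal sketch of the Make side of this comparison; the target and file names are invented for illustration. Each output is a file target with listed inputs, and any command runnable from the CLI can be a recipe:

```shell
# Write a one-rule Makefile: results depend on the script and the raw data.
# (The recipe line must start with a tab, hence the \t escape.)
printf 'output/results.csv: scripts/analyse.R data/raw.csv\n\tRscript scripts/analyse.R\n' > Makefile
cat Makefile
# Running `make` compares timestamps: touching data/raw.csv triggers a
# re-run, while unchanged inputs give "Nothing to be done".
```

This is the timestamp-based behaviour (vs {targets}' hash-based invalidation) that comes up in the next reply.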

@matti @FrederikAust @debruine

Largely agree. My 'pros':
- hash rather than timestamp-based (although using timestamps does make it easy to hack fresh status via 'touch')
- because it's R-based, can automatically construct the dependency graph at a very granular level
- branching flows seem intrinsically tricky, but maybe? easier to handle in targets

Cons:
- more 'magic'/abstract
- R-only (not practically that important, 99% of my workflow is in R or has an R front end)

@bbolker @FrederikAust @debruine yup branching seems great in targets