I'm at the University of Sheffield today for the Perspectives on Teaching Reproducibility symposium at the Teaching Reproducible Research and Open Science conference.

https://www.sheffield.ac.uk/smi/events/teaching-reproducible-research-and-open-science-conference

I'll add links and notes about the day in this thread #OpenResearch #reproducibility #psyTeachR

Teaching Reproducible Research and Open Science Conference

Organisers: University of Sheffield (Sheffield Methods Institute, Open Research Working Group and University Library) and Project TIER.

The University of Sheffield

First up, Jennifer Buckley on "Opportunities and challenges for teaching reproducibility in the context of UK Higher Education in the Social Sciences – insights from a consultation with teaching staff"

If you aren't familiar with the UK Data Service, it's a fantastic resource for managing social science data for research and teaching.

The survey involved 109 social science lecturers, most of whom teach quantitative methods, plus 16 follow-up interviews. Most agree that teaching reproducibility is important and that demonstrations and examples would be useful.

Most still use SPSS (seems to be more polisci than psych in the dataset)

Almost half of the lecturers surveyed prepare data to make it more usable for students. They often find there is no time to teach data preparation (one of the most important skills we emphasise in the #psyTeachR curriculum)

Next up, Jon Reades on Building Foundations: Pythonic (Geo)Data Science from the Ground Up

https://jreades.github.io/talks/reproducible/#/building-foundations-reproducible-geographic-data-science

Presentations – index

I love this list of benefits of reproducible workflows:

* Abstraction
* Employability
* Learning by seeing
* Learning by breaking
* Workload management

The Docker method of making sure all students have the same packages and resources looks fruitful. I'd be curious to see how easy it is to deploy Docker on students' diverse machines.
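For what that deployment might look like, here's a hypothetical sketch: write a Dockerfile that pins a versioned rocker image so every student builds an identical R environment. The image tag, package, and class name are my assumptions, not from the talk.

```shell
# Hypothetical sketch: pin the whole R environment in an image so every
# student gets identical packages (image tag and names are assumptions).
cat > Dockerfile <<'EOF'
FROM rocker/verse:4.3.1
RUN R -e "install.packages('remotes', repos = 'https://cloud.r-project.org')"
EOF
# Each student then builds and runs the same environment, e.g.:
#   docker build -t methods-class .
#   docker run --rm -p 8787:8787 -e PASSWORD=teach methods-class
```

The catch in practice is usually installing Docker itself on a mix of Windows, macOS, and locked-down lab machines, which this sketch doesn't solve.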

I'm also impressed with teaching git/GitHub; I think version control is so important, but teaching it can be tricky and derail the class with esoteric problems.
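One way to keep that first git lesson from derailing is to stay strictly linear: no branches, no remotes. A minimal sketch of that workflow (project and file names are illustrative):

```shell
# A first git lesson kept linear: init, edit, add, commit, inspect.
git init demo-project
# Setting identity up front avoids the classic first-commit error message.
git -C demo-project config user.name "Student"
git -C demo-project config user.email "student@example.com"
echo "analysis notes" > demo-project/README.md
git -C demo-project add README.md
git -C demo-project commit -m "Add README"
git -C demo-project log --oneline
```

Branching, merging, and GitHub remotes can then wait until the add/commit loop is second nature.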

And yay for more #QuartoPub resources!

Reades makes the excellent point that REF2028 has just been announced and it's clear that they want to promote wider thinking on research environment – now called people and culture – and the contribution of more diverse outputs, which should include open teaching materials.

https://www.ukri.org/news/early-decisions-made-for-ref-2028/

Early decisions made for REF 2028

The UK's higher education funding bodies have made initial decisions on the high-level design of the next Research Excellence Framework (REF).

Next up, Marina Bazhydai on "The good, the bad and the ugly: Teaching first year psychology undergraduates about research integrity and open science" (with Emma Mills, Richard Philpot, Mike Vernon, & @dermotlynott from Lancaster University)

The UG methods course focuses on broad questions of how to do science, in addition to the stats. They are supported by the PROSPR network https://www.lancaster.ac.uk/psychology/research/open-science/

Open Science

It's a really interesting idea to teach undergrads how to use tools like StatCheck and GRIM to detect research errors (or fraud) and then engage with the repliCATS project. Also, this demo is fab!

https://fivethirtyeight.com/features/science-isnt-broken/

Science Isn't Broken

If you follow the headlines, your confidence in science may have taken a hit lately. Peer review? More like self-review. An investigation in November uncovered a scam in which researchers were rubber-stamping their own work, circumventing peer review at five high-profile publishers.

FiveThirtyEight

The first keynote is by Norm Medeiros and Richard Ball from Project TIER – The New (Aspirational) Normal: Saturating Quantitative Methods Instruction with Reproducibility

https://www.projecttier.org/

This talk focusses on integrating computational reproducibility across all curricula as a precondition for other dimensions of research transparency.

Project TIER | Teaching Integrity in Empirical Research

TIER Documentation Protocol provides instructions for assembling files documenting steps of data processing & analysis for a research paper.

**Documentation is the key to Reproducibility**

Essential elements:
- Original data
- Code

Additional elements:
- Output of computational results
- Additional information on data sources
- A read-me file

((I'd argue a README is essential!))
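Concretely, the essential and additional elements above might look like this on disk. This is a hypothetical sketch in the spirit of the TIER protocol; the folder and file names are illustrative, not the protocol's exact specification:

```shell
# Illustrative replication-package layout (names are assumptions).
mkdir -p tier-project/data-original tier-project/scripts tier-project/output
touch tier-project/data-original/survey-raw.csv   # original data, never edited
touch tier-project/scripts/01-process.R           # data processing code
touch tier-project/scripts/02-analyse.R           # analysis code
touch tier-project/output/table1.csv              # computational results
touch tier-project/README.md                      # the (essential!) read-me
ls -R tier-project
```

Keeping the untouched original data separate from scripts and generated output is what lets a reader re-run everything from scratch.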

Very cool that the American Economic Association has a dedicated data editor and great online resources!

https://aeadataeditor.github.io/aea-de-guidance/

Step by step guidance

The following steps outline what you should expect after conditional acceptance of your manuscript, in compliance with the AEA Data and Code Availability Policy (which is compatible with the Data and Code Availability Standard v1.0):

1. Prepare: prepare your data and code replication package (including data citations and provenance information); you can do this at any time, even before submitting to the AEA journals.
2. Upload: provide metadata and upload the replication package; this step simultaneously prepares the materials for the verification process and for subsequent publication.
3. Submit: submit the Data and Code Availability Form together with your manuscript native files as instructed, per the guidelines at your journal (for example, the AER guidelines); verification checks start only once the editorial office has received these materials.

The next steps happen behind the scenes, until you receive the replication report.

Office of the AEA Data Editor

I like the "reproducibility trifecta":

1. Fixed folder structure
2. Explicit management of the working directory
3. Use of relative directory paths in scripts
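The trifecta can be sketched in shell terms (the project and file names here are illustrative, not from the talk):

```shell
# 1. A fixed folder structure, the same in every project.
mkdir -p myproject/data myproject/scripts myproject/output
echo "id,score" > myproject/data/input.csv
# 2. Explicit working directory: always run from the project root
#    (here via a subshell), and
# 3. relative paths only, so the project runs wherever it is copied.
(
  cd myproject
  cut -d, -f1 data/input.csv > output/ids.csv   # relative, not /home/me/...
)
cat myproject/output/ids.csv
```

Because nothing references an absolute path, the whole `myproject` folder can be zipped and re-run on any machine, which is exactly the portability dimension mentioned next.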

And the "key dimensions of reproducibility":

1. Soup-to-nuts reproducibility
2. (Almost) automated reproducibility
3. Portability

@debruine Nice. My version of the reproducibility trifecta is a bit harder to attain...

https://twitter.com/FrederikAust/status/1575079473401462784

@FrederikAust on Twitter

"The holy #rstats reproducibility trifecta: 1. A {targets} pipeline, 2. including a Quarto report of results, 3. wrapped in a Docker container. https://t.co/HEpy6XOxYl"

Twitter

@matti @FrederikAust @debruine I tried getting into targets again just the past week and imho the cons far outweigh any potential pros. GNU make ftw!

@matti @FrederikAust @debruine Would be interested in more detail (I have my own list of pros & cons, curious what yours are; "do you care about polyglot or non-R pipelines?" and "which system are you/people in your area already familiar with?" are the two most important Qs, IMO ...)

@bbolker @FrederikAust @debruine

Here's some quick thoughts in context of typical psychology data analysis projects:

Pros:
- promises easy scaling to clusters (though it looks complicated...)
- just R

Cons:
- will move the probability that collaborators understand my code from 25% to 0.0%
- further abstraction makes it more difficult for others to build on my code
- separates interactive development from "production" ready code
- just R

@bbolker @FrederikAust @debruine

Targets forces me to make everything a function, whereas Make allows anything as long as it can be run from CLI. Functions separate interactive analyses from "production" runs and make troubleshooting & understanding harder. Not in general, but for typical analyses where each function would only be used once I don't see the benefit.
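For contrast, here is a minimal sketch of the Make side of this comparison; the target and file names are invented for illustration. Each output is a file target with listed inputs, and any command runnable from the CLI can be a recipe:

```shell
# Write a one-rule Makefile: results depend on the script and the raw data.
# (The recipe line must start with a tab, hence the \t escape.)
printf 'output/results.csv: scripts/analyse.R data/raw.csv\n\tRscript scripts/analyse.R\n' > Makefile
cat Makefile
# Running `make` compares timestamps: touching data/raw.csv triggers a
# re-run, while unchanged inputs give "Nothing to be done".
```

This is the timestamp-based behaviour (vs {targets}' hash-based invalidation) that comes up in the next reply.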

@matti @FrederikAust @debruine

Largely agree. My 'pros':
- hash rather than timestamp-based (although using timestamps does make it easy to hack fresh status via 'touch')
- because it's R-based, can automatically construct the dependency graph at a very granular level
- branching flows seem intrinsically tricky, but maybe? easier to handle in targets

Cons:
- more 'magic'/abstract
- R-only (not practically that important, 99% of my workflow is in R or has an R front end)

@bbolker @FrederikAust @debruine yup branching seems great in targets