I’m at the University of Sheffield today for the Perspectives on Teaching Reproducibility symposium at the Teaching Reproducible Research and Open Science conference.

https://www.sheffield.ac.uk/smi/events/teaching-reproducible-research-and-open-science-conference

I’ll add links and notes about the day in this thread #OpenResearch #reproducibility #psyTeachR

Teaching Reproducible Research and Open Science Conference

Organisers: University of Sheffield (Sheffield Methods Institute, Open Research Working Group and University Library) and Project TIER.


First up, Jennifer Buckley on β€œOpportunities and challenges for teaching reproducibility in the context of UK Higher Education in the Social Sciences – insights from a consultation with teaching staff”

If you aren’t familiar with the UK Data Service, it’s a fantastic resource for managing social science data for research and teaching.

The survey involved 109 social science lecturers, most of whom teach quantitative methods, plus 16 follow-up interviews. Most agree that teaching reproducibility is important and that demonstrations and examples would be useful.

Most still use SPSS (seems to be more polisci than psych in the dataset)

Almost half of the lecturers surveyed prepare data to make it more usable for students. They often find there is no time to teach data preparation (one of the most important skills we emphasise in the #psyTeachR curriculum).

Next up, Jon Reades on Building Foundations: Pythonic (Geo)Data Science from the Ground Up

https://jreades.github.io/talks/reproducible/#/building-foundations-reproducible-geographic-data-science

Presentations – index

I love this list of benefits of reproducible workflows:

* Abstraction
* Employability
* Learning by seeing
* Learning by breaking
* Workload management

The Docker method of making sure all students have the same packages and resources looks fruitful. I’d be curious to see how easy it is to deploy Docker on students’ diverse machines.

I’m also impressed with teaching git/GitHub; I think version control is so important, but teaching it can be tricky and derail the class with esoteric problems.

And yay for more #QuartoPub resources!

Reades makes the excellent point that REF2028 has just been announced and it’s clear that they want to promote wider thinking on research environment β€” now called people and culture β€” and the contribution of more diverse outputs, which should include open teaching materials.

https://www.ukri.org/news/early-decisions-made-for-ref-2028/

Early decisions made for REF 2028

The UK’s higher education funding bodies have made initial decisions on the high-level design of the next Research Excellence Framework (REF).

Next up, Marina Bazhydai on "The good, the bad and the ugly: Teaching first year psychology undergraduates about research integrity and open science" (with Emma Mills, Richard Philpot, Mike Vernon, & @dermotlynott from Lancaster University)

The UG methods course focuses on broad questions of how to do science, in addition to the stats. They are supported by the PROSPR network https://www.lancaster.ac.uk/psychology/research/open-science/

Open Science

It's a really interesting idea to teach undergrads how to use tools like StatCheck and GRIM to detect research errors (or fraud) and then engage with the repliCATS project. Also, this demo is fab!

https://fivethirtyeight.com/features/science-isnt-broken/

Science Isn’t Broken

If you follow the headlines, your confidence in science may have taken a hit lately. Peer review? More like self-review. An investigation in November uncovered a scam in which researchers were rubber-stamping their own work, circumventing peer review at five high-profile publishers.

FiveThirtyEight
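The GRIM check in particular is simple enough to teach from scratch: it just asks whether a reported mean could actually arise from integer-valued responses. A minimal sketch in Python (illustration only; function name is mine, and the published test also handles alternative rounding rules):

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test sketch: can a mean reported to `decimals` places arise
    from n integer-valued responses?"""
    # Possible means are total/n for integer totals, so only totals
    # near reported_mean * n need checking.
    approx = reported_mean * n
    target = round(reported_mean, decimals)
    for total in range(math.floor(approx) - 1, math.ceil(approx) + 2):
        if round(total / n, decimals) == target:
            return True
    return False

# A mean of 2.57 is possible with n = 7 (18/7 ≈ 2.571)...
print(grim_consistent(2.57, 7))   # True
# ...but no integer total over n = 10 responses rounds to 3.48
print(grim_consistent(3.48, 10))  # False
```

The arithmetic is the same in R, which is what most of the courses discussed here use.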

The first keynote is by Norm Medeiros and Richard Ball from Project TIER – The New (Aspirational) Normal: Saturating Quantitative Methods Instruction with Reproducibility

https://www.projecttier.org/

This talk focusses on integrating computational reproducibility across all curricula as a precondition for other dimensions of research transparency.

Project TIER | Project TIER | Teaching Integrity in Empirical Research

The TIER Documentation Protocol provides instructions for assembling files documenting the steps of data processing and analysis for a research paper.

**Documentation is the key to Reproducibility**

Essential elements:
- Original data
- Code

Additional elements:
- Output of computational results
- Additional information on data sources
- A read-me file

((I’d argue a README is essential!))

Very cool that the American Economic Association has a dedicated data editor and great online resources!

https://aeadataeditor.github.io/aea-de-guidance/

Step by step guidance

The following steps outline what you should expect after conditional acceptance of your manuscript, in compliance with the AEA Data and Code Availability Policy (which is compatible with the Data and Code Availability Standard v1.0):

- Prepare: assemble your data and code replication package (including data citations and provenance information). You can do this at any time, even before submitting to the AEA journals.
- Upload: provide metadata and upload the replication package. This step simultaneously prepares the materials for the verification process and for subsequent publication.
- Submit: submit the Data and Code Availability Form together with your manuscript native files as instructed, per the guidelines at your journal (for example, the AER guidelines). Verification checks start only once the editorial office has received these materials.

The next steps happen behind the scenes, until you receive the replication report.

Office of the AEA Data Editor

I like the "reproducibility trifecta":

1. Fixed folder structure
2. Explicit management of the working directory
3. Use of relative directory paths in scripts
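The trifecta maps directly onto code. A minimal Python illustration (folder names are hypothetical; in R the `here` package plays a similar role):

```python
from pathlib import Path

# 1. Fixed folder structure: data/raw, data/processed, output
# 2. Explicit working directory: assume scripts always run from the project root
# 3. Relative paths only -- never an absolute path into someone's home directory
PROJECT_ROOT = Path.cwd()

def raw_path(filename: str) -> Path:
    """Path to a raw data file, built relative to the project root."""
    return PROJECT_ROOT / "data" / "raw" / filename

def output_path(filename: str) -> Path:
    """Path to an output file, built relative to the project root."""
    return PROJECT_ROOT / "output" / filename

# prints data/raw/survey.csv on POSIX systems
print(raw_path("survey.csv").relative_to(PROJECT_ROOT))
```

Because nothing is hard-coded to one machine, the same script runs unchanged on a classmate's laptop or a marker's computer.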

And the "key dimensions of reproducibility":

1. Soup-to-nuts reproducibility
2. (Almost) automated reproducibility
3. Portability

Higher order educational goals served by teaching reproducibility

β€’ Instructors can understand what students produce.
β€’ Students can understand what they produce.
β€’ Students can believe in what they produce.
β€’ Dramatic enhancement of instructor's ability to advise and evaluate student projects (especially with use of a file sharing platform).
β€’ Reinforces core lessons about intellectual integrity that are central to undergraduate education.

Project TIER has been focussing their workshops on individual researchers/instructors, and will be expanding their focus to making more department-wide changes, in collaboration with the UKRN (thanks for the lovely shout-out to Glasgow #PsyTeachR as a pioneer in this!)

Librarians are key for facilitating #OpenResearch (seriously, go make friends with your uni librarians!)

Data librarians can:

- provide assistance with documentation and metadata
- advise on file naming conventions and format consistency
- recommend strategies for organising and backing up files

(It's very cool that they do basic code review to make sure data prep code runs on another computer)

Terrific point from Norm Medeiros: reproducibility is difficult to retrofit; you need to integrate reproducibility practices at every point in the lifespan of a project.

Next up, Carlos Utrilla Guerrero (https://carlosug.github.io) from TU Delft Library on "What can an open science educator do on teaching and building digital competences in reproducibility? Our lessons learned implementing the Research Data and Software management training"

https://www.tudelft.nl/en/library/research-data-management/r/training-events/training-for-researchers


Carlos Utrilla-Guerrero's personal website, with his resume.

TU Delft Library's vision for Research Data and Software Management training as part of the education and skills development of students and researchers.

https://zenodo.org/record/3516874

Vision for Research Data & Software management training at TU Delft

This is TU Delft Library's vision for Research Data and Software Management training as part of the education and skills development of students and researchers. The courses (some already available and some in preparation) are organised in four different modules, which build upon each other (Fig. 1). The different levels (from bottom to top) increase the specificity of the content, from considering data in a general context (e.g. open science) to skills that apply to a specific data type or research discipline. The realization of this vision will be a collaborative effort between TU Delft Library, relevant stakeholders within the university (e.g. Data Stewards, researchers, other support services offices), and external organizations that have already developed training material and/or courses. This collaborative effort aims to ensure the sustainability of the training.

Zenodo

The data flow map exercise from this course looks really interesting! It's adapted from https://dataflowtoolkit.dk/

- Create a comprehensive list of datasets (incl. code) used in the project
- Annotate with the actions required for each dataset (e.g. collect, reuse, annotate, anonymise, etc)
- Flag datasets with special characteristics (e.g. personal data, commercial data)

DataFlowToolkit
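The exercise above is easy to prototype as a simple inventory structure. A Python sketch (field names and example datasets are my own, not the toolkit's format):

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """One entry in a data flow map: the dataset, the actions it needs,
    and any special-handling flags."""
    name: str
    actions: list[str]                              # e.g. collect, reuse, anonymise
    flags: list[str] = field(default_factory=list)  # e.g. personal or commercial data

# Hypothetical project inventory (code counts as a dataset too)
inventory = [
    DatasetEntry("interview_recordings", ["collect", "transcribe", "anonymise"],
                 flags=["personal data"]),
    DatasetEntry("census_extract", ["reuse", "annotate"]),
    DatasetEntry("analysis_code", ["write", "document", "share"]),
]

# Flag datasets that need special handling before sharing
needs_review = [d.name for d in inventory if "personal data" in d.flags]
print(needs_review)  # ['interview_recordings']
```

Even this toy version makes the point of the exercise: once the inventory is explicit, the special cases stop hiding.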

The TU Delft course for PhD candidates, Research Data Management 101 (RDM 101), is openly available as a self-learning course. It has 5 modules:

1: The importance of RDM
2: Essentials for Research Data
3: FAIR data principles and their main elements
4: Realizing FAIR data
5: How to plan for RDM

https://tu-delft-library.github.io/rdm101-book/intro.html

Welcome to RDM101 β€” RDM101 Course

Next, Julia Kasmire @JKasmireComplex from the UK Data Service on Teaching reproducibility to social scientists. This talk will describe a 5-week bootcamp course from the National Centre for Research Methods that covered:

1 – Intro, generals and specifics of reproducibility
2 – Collaboration, communication and tools thereof
3 – Documenting mind, workflow, processes
4 – Data basics and advanced topics
5 – Publication and AOB

It's nice to hear someone admit that these skills can be challenging to learn and that not everyone needs to learn them to expert level, but we should all know enough to communicate with our teams (go #TeamScience!)

It makes me think there should be a resource aimed specifically at people who don't want to learn the technical end of open research, but just the concepts and jargon needed to communicate with their team members who do.

Next, Andrew MacLachlan on Reproducible geographical information systems and science. (another lovely #QuartoPub presentation!)

https://andrewmaclachlan.github.io/perspectives-on-teaching-reproducibility/Sheffield_conference.html

Sheffield_conference - Reproducible geographical information Systems and Science

Ooh, the GitHub Classroom method for distributing assessments looks really nice. I do wish I could integrate Git and GitHub in my teaching more (but I find git installation too tricky for the amount of time we have, and it's not on our lab computers)

https://andrewmaclachlan.github.io/CASA0023/

CASA0023 Remotely Sensing Cities and Environments

Time for the second keynote by #PsyTeachR's own @HelenaPaterson on Teaching Reproducibility: reflections on redeveloping a curriculum for teaching reproducible methods

Why do we teach like this at UofG PsychNeuro? We think students need the conceptual and technical skills to be able to complete a research project.

"What is something observable that you think students in your field ought to be able to do when they graduate, and are you adequately preparing them to do this?" (Nolan & Temple Lang, 2010; Peck & Chance, 2007)

We often only see the end result of data processing and the rest of the pipeline is hidden. If you only give students clean "final" data, they don't learn the skills needed to deal with real raw data and are set up to fail at independent research.

In this paper, the #PsyTeachR team argues that training in data processing and transformation should be embedded in field-specific research methods curricula. Promoting reproducibility and open science requires not only teaching relevant values and practices, but also providing the skills needed for reproducible data analysis.

https://psyarxiv.com/hq68s/

Tools in our toolbox:

- R and RStudio
- tidyverse
- dirty data
- #PsyTeachR open resources
- Open research focus in assessment
- Community support

One way to make time for teaching these skills:

You do not have to teach every statistical test if you teach the foundations well. It's more important to teach students how to learn in a self-directed manner than to teach the individual tests.

Building expertise:

- Use one language and a novice friendly syntax (we use tidyverse in R through RStudio)
- Progression: Increase complexity and reduce hints over time
- Include formative self-assessment with solutions at the end of each section so students can check themselves and move on (this also reduces reliance on staff knowing everything – the solutions are there)
- Use real data wherever possible to reinforce the worth of the approach and to draw on student interests
- Integrate theory and research methods alongside data skills so students see the connections between the theory, the research, and the analysis.
- It's worth looking into RStudio Server and academic pricing, as hosting can reduce installation issues in early years, removing a barrier to getting started.

Assessment is core: it communicates to students what we value most. Some examples:

Using registered reports as assessments teaches the open research skills we value. Starting with methods, rather than data, means students don't focus on getting the "right answer".

Secondary data research reports: design a novel research question for a complex dataset, conduct the analysis and write a full report

Formative peer review of a pre-registered analysis plan: make an analysis script to analyse your data and share it in class for peer review

Building a community:

- Group work so students work with peers and learn about team science
- Support creates trust so people become more open to talking about mistakes and errors, and asking for help
- Seminar and workshop series based around methods and metascience that is open to both students and staff to present ideas and questions: https://psyteachr.github.io/mms/
- Students appreciate seeing staff ask questions as well

Methods & MetaScience Seminar

Schedule and material for the Methods & MetaScience seminar series for the University of Glasgow's Institute of Neuroscience and Psychology

Changes to your methods curriculum need to be sustainable.

Staff training
- Current staff: potentially yearly training for a while, until you have a sustainable base
- New staff: it can be difficult to find applicants who can do everything you do

Document Everything
- Particularly the rationale and principles behind your approach
- These are not set in stone, but they will help focus discussions on the future direction of the course as key people leave and new people join – and on whether you are still adhering to the original principles or it is time to update them

Take home messages:

- Don’t reinvent the wheel
- Slow it down: gradually build expertise
- Use what the community provides
- Assessment communicates our values
- What one thing can you start with?

(Please use and adapt our CC-BY-SA open resources at https://psyteachr.github.io!)

PsyTeachR

Psychologists from @UofGPsychNeuro advocating open, reproducible methods teaching.

@debruine this is great: reminding myself to come back to this, as we develop our curriculum and our materials @johnntowse and @tombeesley