Reproducible Research in R: An introductory course on modern data analyses and workflows

Reproducibility and open scientific practices are increasingly being requested or required of scientists and researchers, but training on these practices has not kept pace. This course, offered by the Danish Diabetes Academy, intends to help bridge that gap.

Syllabus

Reproducibility and open scientific practices are increasingly demanded of scientists and researchers. Training on how to apply these practices in data analysis has not kept up with demand. With this course, we hope to begin meeting that demand. Using a very practical approach based mostly on code-along sessions (instructor and learner coding together), the course will:

Explain what an open and reproducible data analysis workflow is, what it looks like, and why it is important.
Explain and demonstrate why R is rapidly becoming the standard program of choice for doing modern data analysis in science.
Demonstrate and apply collaborative tools and techniques when working in team settings (including working with your future self).
Show and apply the fundamental tools and skills for conducting a reproducible and modern analysis for a research project.
Show where to go to get help and to continue learning modern data analysis skills.

We’ll be addressing the following questions:

What is R, why should I use it, and how do I use it?
What does a modern data analysis setup and workflow look like?
What is reproducibility and how is it different from replicability?
How can I ensure my data analysis project is reproducible?
How can I import and work with my data in R?
How can I visualize my data and make publication-quality figures?
Why should I and how can I keep track of changes to my analysis files?
How can I write reports to document, describe, and present analyses in a reproducible way?

By the end of the course, participants will have a basic level of proficiency in using the R statistical computing language, enabling them to improve their data and code literacy, and to conduct a modern and reproducible data analysis. The course will place particular emphasis on research in diabetes and metabolism; it will be taught by instructors working in this field and it will use relevant examples where possible.

Is this course for you?

To help manage expectations and develop the material for this course, we make a few assumptions about who you are as a participant in the course:

You are a researcher, likely working in the biomedical field (ranging from experimental to epidemiology).
You currently or will soon do some quantitative data analysis.
You:
- know nothing or little about R (or computing in general);
- haven’t used code-based programs for data analysis (e.g. you have used SPSS instead);
- have used coding programs before (e.g. used SAS or Stata), but not R;
- know how to use R, but haven’t used the tidyverse or RStudio.

While we have these assumptions to help focus the content of the course, if you have an interest in learning R but don’t fit any of the above assumptions, you are still welcome to attend the course! We don’t turn anyone away if we can help it.

In addition to these assumptions, we have a fairly focused scope for teaching and expectations for learning, which may also help you decide if this course is for you.

We do teach how to use R, starting from the very basics and targeted to beginners.
We do not teach statistics (this is already covered by most university curricula).
We do teach from a team science, reproducible research, and open scientific perspective (i.e. by including a collaborative group project that uses a transparent and reproducible analysis workflow).
We do teach using practical, applied, and hands-on lessons and exercises, with a few short lectures that introduce a topic.

To further develop your R skills and knowledge, we are holding an advanced R course on September 8-9, 2020 that will build on this course (though it isn’t dependent on this course). Keep an eye on DDA announcements to register and learn more about it!

Pre-workshop instructions

Complete the tasks on the pre-course tasks webpage. The deadline is June 18.
Make sure to bring your own laptop, since we do hands-on learning.
Read and abide by the Code of Conduct.

Instructors and helpers

Lead instructor and organizer:
- Luke Johnston
Instructors:
Helpers:
- Malene Revsbech Christiansen
- Anders Aasted Isaksen

Schedule

The workshop is structured as a series of participatory live-coding sessions (instructor and learner coding together) interspersed with hands-on exercises and group work, using either a practice dataset or some other real-world dataset. There are some lectures given, mainly at the start and end of the workshop. The official DDA program can be found in this PDF.

Date and time	Session topic	Type	Instructor
June 22
9:30	Arrival; coffee and snacks
10:00	Introduction to the course	Lecture	Luke
10:30	Management of R projects (with short coffee break)	Code-along	Luke
12:30	Lunch
13:30	Data management and wrangling	Code-along	Bettina
14:30	Coffee break and hotel check-in
15:00	Finding and obtaining open datasets	Lecture	Daniel
15:30	Data management and wrangling (with short break)	Code-along	Bettina
17:30	End of day short survey
17:45	Free time
18:30	Dinner
20:30	Social activity in the basement hotel bar
June 23
7:00-8:30	Breakfast
8:30	Collaboration and teamwork in research	Lecture	Daniel
9:00	Version control and collaborative practices	Code-along	Luke
10:15	Coffee break and snacks
10:30	Version control and collaborative practices	Code-along	Luke
12:15	Lunch
13:15	Data visualization	Code-along	Helene
14:45	Coffee break and snacks
15:00	Data visualization	Code-along	Helene
17:00	End of day short survey
17:15	Free time
18:30	Dinner
20:30	Drinks and chats around a bonfire
June 24
7:00-8:30	Breakfast and checkout
8:30	Research in the era of (ir)reproducibility and open science	Lecture	Luke
9:15	Creating reproducible documents	Code-along	Luke
10:15	Coffee break and snacks
10:30	Creating reproducible documents	Code-along	Luke
12:15	Lunch
13:15	Group work: Presentation of projects, and discussions
15:15	Closing remarks and short survey
15:30	Farewell

Material

Current version of course

The course material is available as an online book at r-cubed.rostools.org.

Past versions of the course

Version 1, March 2019

The first version of this course, called Reproducible Quantitative Methods: Data analysis workflow using R, was taught in March 2019 in Middelfart, Denmark.

The original R Markdown teaching material can be found in the GitLab repository. You are free to use, re-use, and modify the material (as per the license) as long as you properly attribute the work. Please see the “How to cite” section of the README in the repository.