Reproducible Data Analysis: an Essential Capability in Modern Science
(Bio-)engineering sciences
The scientific method has historically depended on other researchers being able to replicate and verify published results. As scientific analyses become more complex and interdisciplinary, ensuring reproducibility becomes more challenging, especially in fields that combine different areas of expertise. To promote transparency, consistency and robustness in science, journals, funders and institutions are encouraging the use of tools and practices that enhance reproducibility. Lifelong learning helps professionals keep up with fast-paced scientific developments and fosters creativity and innovation. By learning about version control, containers, pipelines and data reproducibility, scientists at all levels can improve the reproducibility of their research, as well as the impact and reliability of their findings and methods. Moreover, the methods introduced here let them collaborate and experiment in ways that allow reproducibility and creativity to coexist and thrive.
Session 1
- Successful examples of using these tools for reproducibility
- Aspects of data reproducibility
Session 2
- Introduction to Git and GitHub concepts
- Routine usage of Git
- Inspect and compare different versions of a Git project
- Connect to and integrate with GitHub
- Collaborate and experiment with Git and GitHub
Session 3
- How to access VSC (Vlaams Supercomputer Centrum) facilities and use the HPC scheduling system
Session 4
- Introduction to basic container concepts and Docker syntax
- Find, obtain, and run a Docker image
- Adapt and build Docker recipes
- Find and run Apptainer images
- Adapt and build Apptainer images
- Use Apptainer on the VSC with the scheduling system
Session 5
- Introduction to Nextflow concepts and syntax
- Execute Nextflow pipelines with different executors and environments
- Write and run a Nextflow pipeline
- Write and modify modules and config files, following best practices for pipeline development
- Use Nextflow on the VSC with the scheduling system
Session 6
- Projects:
* 2 small projects
o The Git & GitHub project consists of collaboratively creating your documentation under version control. The project is started during the lesson and finished asynchronously before delivery (estimated asynchronous time: 3 h).
o The Docker and Apptainer project consists of adapting, writing and building one Docker image based on a Docker recipe. The project must be delivered on GitHub, with its version history available. The project is developed collaboratively after the lesson (estimated asynchronous time: 6 h).
* 1 medium project
o The Nextflow project consists of using Docker or Apptainer images to create and run a Nextflow pipeline that uses config files and modules. The project must be delivered on GitHub, with its version history available.
o In addition, the final project is defended in an oral presentation, which must summarize the topics learned and include examples that demonstrate the use of the tools, with a focus on reproducibility.
o The project is developed collaboratively after the lesson (estimated asynchronous time: 8 h).
- Prerequisites:
* Ability to use simple shell commands (e.g. on Linux); this e-learning material can be used to prepare.
* Experience with scripting is preferred (point to resource of catch-up before the course)
* A VSC account
Project evaluation and oral evaluation.
This is an on-campus course.
Syllabus, slides, exercise handouts
Technology park 75, CMB building (FSVM II), 9052 Ghent
Day 1: room L5, 5th floor
All other days: room L4, 4th floor