Installation & readings
RStudio is an IDE (Integrated Development Environment) that makes it
easier to write and execute R code.
R is a programming language that is widely used for data analysis
and statistics. We will introduce you to its usage.
Note: Should you want to know more, a guided tour of RStudio is available.
Time is short, so please have R and RStudio installed before we meet, and
install the tidyverse packages from the R console:
install.packages("tidyverse", dependencies = TRUE)
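To check afterwards that the package is available, one option is to ask R where it is installed. This is only a minimal sketch: the helper name `is_installed` is made up for this example and is not part of R or the tidyverse.

```r
# Hypothetical helper (not a base R function): returns TRUE when the
# package can be found in one of the library paths of this R installation.
is_installed <- function(pkg) {
  # system.file() returns an empty string when the package is not found
  nzchar(system.file(package = pkg))
}

# Should be TRUE once the installation above has succeeded
is_installed("tidyverse")
```

If this returns FALSE, re-running the installation command and reading the messages in the console is usually the quickest way to see what went wrong.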
The following part can be more challenging; try it if you have enough
time:
You will need to create an SSH key (this is the last step; getting
started is the most difficult part, but it is worth it!)
PLEASE LET ME KNOW IF YOU DID NOT MANAGE THIS BEFORE
WE MEET. We can, e.g., have a look during an online
meeting.
Why do we need to do this? It’s about reproducible science and open
data science
This part will be discussed in the course.
The way we will work will help us to do reproducible research. It
will help you (and us) to organize the data analysis work, and document
what you have done, including the reasons behind your choices.
Three months from now YOU might not remember the reasoning and all
the steps you took in your analyses. Documenting what you are doing
while you are doing it is very good practice.
This will save you time and struggles. What you have done will be
essential information for publication and manuscript revision.
Moreover, working this way will allow you to start setting up your
analyses BEFORE all the data are collected. You will be able to re-run
all your code on updated data, which helps you to be proactive.
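As an illustration of re-running the same code on updated data, here is a minimal sketch in base R. The function name `summarise_by_group`, the column names and all values are made up for this example; they are not course data.

```r
# Hypothetical analysis step: mean of a measurement per group.
summarise_by_group <- function(data) {
  aggregate(value ~ group, data = data, FUN = mean)
}

# Partial data, available before collection is finished (made-up values).
partial_data <- data.frame(group = c("a", "a", "b"),
                           value = c(1, 3, 10))
summarise_by_group(partial_data)

# Later: more observations arrive, and the same code is re-run unchanged.
updated_data <- rbind(partial_data,
                      data.frame(group = c("b", "c"),
                                 value = c(20, 5)))
summarise_by_group(updated_data)
```

Because the analysis lives in a script rather than in manual steps, updating the results only requires pointing the same code at the new data.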
What are the requirements of reproducible research? The
following article lists 10 rules of reproducible research:
- For Every Result, Keep Track of How It Was Produced
- Avoid Manual Data Manipulation Steps
- Archive the Exact Versions of All External Programs Used
- Version Control: Use it for all Customized Scripts
- Record All Intermediate Results, When Possible in Standardized
Formats
- For Analyses That Include Randomness, Note Underlying Random
Seeds
- Always Store Raw Data behind Plots
- Generate Hierarchical Analysis Output, Allowing Layers of Increasing
Detail to Be Inspected
- Connect Textual Statements to Underlying Results
- Provide Public Access to Scripts, Runs, and Results
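Two of these rules translate directly into a few lines of R: rule 3 (archive the exact versions used) and rule 6 (note the random seeds). The sketch below is only an illustration. The seed value and the output file name are arbitrary choices for this example, and a simulated sample stands in for a real analysis.

```r
# Rule 6: fix the random seed so that a "random" draw can be reproduced.
seed <- 2024            # arbitrary value chosen for this example
set.seed(seed)
draw_1 <- rnorm(5)

set.seed(seed)          # same seed again ...
draw_2 <- rnorm(5)      # ... gives exactly the same numbers
identical(draw_1, draw_2)

# Rule 3: record the R version and the versions of the loaded packages
# next to the results (here in a temporary folder, for illustration only).
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```

Keeping the seed and the session information together with the output makes it possible to check later which software produced which result.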
We will see that using R and RStudio makes it relatively easy to
follow all of these rules except 4 and 10. Those two require a version
control system like Git and, possibly, an associated online platform
like GitHub that lets you store your code on the web. GitHub in turn
can be used during publication and can facilitate the creation of
a DOI via its integration with Zenodo. This then allows people to cite your
data analysis work and code!
WARNING: NEVER PUT YOUR DATA ON GITHUB, only the code to
process the data
Additional resources you may want to read
We might not have time to go into detail with Git and GitHub, but we
will try.
If you feel like it, you can learn Git for version control (it
can be used from the command line, aka the Terminal panel that
is available in RStudio):
- Git-Novice is a good lesson to start with.
Preliminary training program
- GitHub and R - get started with a project (understand the basic
principles)
- data wrangling
- data exploration and visualization (ensure data quality, what do I
have)
- possibly … going further
