# this is definitely okay
x <- 2 + 2
x
# ...but is the following okay?
tibble(x=1:10, y=2:11) %>% # ..Is this okay?
  filter(y>5 & # How about breaking in the middle..
           x != 9) # ..of a statement like 'filter'?
…the basics
2023-01-01
..from https://doi.org/10.1146/annurev-psych-020821-114157
“Facilitate easy and accurate reproducibility of all steps of research: Results, process and comprehension – from raw data to finished output”
Aim for (**)
# A tibble: 5 × 2
x y
<int> <int>
1 5 6
2 6 7
3 7 8
4 8 9
5 10 11
These comments were syntactically okay … BUT did they make your code easier for humans to read?
What comments would be relevant here?
Suggest comments for this code… think ‘what’ and ‘why’
Suggestions for meaningful comments
library(dplyr)
# ?? comments necessary ??
d <- read.csv("my_data_file.csv")
# excluded because participant entered an invalid CPR number
d <- d %>% filter(id != "2321369-1212")
# set 's' to F(emale) if the last CPR digit is even, M(ale) if it is odd
d <- d %>%
  mutate(s = factor(c("F", "M"))[as.numeric(substr(id, nchar(id), nchar(id))) %% 2 + 1])
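A quick way to check that a derivation like this does what its comment claims is to run it on a small in-memory data frame instead of `my_data_file.csv`. The ids below are made up for illustration (not real CPR numbers), and the sketch uses base R indexing so it runs without dplyr. Because R indexes from 1, the lookup uses `%% 2 + 1`: an even last digit gives index 1 ("F"), an odd one gives index 2 ("M").

```r
# Hypothetical ids for illustration only -- not real CPR numbers
d <- data.frame(id = c("010190-1234", "020290-4321", "030390-5558"),
                stringsAsFactors = FALSE)

# Last character of each id, converted to a number
last_digit <- as.numeric(substr(d$id, nchar(d$id), nchar(d$id)))

# Even last digit -> index 1 -> "F"; odd last digit -> index 2 -> "M"
d$s <- factor(c("F", "M"))[last_digit %% 2 + 1]

print(d)
```

The last digits are 4, 1 and 8, so `s` comes out as F, M, F. Inside a pipeline the same expression can go in `mutate()`.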
Main points
In markdown, there's
an important difference
between '_new-line_' and '_empty-line_'.
...white space matters!
In markdown, there’s
an important difference between ‘new-line’ and ‘empty-line’.
…white space matters!
a%>%b
vs a %>% b
In RStudio, check out ‘Soft Wrap Long Lines’ in the Code menu.
Maintain a README.md file in each project, at the root level
Especially important for larger, more complex projects with many data sources, collaborators, etc.
# Project title
A subtitle that describes your project, e.g., research question
## Motivation
Motivate your research question or business problem. Clearly explain which problem is solved.
## Method and results
First, introduce and motivate your chosen method, and explain how it contributes to solving the research question/business problem.
Second, summarize your results concisely. Make use of subheaders where appropriate.
## Repository overview
Provide an overview of the directory structure and files, for example:
├── README.md
├── data
│ ├── my_data.csv # raw data from CPR register
│ └── exp_data.csv # experimental data register
├── plots
│ ├── plot_1.png # Boxplot of age
│ ├── plot_2.png # Pie chart of sex
│ └── plot_3.png # Bi-plot age vs measurement X
├── main.R # all analyses in one place
└── manuscript1.Rmd # for J of RR
## Running instructions
Explain to potential users how to run/replicate your workflow. If necessary, touch upon the required input data, which secret credentials are required (and how to obtain them), which software tools are needed to run the workflow (including links to the installation instructions), and how to run the workflow.
## More resources
Point interested users to any related literature and/or documentation.
## About
Explain who has contributed to the repository.
Maintain a data definition (markdown) file for each data file in the project, in the same folder as the data file itself
Especially important for larger, more complex projects with many data sources, collaborators, etc.
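Writing a data definition file by hand is easier if you start from a generated skeleton. A minimal sketch, assuming a simple Markdown table with one row per variable is enough; `write_data_definition()` is a hypothetical helper (not from any package), and the `Description` column is deliberately left empty for a human to fill in.

```r
# Hypothetical helper: drafts a Markdown data definition for a data frame.
# Lists each variable and its R class; 'Description' is left for a human.
write_data_definition <- function(df, path, title = "Data definition") {
  classes <- vapply(df, function(col) class(col)[1], character(1))
  lines <- c(
    paste("#", title),
    "",
    "| Variable | Type | Description |",
    "|----------|------|-------------|",
    paste0("| ", names(df), " | ", classes, " |  |")
  )
  writeLines(lines, path)
  invisible(lines)
}

# Example: document a small data frame (a temp file stands in for data/my_data.md)
df <- data.frame(id = c("a", "b"), age = c(34L, 51L), stringsAsFactors = FALSE)
out <- file.path(tempdir(), "my_data.md")
write_data_definition(df, out, title = "my_data.csv")
cat(readLines(out), sep = "\n")
```

In a real project the output file would sit next to the CSV it describes, and the generated table is only a starting point: units, coding of categories and data sources still have to be written by someone who knows the data.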
There are only two hard things in Computer Science: cache invalidation and naming things. – Phil Karlton
Let variable, function and file names convey meaning.
Suggest alternative code and variable names for this code
Let variable, function and file names convey meaning.
Alas, the function cpr2sex does not exist in base R or Tidyverse
Tip
# Requires a custom function like this -- which could be sourced from file
library(stringr)
cpr2sex <- function(x) {
  # This function takes a string (x), presumed to be a valid Danish CPR number,
  # and returns "F", "M" or NA depending on the last character in the string.
  # If the last CPR character is an even digit, it indicates female sex, and
  # an odd digit indicates male sex.
  last <- str_sub(x, str_length(x), str_length(x))
  if (last %in% c("0", "2", "4", "6", "8")) {
    return("F")
  } else if (last %in% c("1", "3", "5", "7", "9")) {
    return("M")
  }
  return(NA) # Last character in CPR is not a digit
}
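A function built around `if` handles one CPR string at a time, because `if` expects a single TRUE/FALSE. To apply it to a whole column, e.g. inside `mutate()`, a vectorised variant is handy. A minimal base R sketch, assuming (as above) that an even last digit means female and an odd one male; `cpr2sex_vec` is a hypothetical name and the ids are made up.

```r
# Hypothetical vectorised variant of cpr2sex(): one result per input string.
# Even last digit -> "F", odd last digit -> "M", anything else -> NA.
cpr2sex_vec <- function(x) {
  last <- substr(x, nchar(x), nchar(x))
  ifelse(last %in% c("0", "2", "4", "6", "8"), "F",
         ifelse(last %in% c("1", "3", "5", "7", "9"), "M", NA_character_))
}

ids <- c("010190-1234", "020290-4321", "030390-123X")  # made-up ids
cpr2sex_vec(ids)
```

With dplyr loaded, the pipeline then reads `d <- d %>% mutate(sex = cpr2sex_vec(id))`, which says *what* happens without a comment.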
We could hide this away in a separate file and ‘source’ it... or even make a new package.
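Sourcing a helper file keeps the analysis script short. A minimal sketch: a temporary file stands in for a project file such as `R/cpr2sex.R` (a hypothetical path), and the helper is a compact base R version of the CPR function.

```r
# Write the helper to its own file (a temp file here, standing in for
# something like R/cpr2sex.R inside the project folder -- hypothetical path)
helper_file <- tempfile(fileext = ".R")
writeLines(c(
  "cpr2sex <- function(x) {",
  "  last <- substr(x, nchar(x), nchar(x))",
  "  if (last %in% c('0', '2', '4', '6', '8')) return('F')",
  "  if (last %in% c('1', '3', '5', '7', '9')) return('M')",
  "  NA",
  "}"
), helper_file)

# The analysis script then needs a single line to make the function available
source(helper_file)
cpr2sex("010190-1234")
```

In a project you would write the helper file once by hand and keep only the `source()` call in `main.R`.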
Main points
One project in one folder!
Should contain
Use the here() function to refer to subfolders.
Using relative paths with ./ and ../ can also work
Do not use absolute filesystem paths like C:/users/Einstein/Documents/
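The contrast between the three approaches can be sketched as follows, assuming the project has a `data` subfolder containing `my_data.csv` (hypothetical names). `here::here()` is only called if the here package is installed; it builds the path from the project root, which it finds by looking for markers such as an `.Rproj` file.

```r
# Avoid: an absolute path that only exists on one machine
# d <- read.csv("C:/users/Einstein/Documents/project/data/my_data.csv")

# Relative path, resolved against the working directory (the project root
# when the project is opened); file.path() inserts the separator
rel_path <- file.path("data", "my_data.csv")
print(rel_path)

# With the here package installed, here() anchors the same path at the
# project root, even when the code runs from a subfolder (e.g. an .Rmd file)
if (requireNamespace("here", quietly = TRUE)) {
  print(here::here("data", "my_data.csv"))
}
```

Either path can then be passed to `read.csv()`; neither breaks when the project folder is copied to another computer.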
Stick to simple, human-readable files like R scripts, markdown and CSV files as far into the process as you can, and only generate PDF, Word, TIFF, JPEG, etc. files as the final step.
…only really one potential issue with text files: CHARSET
Weird characters? E.g. SÃ¸ren instead of Søren? It’s probably the character encoding (Microsoft Excel again!) – just stick to UTF-8/UTF-16
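Where do the weird characters come from? A minimal sketch, assuming an R session running in a UTF-8 locale: the two UTF-8 bytes of "ø" misread as Latin-1 turn into "Ã¸". Declaring UTF-8 explicitly when writing and reading CSV files (the `fileEncoding` argument of `write.csv()` and `read.csv()`) keeps the name intact.

```r
# "Søren" written with a Unicode escape so this script is itself encoding-safe
name <- "S\u00f8ren"

# Mojibake: the UTF-8 bytes of "ø" (C3 B8) misread as Latin-1 become "Ã¸"
garbled <- iconv(rawToChar(charToRaw(enc2utf8(name))), from = "latin1", to = "UTF-8")
print(garbled)

# Remedy: write and read the text file with an explicit UTF-8 encoding
f <- tempfile(fileext = ".csv")
write.csv(data.frame(name = name), f, row.names = FALSE, fileEncoding = "UTF-8")
back <- read.csv(f, fileEncoding = "UTF-8")
print(back$name)
```

In a UTF-8 session the first `print()` shows the garbled "SÃ¸ren" and the second shows "Søren" again.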
Main points
Comments in R code and markdown
R code
Markdown