library(tidyverse)
library(here)
Steen Harsted & Søren O’Neill
April 25, 2025
You can download the course slides for this section here
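Follow these instructions step by step:

- Create a new R project called `R4phd_course_SDU`.
- Inside the project, create the following folders:
  - `raw_data`
  - `clean_data`
  - `scripts`
  - `plots`
  - `tables`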
- Put the `soldiers.csv` data file in the `raw_data` folder.
- Create a new R script in the `scripts` folder and call it `01_import.R`.
- Install the `here` package if you do not already have it.
- Load the `tidyverse` and the `here` packages.
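- Run `here()`.
- Read the help page for the `here()` function (`?here`).
- What does `here()` return?
- What does `here("raw_data")` return?
- Discuss why the `here()` function could ever be of any use to anybody.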
- Import the `soldiers.csv` dataset in the "raw_data" folder and assign it to an object called `soldiers`. Hint: use `read_csv()` or `read_csv2()`, `here()`, and `<-`.
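A minimal sketch of the `here()` questions and the import step, assuming the folder layout above and a comma-separated `soldiers.csv` (switch to `read_csv2()` if the file turns out to use semicolons). The object names `raw_folder` and `soldiers_path` are illustrative, and the `file.exists()` check is only there so the snippet also runs outside the course project.

```r
library(tidyverse)
library(here)

# here() returns the root folder of the current project
here()

# Extra arguments are appended to the root as path components,
# so the same code works on Windows, macOS and Linux
raw_folder <- here("raw_data")
soldiers_path <- here("raw_data", "soldiers.csv")

# Import the data only if the file is actually there
# (assumes a comma-separated file; use read_csv2() for semicolons)
if (file.exists(soldiers_path)) {
  soldiers <- read_csv(soldiers_path)
}
```

Because `here()` builds every path from the project root, the same line should find the file whether it is run from a script in `scripts` or from a Quarto file in another subfolder, as long as the project has a root marker such as an `.Rproj` file.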
Do not start the following part until you have completed the wrangling exercises for `select()`, `filter()`, `summarise()`, `group_by()`, `arrange()`, and `mutate()`.
You now have the skills to continue the work we started in Section 2. This next task is crucial for the rest of the course.
You will now learn how to make R run another file located inside your project. This is called sourcing an R script, and we use the function `source()` for it. Being able to source R scripts helps you ensure reproducibility without ending up with extremely long and complicated Quarto files.
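Sourcing runs a script from top to bottom in the current R session, so every object the script creates becomes available afterwards. The self-contained sketch below writes a tiny stand-in script to a temporary file (its content and the names `demo_script` and `demo_data` are hypothetical, standing in for `01_import.R`) and then sources it.

```r
# Write a small stand-in for scripts/01_import.R to a temporary file
# (hypothetical content, for illustration only)
demo_script <- tempfile(fileext = ".R")
writeLines(c(
  "# Stand-in import script",
  "demo_data <- data.frame(id = 1:3, score = c(10, 20, 30))"
), demo_script)

# source() runs every line of the script in the current session
source(demo_script)

# Objects created by the script are now available
demo_data
```

In a Quarto file inside your project, the corresponding call would presumably be `source(here("scripts", "01_import.R"))`, given the folder layout you created earlier.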
As you've noticed, the `soldiers` dataset is not perfect when we load it. For example, height (`Heightin`) is measured in inches, and `weightkg` is measured in kilograms times ten. We need to make some adjustments before we can proceed with our analysis.

You are going to write the code needed to clean the `soldiers` dataset in the script `01_import.R` that you created earlier. When we later source the `01_import.R` script, you will get a clean and usable dataset in a well-documented manner that ensures reproducibility.
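Both unit fixes mentioned above are simple arithmetic inside `mutate()`: one inch is exactly 2.54 cm, and dividing `weightkg` by 10 gives kilograms. The sketch uses a small made-up tibble (not the real data) so it runs on its own; in `01_import.R` you would apply the same `mutate()` to `soldiers`.

```r
library(tidyverse)

# Made-up example rows, only to illustrate the conversions
toy <- tibble(
  Heightin = c(70, 65),   # height in inches
  weightkg = c(815, 620)  # weight in kilograms times ten
)

toy_clean <- toy %>%
  mutate(
    heightcm = Heightin * 2.54, # 1 inch = 2.54 cm exactly
    weightkg = weightkg / 10    # convert back to plain kilograms
  )

toy_clean
```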
Here are your next steps:
- Open `01_import.R` (the script where you imported the `soldiers.csv` file) and update the data.

Write comments and explain your code as you solve the steps below.
- Create `heightcm` (height in cm).
- Remove `Heightin`.
- Convert `weightkg` to kilograms (it is stored as kilograms times ten).
- Inspect the `sex` variable and fix it.
- Create `BMI`.
- Create `category` (level of BMI).
- Create `race`. Base the values in `race` on the description below.
- Rearrange the columns (hint: `relocate()`).
- … `soldiers`.
- … `race` in `soldiers`.
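A sketch of the BMI and category steps on made-up rows. BMI is weight in kilograms divided by height in metres squared. The course does not state which BMI levels to use, so the cut-offs below are an assumption (the WHO adult classification: below 18.5, 18.5 to below 25, 25 to below 30, and 30 or above); adjust them if your instructions say otherwise. `relocate()` then moves the new columns forward.

```r
library(tidyverse)

# Made-up example rows with units already converted
toy <- tibble(
  id       = 1:4,
  heightcm = c(180, 165, 175, 160),
  weightkg = c(55, 60, 85, 90)
)

toy_bmi <- toy %>%
  mutate(
    # BMI = weight in kg / (height in metres)^2
    BMI = weightkg / (heightcm / 100)^2,
    # Assumed cut-offs (WHO adult classification);
    # a missing BMI matches no rule and becomes NA
    category = case_when(
      BMI < 18.5 ~ "Underweight",
      BMI < 25   ~ "Normal weight",
      BMI < 30   ~ "Overweight",
      BMI >= 30  ~ "Obese"
    )
  ) %>%
  # Place the new variables right after id
  relocate(BMI, category, .after = id)

toy_bmi
```

Writing the last rule as `BMI >= 30` instead of a catch-all `TRUE ~ "Obese"` keeps missing BMI values as `NA` rather than silently labelling them obese.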
`DODRace` is a variable in the `soldiers` dataset. Its description is given below:

`DODRace`: Department of Defense Race; a single digit indicating a subject's self-reported preferred single race where selecting multiple races is not an option. This variable is intended to be comparable to the Defense Manpower Data Center demographic data. Codes: 1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Native American, 6 = Pacific Islander, 8 = Other.
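One way to build `race` from these codes is `case_when()`, shown here on made-up codes rather than the real data. Values without a matching rule become `NA`; since the description lists no code 7, a 7 would also end up as `NA`. This is a sketch, and a join with a small lookup table would work just as well.

```r
library(tidyverse)

# Made-up DODRace codes, including a missing value
toy <- tibble(DODRace = c(1, 2, 3, 4, 5, 6, 8, NA))

toy_race <- toy %>%
  mutate(
    # Translate each code according to the description above;
    # any value without a rule (including NA) becomes NA
    race = case_when(
      DODRace == 1 ~ "White",
      DODRace == 2 ~ "Black",
      DODRace == 3 ~ "Hispanic",
      DODRace == 4 ~ "Asian",
      DODRace == 5 ~ "Native American",
      DODRace == 6 ~ "Pacific Islander",
      DODRace == 8 ~ "Other"
    )
  )

toy_race
```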
If you are out of time and need an import file that does what it needs to do, download this file: `01_import.R`