```r
library(tidyverse)
library(here)
```

Steen Harsted & Søren O’Neill
August 17, 2025
You can download the course slides for this section here.

Follow these instructions step by step:
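1. Create a new project called R4phd_course_SDU
2. Inside the project, create the folders raw_data, clean_data, scripts, plots, and tables
3. Put the soldiers.csv file in the raw_data folder
4. Create a new R script in the scripts folder and call it 01_import.R
5. Install the here package
6. Load the tidyverse and the here packages
7. Run here() in the console
8. Read the help page for the here() function
9. What does here() do?
10. What does here("raw_data") do?
11. Discuss why the here() function could ever be of any use to anybody
12. Load the soldiers.csv dataset from the raw_data folder and assign it to an object called soldiers (hint: use read_csv() or read_csv2(), here(), and the assignment operator <-)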
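Here is a minimal sketch of how the here() steps and the loading step could fit together at the top of 01_import.R. It assumes the folder layout from the steps above. here() only builds a path string and does not check that the file exists. For that reason the read_csv() line is left as a comment, so the sketch still runs outside the course project. read_csv() comes from readr, which library(tidyverse) loads. The object names project_root and soldiers_path are illustrative choices.

```r
library(here)

# here() returns the root folder of the current project
# (when no project root is found, it falls back to the working directory)
project_root <- here()

# here() glues its arguments onto the project root, so the same line
# finds the file no matter which subfolder the script is run from
soldiers_path <- here("raw_data", "soldiers.csv")
print(soldiers_path)

# Inside the course project, read the data and assign it with <-
# soldiers <- read_csv(soldiers_path)  # use read_csv2() for ;-separated files
```

The point of building paths this way is that a collaborator can open the project anywhere on their own machine and the script still finds raw_data/soldiers.csv without any hard-coded folder names.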
Do not start this part until you have completed the wrangling exercises for select(), filter(), summarise(), group_by(), arrange(), and mutate().
You now have the skills to continue the work we started in Section 2. This next task is crucial for the rest of the course.
You will now learn how to make R run another file located inside your project. This is called sourcing an R script, and it is done with the function source(). Sourcing R scripts helps you secure reproducibility without writing extremely long and complicated Quarto files.
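The sketch below shows what source() does. It writes a small throwaway script to a temporary file so that it runs anywhere. The file contents and the name demo_data are made up for illustration. In the course project you would instead call something like source(here("scripts", "01_import.R")).

```r
# Write a tiny stand-in script to a temporary file
demo_script <- tempfile(fileext = ".R")
writeLines(
  c(
    "# Everything in this file runs when it is sourced",
    "demo_data <- data.frame(x = 1:3)",
    "demo_data$x_doubled <- demo_data$x * 2"
  ),
  demo_script
)

# source() runs the whole file from top to bottom;
# the objects it creates become available in your session
source(demo_script)
demo_data
```

A Quarto document can therefore start with a single source() call and get the cleaned data, while the cleaning code itself stays in one separate, commented script.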
As you’ve noticed, the soldiers dataset is not perfect when we load it. For example, height (Heightin) is measured in inches, and weightkg is measured in kilograms times ten. We need to make some adjustments before we can proceed with our analysis.

You are going to write the code needed to clean the soldiers dataset in the script 01_import.R that you created earlier. When you later source 01_import.R, you will get a clean and usable dataset, produced in a well-documented way that ensures reproducibility.
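Here is a minimal sketch of the two unit fixes, using mutate() on a few made-up rows. The tibble soldiers_toy and its values are hypothetical and not taken from soldiers.csv. The sketch assumes Heightin holds inches and weightkg holds kilograms times ten, as described above. One inch is exactly 2.54 cm.

```r
library(dplyr)

# Hypothetical toy rows; the real data comes from soldiers.csv
soldiers_toy <- tibble(
  Heightin = c(70, 64),   # height in inches
  weightkg = c(815, 620)  # weight in kilograms times ten
)

soldiers_toy <- soldiers_toy |>
  mutate(
    heightcm = Heightin * 2.54,  # convert inches to centimetres
    weightkg = weightkg / 10     # undo the factor of ten
  )

soldiers_toy
```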
Here are your next steps:
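Write comments that explain your code as you solve the steps below.

1. Open 01_import.R, where you load the soldiers.csv file, and update the data
2. Create heightcm (height in cm)
3. Remove Heightin
4. Fix weightkg so that it is measured in kilograms
5. Inspect the sex variable and fix it
6. Create BMI and BMIcategory (level of BMI)
7. Create race, and base the values in race on the DODRace description below
8. Reorder the variables (hint: relocate())
9. … soldiers.
10. … race in soldiers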
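Here is one possible way to create BMI and a BMI category, shown on made-up rows. Two things are assumptions: the column name BMIcategory and the cut-offs. The cut-offs follow the common WHO adult categories: below 18.5 underweight, 18.5 to below 25 normal, 25 to below 30 overweight, and 30 or more obese. Use whatever levels your course material specifies. BMI is weight in kilograms divided by height in metres squared.

```r
library(dplyr)

# Hypothetical toy rows where heightcm and weightkg are already fixed
soldiers_toy <- tibble(
  heightcm = c(177.8, 162.56),
  weightkg = c(81.5, 62)
)

soldiers_toy <- soldiers_toy |>
  mutate(
    # BMI = weight (kg) / height (m) squared
    BMI = weightkg / (heightcm / 100)^2,
    # Assumed WHO adult cut-offs; the first matching condition wins
    BMIcategory = case_when(
      BMI < 18.5 ~ "underweight",
      BMI < 25   ~ "normal",
      BMI < 30   ~ "overweight",
      BMI >= 30  ~ "obese"
    ),
    # A factor keeps the categories in a meaningful order
    BMIcategory = factor(
      BMIcategory,
      levels = c("underweight", "normal", "overweight", "obese")
    )
  ) |>
  # Move the new variables to the front of the data
  relocate(BMI, BMIcategory)

soldiers_toy
```

case_when() checks its conditions from top to bottom and gives each row the first category whose condition is true. That is why an upper bound alone is enough for each level.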
DODRace is a variable in the soldiers dataset. The description is given below:
DODRace – Department of Defense Race; a single digit indicating a subject’s self-reported preferred single race where selecting multiple races is not an option. This variable is intended to be comparable to the Defense Manpower Data Center demographic data. Where 1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Native American, 6 = Pacific Islander, 8 = Other
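Here is a sketch of the DODRace recoding with case_when(), applied to a few hypothetical codes. The mapping follows the description above. Any code not listed there would get NA, because none of the conditions would match it.

```r
library(dplyr)

# Hypothetical toy values of DODRace
soldiers_toy <- tibble(DODRace = c(1, 2, 3, 8))

soldiers_toy <- soldiers_toy |>
  mutate(
    # Translate the numeric codes into readable labels
    race = case_when(
      DODRace == 1 ~ "White",
      DODRace == 2 ~ "Black",
      DODRace == 3 ~ "Hispanic",
      DODRace == 4 ~ "Asian",
      DODRace == 5 ~ "Native American",
      DODRace == 6 ~ "Pacific Islander",
      DODRace == 8 ~ "Other"
    )
  )

# Count how many rows fall in each race
count(soldiers_toy, race)
```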
If you are out of time and need an import file that does what it needs to do, you can use a ready-made one.
Download this file: 01_import.R