Files, Folders, and Project Discipline

Author

Steen Harsted & Søren O’Neill

Published

April 25, 2025

1 Presentation

You can download the course slides for this section here

  

2 Setting up your course project

Follow these instructions step by step:

  1. Start a new project.
    • files_folders_and_project_names_matter
    • This will be your course project, and you will work in it over the coming days
    • The name could be R4phd_course_SDU
  2. Create a folder in your project called raw_data
  3. Create a folder in your project called clean_data
  4. Create a folder in your project called scripts
  5. Create a folder in your project called plots
  6. Create a folder in your project called tables
  7. Place the soldiers.csv file in the raw_data folder
  8. Create an R script (File -> New File -> R script), save it in the scripts folder and call it 01_import.R
  9. Close the R script you just created and leave it for now.

    This file and folder structure is a basic setup that will work for most projects.
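The five folders from steps 2 to 6 can also be created from the R console instead of the Files pane. This is a minimal sketch, assuming the working directory is the project root (which is the case when the project is open in RStudio):

```r
# Folder names taken from steps 2-6 above
folders <- c("raw_data", "clean_data", "scripts", "plots", "tables")

# Create each folder in the current working directory.
# showWarnings = FALSE keeps R quiet if a folder already exists.
for (folder in folders) {
  dir.create(folder, showWarnings = FALSE)
}

# Check that all five folders now exist (one TRUE/FALSE per folder)
dir.exists(folders)
```

Creating the folders by hand works just as well; the point is only that the structure is the same in every project.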




3 here

Load the tidyverse and the here packages

Show me the code
library(tidyverse)
library(here)

 

3.1 here()

3.1.1 Try out the here() function

  • What happens if you write here() ?
  • What happens if you write here("raw_data") ?
  • Compare your output with your neighbors.
  • Discuss if and how the here() function could ever be of any use to anybody


3.1.2 Read the soldiers.csv dataset in the “raw_data” folder and assign it to an object called soldiers

  • What does “Read” mean? (import/read/load - use the functions read_csv() or read_csv2())
  • How do I target the “raw_data” folder? (use here())
  • What does assign mean? (remember <-)
Show me the code
soldiers <- read_csv2(here("raw_data", "soldiers.csv"))
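The solution above uses read_csv2() rather than read_csv(), which suggests that soldiers.csv uses semicolons between columns. The sketch below illustrates the difference on a made-up two-row text (the names demo_text and demo are hypothetical and not part of the course data). It assumes readr 2.0.0 or later, where I() marks a string as literal data instead of a file path.

```r
library(readr)

# Made-up data: ';' separates the columns and ',' is the decimal mark
demo_text <- I("name;height\nA;1,82\nB;1,75\n")

# read_csv2() expects exactly this layout, so height is parsed as a number
demo <- read_csv2(demo_text)

# Inspect the parsed column
demo$height
```

read_csv() expects a comma between columns and a point as the decimal mark, so it would not split this text into two columns.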









4 Setting up your course project (continued)

Important

Do not start this section until you have completed the wrangling exercises for select(), filter(), summarise(), group_by(), arrange(), and mutate().

You now have the skills to continue the work we started in Section 2. This next task is crucial for the rest of your course.

You will now learn how to tell R to run another file located inside your project. This is called sourcing an R script, and we use the function source() for it. Being able to source R scripts helps you secure reproducibility without extremely long and complicated Quarto files.
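To see what sourcing does, here is a minimal sketch that does not depend on the course project. It writes a one-line stand-in script to a temporary file (script_path and demo_value are hypothetical names used only for illustration) and then sources it:

```r
# Write a tiny stand-in script to a temporary file.
# In the course project, the script would be scripts/01_import.R instead.
script_path <- tempfile(fileext = ".R")
writeLines("demo_value <- 21 * 2", script_path)

# source() runs every line of the script in the current R session
source(script_path)

# The object created inside the script is now available here
demo_value
```

In the course project, the equivalent call would presumably be source(here("scripts", "01_import.R")), assuming the folder structure from Section 2.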

As you’ve noticed, the soldiers dataset is not perfect when we load it. For example, height (Heightin) is measured in inches, and weightkg is measured in kilograms times ten. We need to make some adjustments before we can proceed with our analysis. You are going to write the code needed to clean the soldiers dataset in the script 01_import.R that you created earlier. When we later source the 01_import.R script, you will get a clean and usable dataset in a well-documented manner that ensures reproducibility.

Here are your next steps:

  • Open the R script 01_import.R that you created in Section 2.
  • For the rest of this section, you’ll work within this R script to clean your data using dplyr.

 

4.0.1 Write the necessary code to import the soldiers.csv file and update the data

Write comments and explain your code as you solve the steps below

  • Add heightcm (height in cm)
  • Remove Heightin
  • Fix weightkg
  • Explore the sex variable and fix it
  • Add BMI
  • Add category (level of BMI)
  • Add race - Base the values in race on the description below
  • Place the variables in an order that you like (use relocate())
  • Make sure that all changes are assigned to soldiers.
  • This script should be clean in the sense that it ONLY contains the steps necessary to clean the data, plus comments that explain the code

DODRace is a variable in the soldiers dataset. The description is given below: 

DODRace – Department of Defense Race; a single digit indicating a subject’s self-reported preferred single race where selecting multiple races is not an option. This variable is intended to be comparable to the Defense Manpower Data Center demographic data. Where 1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Native American, 6 = Pacific Islander, 8 = Other
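The general shape of several of the steps above can be sketched on a made-up two-row tibble. This is not the course solution: the rows in toy are invented, the BMI cut-offs (18.5, 25, 30) and category labels are assumptions that the course may define differently, and the sex variable is left out because exploring it is part of the exercise.

```r
library(dplyr)

# Invented rows that mimic the column names mentioned above (not real data)
toy <- tibble(
  Heightin = c(70, 64),
  weightkg = c(815, 620),  # kilograms times ten, as described above
  DODRace  = c(1, 8)
)

toy <- toy %>%
  mutate(
    heightcm = Heightin * 2.54,               # 1 inch = 2.54 cm
    weightkg = weightkg / 10,                 # back to plain kilograms
    BMI      = weightkg / (heightcm / 100)^2, # kg divided by metres squared
    # Assumed cut-offs and labels; adjust to the course's definition
    category = case_when(
      BMI < 18.5 ~ "Underweight",
      BMI < 25   ~ "Normal range",
      BMI < 30   ~ "Overweight",
      TRUE       ~ "Obese"
    ),
    # Labels taken from the DODRace description above
    race = case_when(
      DODRace == 1 ~ "White",
      DODRace == 2 ~ "Black",
      DODRace == 3 ~ "Hispanic",
      DODRace == 4 ~ "Asian",
      DODRace == 5 ~ "Native American",
      DODRace == 6 ~ "Pacific Islander",
      DODRace == 8 ~ "Other"
    )
  ) %>%
  select(-Heightin) %>%      # drop the inches column
  relocate(race, heightcm)   # move these two columns to the front

toy
```

Because every step assigns back to the same object, the final toy holds all changes, which mirrors the requirement that all changes are assigned to soldiers.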

If you are out of time and need an import file that does what it needs to do, download this file: 01_import.R