Home Assignment

Author

Steen F. Harsted & Søren O’Neill

Published

April 25, 2025

Welcome to your first home assignment of the R course! This exercise will help you apply what you’ve learned so far.

Your task is to build a data science workflow. You will import a dataset of your choice, clean, explore, and analyze it, and present your findings in a well-structured and styled Quarto document. This assignment encourages creativity and critical thinking, as well as good coding practices. Below are the detailed instructions. Happy coding!

1 Home Assignment Instructions

1.1 Choose a Dataset:

You are free to use your own data for this assignment, or any other dataset that interests you.

You are welcome to find a dataset on the web. Check out sources like Kaggle or Tidytuesday for inspiration.

You can use the soldiers_leave data, which you can find under Self-study -> Data adventures -> Soldiers on leave.

If you are short on time, you can use the soldiers data.

1.2 Create a New Project:

Your_project_name/
├── clean_data/
├── plots/
├── raw_data/
│    └── your_data.whatever
├── scripts/
│    └── 01_import.R
├── tables/
├── your_analysis.qmd
└── Your_project_name.Rproj
  • Do NOT place your project inside another project.
  • Remember to use comments in your code for clarity.
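
If you prefer to create the subfolders from the R console rather than by hand, a minimal sketch could look like the following. The helper name create_project_folders() is hypothetical (it is not part of any package), and it assumes you call it from the root of a project you have already created in RStudio.

```r
# Hypothetical helper: creates the suggested subfolders inside `root`.
# Folders that already exist are left untouched.
create_project_folders <- function(root = ".") {
  folders <- c("raw_data", "clean_data", "scripts", "plots", "tables")

  for (folder in folders) {
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }

  # Add an empty import script, but never overwrite an existing one
  import_script <- file.path(root, "scripts", "01_import.R")
  if (!file.exists(import_script)) {
    file.create(import_script)
  }

  invisible(file.path(root, folders))
}
```

Calling create_project_folders() with no arguments uses the current working directory, which is the project root when the project is open in RStudio.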

1.3 The Quarto file

  • Change the YAML header to style your document output.
---
title: "TITLE"
subtitle: "SUBTITLE"
author: "ME"
date: today
format: 
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
    number-sections: true
    number-depth: 2
    code-fold: true
    code-summary: "Show the code"
    code-tools: true
execute:
  message: false
  warning: false
---

### Libraries
```{r}
library(tidyverse) # tidy, transform, and plot
library(here)      # to handle file paths
```
  • Ensure the Quarto document sources your script.
    • E.g. source(here("scripts", "01_import.R"))
  • Write a brief introduction about the data and what you want to explore.
  • Formulate one or more questions about the data.
  • Use visual analysis to explore these questions. Create plots that you think could be interesting.
  • Write a short conclusion based on your visual interpretation of the data.
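
As an illustration of going from a question to a plot, here is a minimal sketch. It uses the mpg dataset that ships with ggplot2 as a stand-in for your own data, and the question itself is only an example; swap in your own data frame, variables, and labels.

```r
library(ggplot2)

# Example question: do cars with larger engines use more fuel on the highway?
# `mpg` is a built-in ggplot2 dataset, used here only as a placeholder.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Engine size and highway fuel economy",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class"
  ) +
  theme_minimal()

p
```

A plot like this gives you something concrete to interpret in your conclusion, for example whether the points trend downwards as engine size increases.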

1.3.1 Submit

- When is the assignment due? Check information on ItsLearning.
- Render the Quarto document to HTML.
- Zip your project (DO NOT INCLUDE SENSITIVE DATA).
- Upload the zipped project (WITHOUT ANY SENSITIVE DATA) to ItsLearning.

1.4 Meet the Raters: Your Assignment Guides!

Your assignment will be reviewed by six unique raters. Each rater has a special area of interest—from data organization to creative visualizations. They’ll carefully evaluate your work and award badges if your assignment meets their standards and aligns with their preferences.

The profiles below describe each rater and what they are looking for in your work.

Focus: Supporting and guiding you through the assignment.
Tip: These friendly chaps are likely to approve anything you hand in.
Likes: R, and assignments that are handed in on time.
Dislikes: Missing assignments.

Focus: Project organization and reproducibility.
Tip: Keep your files, folders, and scripts tidy. Follow the suggested directory structure and use proper naming conventions.
Likes: Clear file organization, reusable scripts, and well-commented code.
Dislikes: Disorganized projects and messy file names.

Focus: Graphics and animations.
Tip: Make your visuals pop! Use ggplot with interactive libraries and animations to make your findings stand out. Check out the ggplot2 - extra material on this website or the R graph gallery, and incorporate some of their visual magic into your work.
Likes: Creative, interactive, and dynamic visualizations and animations.
Dislikes: Basic, unstyled plots.

Focus: References and citations.
Tip: Make sure your work cites sources properly. Reference datasets and tools using Quarto’s citation features or inline text. Check out the Citing publications in your manuscript page on this website.
Likes: Impeccable citations and credit given where due.
Dislikes: Missing or vague references.

Focus: Document styling and themes.
Tip: Use Quarto’s YAML options to style your HTML document with themes, tabs, and polished layouts. Check out HTML basics and more on the Quarto website.
Likes: Elegantly styled reports with clear structure.
Dislikes: Plain, unformatted documents.

Focus: Dataset creativity.
Tip: Go beyond the soldiers dataset! Explore Kaggle, Tidytuesday, or your own data.
Likes: Fresh, exciting datasets.
Dislikes: The default soldiers data.



2 Some tips when you clean your data



2.0.1 Messy column names? janitor will help you

Some of you will encounter column names with whitespace, special characters, or inconsistent capitalization. You can fix this manually (e.g. with rename()), but you might have hundreds of messy column names. The janitor package in R provides a handy function called clean_names() to address this issue.
  • my_data |> clean_names()
    • Converts all column names to lowercase.
    • Replaces spaces and special characters with underscores.
    • Ensures column names are unique and consistent.
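
A small sketch of clean_names() in action is shown below. The data frame and its column names are made up for illustration.

```r
library(tibble)
library(janitor)

# A made-up data frame with messy column names
messy <- tibble(
  `First Name`  = c("Ann", "Bo"),
  `Weight (kg)` = c(61.5, 82.0),
  AGE           = c(34, 51)
)

clean <- messy |> clean_names()

names(clean)
# "first_name" "weight_kg" "age"
```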



2.0.2 Your data is wide? Make it long

What is Wide Format?

Wide format data is characterized by having multiple columns for different variables across a single row. Each row typically represents a unique subject, and each column represents a different measurement or observation of that subject. E.g. you will have column names like “weight_t1”, “weight_t2”, “weight_t3”, …. Data is often wide because it is easier for humans to read the raw data in this format. However, the wide format is suboptimal for data analysis. Go to the section “Tidy 2” under Workflow for manuscripts and read about the benefits of working with long data.
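
As a minimal sketch of the reshaping step, the example below uses pivot_longer() from tidyr on a made-up data frame with the weight_t1, weight_t2, weight_t3 columns mentioned above. The object names and values are invented for illustration.

```r
library(tibble)
library(tidyr)

# Made-up wide data: one row per subject, one column per time point
wide <- tibble(
  id        = c(1, 2),
  weight_t1 = c(70, 82),
  weight_t2 = c(71, 80),
  weight_t3 = c(69, 79)
)

# Long data: one row per subject per time point
long <- wide |>
  pivot_longer(
    cols         = starts_with("weight_"),
    names_to     = "time",
    names_prefix = "weight_",   # drop the prefix so time becomes t1, t2, t3
    values_to    = "weight"
  )

long
```

The result has the columns id, time, and weight, which is the shape ggplot2 expects when you want to map time to an axis or a colour.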



2.0.3 You are importing labelled data from Stata or SPSS?

Use the function as_factor() from the haven package to change the labels into R factors.
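
The sketch below shows the idea on a small made-up example. Instead of reading a real Stata or SPSS file, it builds a labelled column by hand with haven::labelled(), which mimics what read_dta() or read_sav() would give you; the variable names and labels are invented.

```r
library(tibble)
library(haven)

# Made-up data with a labelled column, as you would get from Stata or SPSS
survey <- tibble(
  id  = 1:4,
  sex = labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))
)

# Convert all labelled columns in the data frame to R factors
survey_factors <- survey |> as_factor()

levels(survey_factors$sex)
# "Male" "Female"
```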