Home Assignment
Welcome to your first home assignment of the R course! This exercise will help you apply what you’ve learned so far.
Your task is to build a data science workflow. You will import a dataset of your choice, clean, explore, and analyze it, and present your findings in a well-structured and styled Quarto document. This assignment encourages creativity and critical thinking, as well as good coding practices. Below are the detailed instructions. Happy coding!
1 Home Assignment Instructions
1.1 Choose a Dataset:
You are free to use your own data for this assignment, or any other dataset that interests you.
You are welcome to find a dataset on the web. Check out sources like Kaggle or Tidytuesday for inspiration.
You can use the soldiers_leave data that you can find under Self-study -> Data adventures -> Soldiers on leave.
If you are short on time, you can use the soldiers data.
1.2 Create a New Project:
Your_project_name/
├── clean_data/
├── plots/
├── raw_data/
│   └── your_data.whatever
├── scripts/
│   └── 01_import.R
├── tables/
├── your_analysis.qmd
└── Your_project_name.Rproj
- Do NOT place your project inside another project.
- Remember to use comments in your code for clarity.
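The folder layout above can also be created from the R console. This is a minimal sketch, assuming your working directory is already the root of your new project (for example, right after creating the project in RStudio). The folder and script names are taken from the structure shown above.

```r
# Sub-folders from the suggested project layout.
folders <- c("clean_data", "plots", "raw_data", "scripts", "tables")

# Create each folder; folders that already exist are left untouched.
for (folder in folders) {
  dir.create(folder, showWarnings = FALSE)
}

# Start an empty import script if there isn't one yet.
import_script <- file.path("scripts", "01_import.R")
if (!file.exists(import_script)) {
  file.create(import_script)
}
```

The Quarto file and the .Rproj file are not created here; RStudio creates the .Rproj file when you set up the project.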
1.3 The Quarto file
- Change the YAML header to style your document output.
---
title: "TITLE"
subtitle: "SUBTITLE"
author: "ME"
date: today
format:
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
    number-sections: true
    number-depth: 2
    code-fold: true
    code-summary: "Show the code"
    code-tools: true
execute:
  message: false
  warning: false
---
### Libraries
```{r}
library(tidyverse) # tidy, transform, and plot
library(here) # to handle file paths
```
- Ensure the Quarto document sources your script.
  - E.g. source(here("scripts", "01_import.R"))
- Write a brief introduction about the data and what you want to explore.
- Formulate one or more questions about the data.
- Use visual analysis to explore these questions. Create plots that you think could be interesting.
- Write a short conclusion based on your visual interpretation of the data.
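As an illustration of the visual-analysis step, here is a minimal sketch of a styled plot. It uses the mpg dataset that ships with ggplot2 as a stand-in for your own data; the question, variable choices, and labels are illustrative assumptions only.

```r
library(ggplot2) # plotting (also loaded by library(tidyverse))

# Stand-in data: `mpg` ships with ggplot2. Replace it with your own data.
# Illustrative question: how does highway fuel economy relate to engine size?
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Highway fuel economy by engine size",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class"
  ) +
  theme_minimal()

p # print the plot in the rendered document
```

Titles, axis labels, and a theme are small additions, but they make a plot much easier to read than the ggplot2 defaults.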
1.3.1 Submit
- When is the assignment due? Check information on ItsLearning.
- Render the Quarto document to HTML.
- Zip your project (DO NOT INCLUDE SENSITIVE DATA).
- Upload the zipped project (WITHOUT ANY SENSITIVE DATA) to ItsLearning.
1.4 Meet the Raters: Your Assignment Guides!
Your assignment will be reviewed by six unique raters. Each rater has a special area of interest—from data organization to creative visualizations. They’ll carefully evaluate your work and award badges if your assignment meets their standards and aligns with their preferences.
Click on the tabs below to learn more about your raters and what they’re looking for in your work!
Focus: Supporting and guiding you through the assignment.
Tip: These friendly chaps are likely to approve anything you hand in.
Likes: R, and assignments that are handed in on time.
Dislikes: Missing assignments.
Focus: Project organization and reproducibility.
Tip: Keep your files, folders, and scripts tidy. Follow the suggested directory structure and use proper naming conventions.
Likes: Clear file organization, reusable scripts, and well-commented code.
Dislikes: Disorganized projects and messy file names.
Focus: Graphics and animations.
Tip: Make your visuals pop! Use ggplot with interactive libraries and animations to make your findings stand out. Check out the ggplot2 - extra material on this website or the R graph gallery, and incorporate some of their visual magic into your work.
Likes: Creative, interactive, and dynamic visualizations and animations.
Dislikes: Basic, unstyled plots.
Focus: References and citations.
Tip: Make sure your work cites sources properly. Reference datasets and tools using Quarto’s citation features or inline text. Check out the Citing publications in your manuscript page on this website.
Likes: Impeccable citations and credit given where due.
Dislikes: Missing or vague references.
Focus: Document styling and themes.
Tip: Use Quarto’s YAML options to style your HTML document with themes, tabs, and polished layouts. Check out HTML basics and more on the Quarto website.
Likes: Elegantly styled reports with clear structure.
Dislikes: Plain, unformatted documents.
Focus: Dataset creativity.
Tip: Go beyond the soldiers dataset! Explore Kaggle, Tidytuesday, or your own data.
Likes: Fresh, exciting datasets.
Dislikes: The default soldiers data.
2 Some tips when you clean your data
2.0.1 Messy column names? janitor will help you
Some of you will encounter column names with whitespace, special characters, or inconsistent capitalization. You can fix this manually (e.g. with rename()), but you might have hundreds of messy column names. The janitor package in R provides a handy function called clean_names() to address this issue.
- my_data |> clean_names()
  - Converts all column names to lowercase.
  - Replaces spaces and special characters with underscores.
  - Ensures column names are unique and consistent.
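A minimal sketch of clean_names() in action. The data frame and its column names are made up for illustration.

```r
library(janitor) # clean_names()

# Hypothetical data frame with messy column names.
# check.names = FALSE keeps the names exactly as typed.
my_data <- data.frame(
  `Participant ID` = 1:3,
  `Weight (kg)` = c(70.5, 82.1, 65.0),
  `AGE` = c(34, 45, 29),
  check.names = FALSE
)

my_data_clean <- my_data |> clean_names()

names(my_data_clean)
# "participant_id" "weight_kg" "age"
```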
2.0.2 Your data is wide? Make it long
What is Wide Format?
Wide format data is characterized by having multiple columns for different variables across a single row. Each row typically represents a unique subject, and each column represents different measurements or observations of that subject. E.g. you will have column names like “weight_t1”, “weight_t2”, “weight_t3”, …. Data is often wide because it's easier for humans to understand the raw data in this format. However, the wide format is suboptimal for data analysis. Go to the section “Tidy 2” under Workflow for manuscripts and read about the benefits of working with long data.
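One way to reshape such data is pivot_longer() from the tidyr package (part of the tidyverse). The sketch below uses a made-up data frame with the weight_t1, weight_t2, weight_t3 columns mentioned above; the id column and the values are assumptions for illustration.

```r
library(tidyr) # pivot_longer()

# Hypothetical wide data: one row per subject, one column per time point.
wide <- data.frame(
  id = 1:2,
  weight_t1 = c(70, 80),
  weight_t2 = c(71, 79),
  weight_t3 = c(72, 78)
)

# Gather the weight_* columns into two columns: time and weight.
long <- wide |>
  pivot_longer(
    cols = starts_with("weight_"),
    names_to = "time",
    names_prefix = "weight_", # strip the prefix so time is "t1", "t2", "t3"
    values_to = "weight"
  )

long
```

The result has one row per subject and time point (six rows here), which is the shape most tidyverse functions, including ggplot2, expect.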
2.0.3 You are importing labelled data from Stata or SPSS?
Use the function as_factor() from the haven package to change the labels into R factors.
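A minimal sketch of what this looks like. The labelled vector below is built by hand to mimic a variable imported from Stata or SPSS; the variable name and labels are hypothetical.

```r
library(haven) # labelled(), as_factor()

# Hypothetical labelled variable, as it might arrive from Stata or SPSS:
# numeric codes with value labels attached.
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))

# Convert the codes to an R factor that uses the value labels as levels.
sex_factor <- as_factor(sex)

levels(sex_factor)
# "Male" "Female"
```

as_factor() can also be applied to a whole data frame, in which case it converts every labelled column.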