Author

Steen Flammild Harsted & Søren O´Neill

Published

April 25, 2025



1 Presentation

You can download the course slides for this section here

Getting Started

  • Make sure that you are working in your course project
  • Create a new quarto document and name it “regression.qmd”
  • Insert a code chunk and load 2 important libraries
  • Insert a new code chunk- Write source(here("scripts", "01_import.R")) in the chunk
  • Write a short headline to each code chunk
  • Change the YAML header to style your document output.
---
title: "TITLE"
subtitle: "SUBTITLE"
author: "ME"
date: today
format: 
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
    number-sections: true
    number-depth: 2
    code-fold: true
    code-summary: "Show the code"
    code-tools: true
execute:
  message: false
  warning: false
---




Add the easystats and gtsummary packages to the code chunk where you have your library calls.

If you need to install any of the packages:

  • use install.packages(c("package_name", "package_name")) to download the packages.
  • This is done in the console and NOT in your script.
Show the code
library(tidyverse)        # import, tidy, transform, visualize
library(here)             # filepath control
library(gtsummary)        # regresssion tables (with stats)
library(easystats)        # everything regression





2 The palmerpenguins dataset



Add the palmerpenguins packages to the code chunk where you have your library calls.

If you need to install the package:

  • use install.packages("palmerpenguins") to download the package.
  • This is done in the console and NOT in your script.
Show the code
library(palmerpenguins)  # Has the 'penguins' data

If you want to see the penguins data in your environment pane you can add it as an object. You don’t need to do this. ::: {.cell}

Show the code
penguins <- penguins

:::



Read the documentation for palmerpenguins here



Using the penguins dataset. Let body_mass_g be your dependent variable (y), and investigate how changes in the other variables in penguins impact body_mass_g

  • Build at least two models

  • Consider making the formula for one of the models body_mass_g ~ . - What does this do?

  • Store them as (pen_model1, pen_model2, etc.)

  • Remember that we are just practicing coding here. Today is about How do we code it. The purpose of this particular exercise is to introduce you to a few very useful functions from the easystats package(s), and to get you started with using lm().

Show the code
pen_model1 <- lm(body_mass_g ~ sex + species, data = penguins)
pen_model2 <- lm(body_mass_g ~ bill_depth_mm + bill_length_mm + flipper_length_mm, data = penguins)
pen_model3 <- lm(body_mass_g ~ ., data = penguins)



2.1 Base R

  • Explore the elements in one of your models using $
    • Type pen_model1$ and explore the drop down menu
    • coefficients(pen_model1)
    • Print a summary() of one of your models



2.2 easystats

  • Check the assumptions of your models (check_model)
    • Discuss if/why some assumptions are violated
    • What could be done to remedy this? ::: {.cell}
Show the code
check_model(pen_model1)
check_model(pen_model2)
check_model(pen_model3)

:::

  

  • Investigate the model parameters of your models model_parameters()
  • add summary info to one or more of the outputs
  • Use robust standard errors for one or more of the model outputs ::: {.cell}
Show the code
model_parameters(pen_model1)
model_parameters(pen_model2, vcov = "HC1")
model_parameters(pen_model3, summary = TRUE)

:::

  

  • Compare your models parameters compare_parameters()
Show the code
compare_parameters(pen_model1, pen_model2, pen_model3) 
  • Compare your models parameters compare_parameters() and plot them
Show the code
compare_parameters(pen_model1, pen_model2, pen_model3) |> 
  plot()

  

  • Compare your models compare_models() ::: {.cell}
Show the code
compare_models(pen_model1, pen_model2, pen_model3)

:::

  

  • Compare the performance of your models compare_performance() ::: {.cell}
Show the code
compare_performance(pen_model1, pen_model2, pen_model3)

:::

  

  • Plot a comparison of your models compare_performance() %>% plot() ::: {.cell}
Show the code
compare_performance(pen_model1, pen_model2, pen_model3) %>% plot()

:::

  

  • Make a report of the model you think is the best ::: {.cell}
Show the code
report(pen_model3)

:::



2.3 gtsummary



  • Use tbl_regression() to build a table for one of your models.
  • Style your table ::: {.cell}
Show the code
pen_t1 <- pen_model1 |> 
  tbl_regression() |> 
  bold_labels() |> 
  italicize_levels() |> 
  bold_p()

pen_t1

:::



  • Use tbl_regression() to build a table for two more of your models.
  • Style them the same way as you did before ::: {.cell}
Show the code
pen_t2 <- pen_model2 |> 
  tbl_regression() |> 
  bold_labels() |> 
  italicize_levels() |> 
  bold_p()

pen_t3 <- pen_model3 |> 
  tbl_regression() |> 
  bold_labels() |> 
  italicize_levels() |> 
  bold_p()

:::



  • Use tbl_merge() to merge your tables
  • Add a caption to the merged table ::: {.cell}
Show the code
pen_t1_t2_t3 <- tbl_merge(
  tbls = list(
    pen_t1, pen_t2, pen_t3
  )
) |> 
  modify_caption(
    caption = "**Three regression models modeling the body mass of penguins**"
  ) 

pen_t1_t2_t3

:::



  • save your merged table ::: {.cell}
Show the code
pen_t1_t2_t3 |> 
  as_gt() |> 
  gtsave(here("tables", "my_regression_table.docx"))

:::

3 The PhDPublications dataset

  • Install the package AER and load the package (+add library call to your library code chunk)
  • AER contains >100 different datasets (to see a list: data(package = 'AER')).
  • Load the dataset PhDPublications, by writing data(PhDPublications)
  • Read the documentation for PhDPublications



Using PhDPublications, investigate what influenced the number of articles PhD students published during the last 3 years of their PhD.

  • We want to predict articles what type of data is this?
    • ANSWER R will tell you this is an integer, but if you read the documentaion you will realize that the numbers represent Count or Rate data (Number of articles within a three year period). We typically need to model this type of data using another approach than linear regression. Lets investigate the distribution of articles.
  • Investigate the distribution of the articles variable (see plot below).
    • Does it look normal?
    • Can you recognize the type of distribution?
    • ANSWER This looks like a possion distribution, and you should not model this using normal linear regression with lm(). Instead you should use glm(, family = poisson())
Show the code
PhDPublications %>% 
  ggplot(aes(x = articles))+
  geom_density(bw = 0.5, fill = "lightgrey", size = 1)+
  theme_modern()

:::

  • Build at least two models
  • Consider making the formula for one of the models articles ~ . - What does this do?
  • Store them as (phd_model1, phd_model2, etc.)
  • Remember that we are just practicing coding here. Today is about How do we code it. The purpose of this particular exercise is to show you that you can use the same easystats functions to work with different model types. The output will automagically adjust. This is a big advantage to other packages and programs.
Show the code
phd_model1 <- PhDPublications %>% glm(articles ~ gender, data = ., family = poisson())
phd_model2 <- PhDPublications %>% glm(articles ~ gender * kids + mentor,  data = ., family = poisson())
phd_model3 <- PhDPublications %>% glm(articles ~ ., data = ., family = poisson())



Base R

  • Explore the elements in one of your models using $
    • Type phd_model1$ and explore the drop down menu
    • coefficients(phd_model1)
    • Print a summary() of one of your models



easystats

  

  • Check the assumptions of your models (check_model)
    • Discuss if/why some assumptions are violated
    • What could be done to remedy this? ::: {.cell}
Show the code
check_model(phd_model1)
check_model(phd_model2)
check_model(phd_model3)

:::

  

  • Investigate the model parameters of your models model_parameters() ::: {.cell}
Show the code
model_parameters(phd_model1)
model_parameters(phd_model2)
model_parameters(phd_model3)

:::

  

  • Compare your models compare_models() ::: {.cell}
Show the code
compare_models(phd_model1, phd_model2, phd_model3, 
               exponentiate = TRUE  # To have coefficients as ratios
               )

:::

  

  • Compare the performance of your models compare_performance() ::: {.cell}
Show the code
compare_performance(phd_model1, phd_model2, phd_model3)

:::

  

  • Plot a comparison of your models compare_performance() %>% plot() ::: {.cell}
Show the code
compare_performance(phd_model1, phd_model2, phd_model3) %>% plot()

:::

  

  • Make a report of the model you think is the best ::: {.cell}
Show the code
report(phd_model3, exponentiate = TRUE)

:::



Want More easystats?

For more information on easystats read here.



Improve your home assignment

  • Return to your home assignment and add a regressions analysis if it makes sense
  • Remember to change project (top right corner in Rstudio)
  • Using the menu in the top right corner, you can switch between your course project and your home assignment