Author

Steen Flammild Harsted & Søren O´Neill

Published

April 25, 2025



1 Presentation

You can download the course slides for this section here

Getting Started

  • Make sure that you are working in your course project
  • Create a new quarto document and name it “tables.qmd”
  • Insert a code chunk and load 2 important libraries
  • Insert a new code chunk and write source(here("scripts", "01_import.R")) in it
  • Write a short headline to each code chunk
  • Change the YAML header to style your document output.
---
title: "TITLE"
subtitle: "SUBTITLE"
author: "ME"
date: today
format: 
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
    number-sections: true
    number-depth: 2
    code-fold: true
    code-summary: "Show the code"
    code-tools: true
execute:
  message: false
  warning: false
---



2 Tables


Add the gt and gtsummary packages to the code chunk where you have your library calls.

If you need to install gt and/or gtsummary:

  • use install.packages(c("gt", "gtsummary")) to download the packages.
  • This is done in the console and NOT in your script.
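Before installing, you may want to check which of the two packages are already present. The following is a minimal sketch to run in the console; the object names needed, installed, and missing_pkgs are made up for this illustration.

```r
# Run in the console, not in your script
needed <- c("gt", "gtsummary")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("gt and gtsummary are already installed")
}
```

Anything listed as still to install can then be passed to install.packages().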


2.1 gtsummary


Create a table 1 for the soldiers dataset

  • select sex, heightcm, weightkg, and race of the soldiers
  • use tbl_summary()
Show the code
soldiers |> 
  select(sex, heightcm, weightkg, race) |> 
  tbl_summary()


Use tbl_summary() on soldiers to show sex, heightcm, and weightkg, split by WritingPreference

  • Don't display missing values
  • add_p() (read here and here if you want to change the default tests).

Try the following functions:

  • add_overall()
  • add_stat_label()
  • bold_labels()
  • italicize_levels()
  • What statistical tests are being applied?
Show the code
soldiers |> 
  select(sex, heightcm, weightkg, WritingPreference) |> 
  tbl_summary(
    by = WritingPreference,
    missing = "no"
  ) |> 
  add_p() |> 
  bold_labels() |> 
  italicize_levels() |> 
  add_overall() 
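To get a feel for the kind of test behind a p-value for a continuous variable split into three or more groups, a Kruskal-Wallis test can be run directly in base R. This is only a sketch: it assumes the built-in iris data as a stand-in for soldiers, with Species playing the role of the grouping variable.

```r
# Stand-in data: iris ships with R and Species has three groups
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)

kw$method   # name of the test that was run
kw$p.value  # the p-value for the group comparison
```

Comparing kw$method with the footnote under your own table is a quick way to confirm which test add_p() picked.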


Improve the table further

You probably need to investigate the help file for tbl_summary() to solve these.

  • Change the statistics to mean and sd
  • Change the statistical test of the continuous variables from a “Kruskal-Wallis rank sum test” to a One-way ANOVA
  • Find better names for sex, heightcm, and weightkg
  • Save the table as a .docx file in your tables folder
Show the code
my_table <- soldiers |> 
  select(sex, heightcm, weightkg, WritingPreference) |> 
  tbl_summary(
    by = WritingPreference,
    missing = "no",
    
    # Change labels
    label = list(
      sex ~ "Sex",
      weightkg ~ "Weight (kg)",
      heightcm ~ "Height (cm)"),
    
    # Change statistics
    statistic = list(all_continuous() ~ "{mean} ({sd})")
    
    
  ) |> 
  
  # One-way ANOVA for continuous variables, chi-squared test for categorical variables
  add_p(
    test = list(all_continuous() ~ "oneway.test",
                all_categorical() ~ "chisq.test.no.correct")
    ) |> 
  bold_labels() |> 
  italicize_levels() |> 
  add_overall() 

my_table


my_table |> 
  as_gt() |> 
  gtsave(filename = here("tables", "my_table.docx"))
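The test name "oneway.test" appears to map onto oneway.test() from base R's stats package. The sketch below, again using iris as an assumed stand-in for soldiers, shows that this function does not assume equal variances unless var.equal = TRUE is set; the object names welch and classic are made up for this illustration.

```r
# Default: Welch-type one-way test (variances not assumed equal)
welch <- oneway.test(Sepal.Length ~ Species, data = iris)

# Classic one-way ANOVA F-test (equal variances assumed)
classic <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)

welch$method
classic$method

# The two versions differ in their denominator degrees of freedom
welch$parameter
classic$parameter
```

If you want the classic equal-variance ANOVA in your table, check the gtsummary help pages for how to pass extra arguments to the test.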


2.2 Cross tables


Use tbl_cross() to make a cross table of Component and sex.

We use tbl_cross() to create a contingency table.

Show the code
soldiers |> 
  tbl_cross(Component, sex) 

                           sex
                           Female    Male     Total
Component
    Army National Guard    877       1,894    2,771
    Army Reserve           122       102      224
    Regular Army           1,057     2,156    3,213
Total                      2,056     4,152    6,208


Add a statistical test

Show the code
soldiers |> 
  tbl_cross(Component, sex) |> 
  add_p()

                           sex
                           Female    Male     Total    p-value 1
Component                                              <0.001
    Army National Guard    877       1,894    2,771
    Army Reserve           122       102      224
    Regular Army           1,057     2,156    3,213
Total                      2,056     4,152    6,208
1 Pearson’s Chi-squared test
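The p-value in the table can be checked by hand from the printed counts. This is a minimal sketch in base R; the matrix is typed in from the table above, and the object names counts and res are made up for this illustration.

```r
# Counts copied from the Component x sex cross table
counts <- matrix(
  c( 877, 1894,
     122,  102,
    1057, 2156),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Component = c("Army National Guard", "Army Reserve", "Regular Army"),
    sex = c("Female", "Male")
  )
)

res <- chisq.test(counts)

res$method    # name of the test
res$expected  # expected counts under independence
res$p.value   # compare with the p-value column in the table
```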


Use tbl_cross() to make a cross table of race and sex.

We use tbl_cross() to create a contingency table.

Show the code
soldiers |> 
  tbl_cross(race, sex) 

                        sex
                        Female    Male     Total
race
    Asian               73        119      192
    Black               684       654      1,338
    Hispanic            247       447      694
    Native American     21        31       52
    Other               0         3        3
    Pacific Islander    25        35       60
    White               1,006     2,863    3,869
Total                   2,056     4,152    6,208


This code is going to fail. Run it and read the error message.

soldiers |> 
  tbl_cross(race, sex) |> 
  add_p()

This error occurs because the add_p() function is trying to perform a chi-square test. This statistical test assumes that all cells have an expected count >5. In this contingency table at least one cell has an expected count below 5. Which cell(s) do you think it is?
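If you want to check your guess, the expected counts can be computed from the row and column totals of the printed table. This is a sketch in base R; the matrix is typed in from the race and sex table above, and the names counts and expected are made up for this illustration.

```r
# Counts copied from the race x sex cross table
counts <- matrix(
  c(  73,  119,
     684,  654,
     247,  447,
      21,   31,
       0,    3,
      25,   35,
    1006, 2863),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    race = c("Asian", "Black", "Hispanic", "Native American",
             "Other", "Pacific Islander", "White"),
    sex = c("Female", "Male")
  )
)

# Expected count = row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 1)

# Row and column positions of cells with an expected count below 5
which(expected < 5, arr.ind = TRUE)
```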

We change the test to Fisher's exact test and simulate the p-value

Show the code
soldiers |> 
  tbl_cross(race, sex) |> 
  add_p(
    test = "fisher.test", 
    test.args = list(simulate.p.value=TRUE))

                        sex
                        Female    Male     Total    p-value 1
race                                                <0.001
    Asian               73        119      192
    Black               684       654      1,338
    Hispanic            247       447      694
    Native American     21        31       52
    Other               0         3        3
    Pacific Islander    25        35       60
    White               1,006     2,863    3,869
Total                   2,056     4,152    6,208
1 Fisher’s Exact Test for Count Data with simulated p-value (based on 2000 replicates)
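The same simulated test can be run in base R on the printed counts, which shows where the "2000 replicates" in the footnote comes from. This is a sketch only; the matrix is typed in from the table above, the seed value is arbitrary, and the names counts and sim are made up for this illustration.

```r
# Counts copied from the race x sex cross table
counts <- matrix(
  c(73, 119, 684, 654, 247, 447, 21, 31, 0, 3, 25, 35, 1006, 2863),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    race = c("Asian", "Black", "Hispanic", "Native American",
             "Other", "Pacific Islander", "White"),
    sex = c("Female", "Male")
  )
)

set.seed(1)  # simulated p-values vary slightly between runs without a seed

# simulate.p.value = TRUE uses Monte Carlo simulation; B = 2000 is the default
sim <- fisher.test(counts, simulate.p.value = TRUE, B = 2000)

sim$method
sim$p.value
```

With B replicates, the smallest p-value the simulation can return is 1 / (B + 1), so increasing B is the way to get a more precise p-value.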


Improve your home assignment

  • Add a table 1
  • Add a table 2 and include a statistical test
  • Remember to change project (top right corner in RStudio)
  • Using the menu in the top right corner, you can switch between your course project and your home assignment



2.3 gt



Explain what the 4 main groups of functions in gt are and what they do

  • tab_*()
  • fmt_*()
  • cols_*()
  • cells_*()
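As a starting point for your explanation, here is a small sketch that uses one function from each group. It assumes the built-in mtcars data rather than soldiers, and the name demo_tbl, the labels, and the footnote text are made up for this illustration.

```r
library(gt)

demo_tbl <- head(mtcars[, c("mpg", "hp")]) |>
  gt(rownames_to_stub = TRUE) |>

  # tab_*(): add parts to the table (header, footnotes, spanners, ...)
  tab_header(title = "Six cars from mtcars") |>

  # fmt_*(): format the values in the cells
  fmt_number(columns = mpg, decimals = 1) |>

  # cols_*(): modify whole columns (labels, order, merging, ...)
  cols_label(mpg = "Miles per gallon", hp = "Horsepower") |>

  # cells_*(): point to a location in the table, used inside other functions
  tab_footnote(
    footnote = "More than 100 horsepower",
    locations = cells_body(columns = hp, rows = hp > 100)
  )

demo_tbl
```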


Find a dataset and prepare it for a table

Below is a suggestion for soldiers, but you are free to try with your own data if you prefer.

Using soldiers and gt(), create a table in the following steps:

  • Keep the columns Installation, sex, and all the columns that end with circumference
  • Remove Fort Rucker - it only has one soldier
  • Group by Installation and sex
  • Summarise the data and calculate the mean and sd of all the columns that end with circumference
    • you can do this manually (with many lines of code)
    • or you can do this by using the across() function inside summarise(). If you are going to be working with a dataset that has many columns, I suggest you invest some time into learning about across()
  • pipe the summarised table to gt() and set the rowname_col argument to sex
  • add a suitable title and subtitle
  • Assign the table to an object called my_tbl
Show the code
my_tbl <- soldiers |> 
  
  # Select the columns to keep
  select(Installation, sex, 
         ends_with("circumference")) |> 
  
  # Remove Fort Rucker
  filter(Installation != "Fort Rucker") |> 

  # Alternative: remove any Installation with only one soldier
  #group_by(Installation) |> 
  #add_count() |> 
  #filter(n > 1) |> 
  
  # Summary stats by Installation and sex
  group_by(Installation, sex) |> 
  summarise(
    across(.cols = ends_with("circumference"),
           .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                       sd = ~ sd(.x, na.rm = TRUE)))) |> 
  
  # group only by installation
  # because we want to use sex in the rowname_col argument in gt() - see below
  group_by(Installation) |> 
  
  # Send to gt and perform a few styling functions
  gt(rowname_col = "sex") |> 
  
  tab_header(
    title = md("**Overview of soldiers by sex and installation**"),
    subtitle = md("*The data is a mock up version of the soldiers dataset*")) |> 
  
  tab_footnote(
    footnote = "To preserve anonymity, observations from Fort Rucker have been removed because of a low number of observations"
  )

my_tbl
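If across() is new to you, it can help to try it on a small dataset first. This is a minimal sketch, assuming the built-in iris data as a stand-in for soldiers; the name width_summary is made up for this illustration.

```r
library(dplyr)

width_summary <- iris |>
  group_by(Species) |>
  summarise(
    # Apply both functions to every column whose name ends with "Width"
    across(.cols = ends_with("Width"),
           .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                       sd = ~ sd(.x, na.rm = TRUE))),
    .groups = "drop"
  )

# New columns are named <column>_<function>, e.g. Sepal.Width_mean
names(width_summary)
```

The underscore in the generated names is what tab_spanner_delim(delim = "_") can later split on.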

Style my_tbl as you like

You can try some of these functions

  • tab_header()

  • tab_source_note()

  • tab_stubhead()

  • tab_spanner()

  • tab_spanner_delim()

  • fmt_number()

  • fmt_percent()

  • sub_missing() (replaces the deprecated fmt_missing())

  • cols_merge_n_pct()

  • cols_label()

  • md()

  • cells_body() and tab_footnote()

Show the code
# How much of the table you style in R, and how much you leave for manual editing afterwards (e.g. in Word), is always a choice.
# You can style everything in R, but it can be code intensive.
# In general, you want the basic structure of your table to be in place.
# You NEVER want to manually edit values or merge columns.
# Editing column and spanner names is less labour intensive and carries less risk of introducing errors.
# I think the table below is an OK place to stop the styling in R.

my_tbl_styled <- my_tbl |> 
  
  tab_spanner_delim(
    delim = "_",
    columns = everything()
  ) |>
  
  fmt_number(
    columns = contains("circumference"),
    decimals = 1
  ) |> 
  
  cols_merge(
    columns = contains("thigh"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("waist"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("ankle"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("biceps"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("calf"),
    pattern = "{1} ({2})"
    ) 

my_tbl_styled


Save your table using gtsave()

  • Create a folder called “tables”
Show the code
gtsave(
  data = my_tbl_styled,
  filename = here("tables", "ANSUR_fort_sex.docx")
)

# Word can also open .rtf files; this format sometimes works better
gtsave(
  data = my_tbl_styled,
  filename = here("tables", "ANSUR_fort_sex.rtf")
)
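The “tables” folder can be created by hand in the Files pane, or from R. This is a small sketch of the latter, assuming the here package is already loaded in your project; the name tables_dir is made up for this illustration.

```r
library(here)

# Path to the tables folder inside the project
tables_dir <- here("tables")

# Create the folder only if it does not exist yet
if (!dir.exists(tables_dir)) {
  dir.create(tables_dir)
}
```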


WANT MORE?


Give the row group labels, heading, and column labels a different background color

  • Use tab_options()
Show the code
my_tbl_styled |> 
  tab_options(
    row_group.background.color = "#C2B7B7",
    heading.background.color = "#C2B7B7",
    column_labels.background.color = "#C2B7B7"
  )


2.4 gtExtras

Install the package gtExtras and add library(gtExtras) to the code chunk where you load your libraries.


Add a cool theme to your table using one of the gt_theme_*() functions

Show the code
my_tbl_styled |> 
  gt_theme_espn()


Color thighcircumference_mean with Hulk colors using gt_hulk_col_numeric(thighcircumference_mean)

Show the code
my_tbl_styled |> 
  gt_theme_espn() |> 
  gt_hulk_col_numeric(thighcircumference_mean)