Tables

Author

Steen Flammild Harsted & Søren O´Neill

Published

June 17, 2025

1 Presentation

You can download the course slides for this section here

Getting Started

Make sure that you are working in your course project
Create a new quarto document and name it “tables.qmd”
Insert a code chunk and load 2 important libraries
Insert a new code chunk and write source(here("scripts", "01_import.R")) in the chunk
Write a short headline for each code chunk
Change the YAML header to style your document output.

The YAML header can look like this

---
title: "TITLE"
subtitle: "SUBTITLE"
author: "ME"
date: today
format: 
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
    number-sections: true
    number-depth: 2
    code-fold: true
    code-summary: "Show the code"
    code-tools: true
execute:
  message: false
  warning: false
---

2 Tables

Add the `gt` and `gtsummary` packages to the code chunk where you have your library calls.

If you need to install gt and/or gtsummary:

use install.packages(c("gt", "gtsummary")) to download the packages.
This is done in the console and NOT in your script.

2.1 `gtsummary`

Create a table 1 for the `soldiers` dataset

select sex, heightcm, weightkg, and race of the soldiers
use tbl_summary()

Show the code

soldiers |> 
  select(sex, heightcm, weightkg, race) |> 
  tbl_summary()

In `soldiers` use `tbl_summary()` to show the `sex`, `heightcm`, `weightkg`, split by `WritingPreference` of the soldiers

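Don't display missing values
Add p-values with add_p() (read https://www.danieldsjoberg.com/gtsummary/reference/add_p.tbl_summary.html and https://www.danieldsjoberg.com/gtsummary/reference/tests.html if you want to change the default tests).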

Try the following functions:

add_overall()
add_stat_label()
bold_labels()
italicize_levels()
What statistical tests are being applied?

Show the code

soldiers |> 
  select(sex, heightcm, weightkg, WritingPreference) |> 
  tbl_summary(
    by = WritingPreference,
    missing = "no"
  ) |> 
  add_p() |> 
  bold_labels() |> 
  italicize_levels() |> 
  add_overall()

Improve the table further

You probably need to investigate the help file for tbl_summary() to solve these.

Change the statistics to mean and sd
Change the statistical test of the continuous variables from a “Kruskal-Wallis rank sum test” to a one-way ANOVA
Find better names for sex, heightcm, and weightkg
Save the table as a .docx file in your tables folder
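
The test names used by add_p() correspond to base R functions, so the two tests can be compared outside gtsummary. The sketch below is only an illustration: it uses the built-in PlantGrowth data rather than soldiers, and the object names (kw, welch, classic) are made up for this example. Note that base R's oneway.test() does not assume equal variances unless you set var.equal = TRUE.

```r
# Illustration on the built-in PlantGrowth data (not the soldiers data):
# weight is continuous and group has three levels

# Rank-based test
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# One-way ANOVA; the default var.equal = FALSE gives Welch's version
welch <- oneway.test(weight ~ group, data = PlantGrowth)

# Classic one-way ANOVA assuming equal variances
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

# Compare the names of the tests and their p-values
kw$method
welch$method
c(kruskal = kw$p.value, welch = welch$p.value, classic = classic$p.value)
```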

Show the code

my_table <- soldiers |> 
  select(sex, heightcm, weightkg, WritingPreference) |> 
  tbl_summary(
    by = WritingPreference,
    missing = "no",
    
    # Change labels
    label = list(
      sex ~ "Sex",
      weightkg ~ "Weight (kg)",
      heightcm ~ "Height (cm)"),
    
    # Change statistics
    statistic = list(all_continuous() ~ "{mean} ({sd})")
    
    
  ) |> 
  
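  # One-way ANOVA for continuous variables, chi-squared test without continuity correction for categorical variables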
  add_p(
    test = list(all_continuous() ~ "oneway.test",
                all_categorical() ~ "chisq.test.no.correct")
    ) |> 
  bold_labels() |> 
  italicize_levels() |> 
  add_overall() 

my_table


my_table |> 
  as_gt() |> 
  gtsave(filename = here("tables", "my_table.docx"))

2.2 Cross tables

Use tbl_cross() to make a cross table of Component and sex. Click tabs to see code and results

We use tbl_cross() to create a contingency table.

Show the code

soldiers |> 
  tbl_cross(Component, sex)

| | sex: Female | sex: Male | Total |
|---|---|---|---|
| Component | | | |
| Army National Guard | 877 | 1,894 | 2,771 |
| Army Reserve | 122 | 102 | 224 |
| Regular Army | 1,057 | 2,156 | 3,213 |
| Total | 2,056 | 4,152 | 6,208 |

Add a statistical test

Show the code

soldiers |> 
  tbl_cross(Component, sex) |> 
  add_p()

| | sex: Female | sex: Male | Total | p-value¹ |
|---|---|---|---|---|
| Component | | | | <0.001 |
| Army National Guard | 877 | 1,894 | 2,771 | |
| Army Reserve | 122 | 102 | 224 | |
| Regular Army | 1,057 | 2,156 | 3,213 | |
| Total | 2,056 | 4,152 | 6,208 | |

¹ Pearson’s Chi-squared test

Use tbl_cross() to make a cross table of race and sex. Click tabs to see code and results

We use tbl_cross() to create a contingency table.

Show the code

soldiers |> 
  tbl_cross(race, sex)

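| | sex: Female | sex: Male | Total |
|---|---|---|---|
| race | | | |
| Asian | 73 | 119 | 192 |
| Black | 684 | 654 | 1,338 |
| Hispanic | 247 | 447 | 694 |
| Native American | 21 | 31 | 52 |
| Other | 0 | 3 | 3 |
| Pacific Islander | 25 | 35 | 60 |
| White | 1,006 | 2,863 | 3,869 |
| Total | 2,056 | 4,152 | 6,208 |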

This code is going to fail. Run it and read the error message.

soldiers |> 
  tbl_cross(race, sex) |> 
  add_p()

This error occurs because of how add_p() chooses its test. A chi-square test assumes that all cells have an expected count of at least 5. In this contingency table at least one cell has an expected count below 5, so add_p() switches to Fisher's exact test, and the exact calculation can fail on a table this large. Which cell(s) do you think it is?
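
One way to check your answer is to compute the expected counts directly. The sketch below uses only base R and re-enters the counts from the race by sex table shown earlier, so it does not need the soldiers data. The object names (counts, expected, low_cells) are chosen for this illustration.

```r
# Counts copied from the race by sex cross table
counts <- matrix(
  c(  73,  119,
     684,  654,
     247,  447,
      21,   31,
       0,    3,
      25,   35,
    1006, 2863),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    race = c("Asian", "Black", "Hispanic", "Native American",
             "Other", "Pacific Islander", "White"),
    sex  = c("Female", "Male")
  )
)

# Expected count = row total * column total / grand total
# suppressWarnings() hides the warning about the chi-squared approximation
expected <- suppressWarnings(chisq.test(counts))$expected
round(expected, 1)

# Row and column positions of the cells with an expected count below 5
low_cells <- which(expected < 5, arr.ind = TRUE)
low_cells
```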

We change the test to Fisher's exact test and simulate the p-value

Show the code

soldiers |> 
  tbl_cross(race, sex) |> 
  add_p(
    test = "fisher.test", 
    test.args = list(simulate.p.value=TRUE))

| | sex: Female | sex: Male | Total | p-value¹ |
|---|---|---|---|---|
| race | | | | <0.001 |
| Asian | 73 | 119 | 192 | |
| Black | 684 | 654 | 1,338 | |
| Hispanic | 247 | 447 | 694 | |
| Native American | 21 | 31 | 52 | |
| Other | 0 | 3 | 3 | |
| Pacific Islander | 25 | 35 | 60 | |
| White | 1,006 | 2,863 | 3,869 | |
| Total | 2,056 | 4,152 | 6,208 | |

¹ Fisher’s Exact Test for Count Data with simulated p-value (based on 2000 replicates)
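
The same test can be run directly in base R, which may help in understanding what the test.args argument passes on. This is a minimal sketch that re-enters the counts from the table above instead of using soldiers. The seed value and the object names are arbitrary choices for the illustration. The 2000 replicates in the footnote are the default value of the B argument in fisher.test().

```r
# Counts copied from the race by sex cross table
counts <- matrix(
  c(  73,  119,
     684,  654,
     247,  447,
      21,   31,
       0,    3,
      25,   35,
    1006, 2863),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    race = c("Asian", "Black", "Hispanic", "Native American",
             "Other", "Pacific Islander", "White"),
    sex  = c("Female", "Male")
  )
)

# The simulated p-value depends on random draws, so set a seed
set.seed(2025)

# Monte Carlo version of Fisher's exact test (B = 2000 replicates by default)
res <- fisher.test(counts, simulate.p.value = TRUE)

res$method
res$p.value
```

With B replicates the smallest p-value the simulation can return is 1 / (B + 1), so increase B if you need to resolve smaller p-values.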

Improve your home assignment

Add a table 1
Add a table 2 and include a statistical test
Remember to change project (top right corner in RStudio)
Using the menu in the top right corner, you can switch between your course project and your home assignment

2.3 `gt`

Explain what the 4 main groups of functions in `gt` are and what they do

tab_*()
fmt_*()
cols_*()
cells_*()
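
If it helps to see one function from each group in action, here is a small sketch on the built-in mtcars data (not the soldiers data). The table content, labels, and the name demo_tbl are made up for the illustration; the comments give a rough description of each group, not an exhaustive one.

```r
library(gt)

demo_tbl <- head(mtcars[, c("mpg", "hp", "wt")], 5) |>
  gt(rownames_to_stub = TRUE) |>

  # tab_*(): add parts to the table (header, footnotes, spanners, ...)
  tab_header(title = "Five cars from mtcars") |>

  # fmt_*(): format the values shown in the cells
  fmt_number(columns = c(mpg, wt), decimals = 1) |>

  # cols_*(): modify whole columns (labels, order, merging, ...)
  cols_label(mpg = "Miles per gallon", hp = "Horsepower", wt = "Weight") |>

  # cells_*(): point other functions to a location in the table
  tab_footnote(
    footnote = "Weight is given in 1000 lbs",
    locations = cells_column_labels(columns = wt)
  )

demo_tbl
```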

Find a dataset and prepare it for a table

Below is a suggestion for soldiers, but you are free to try with your own data if you prefer.

Using soldiers and gt(), create a table in the following steps:

Keep the columns Installation, sex, and all the columns that end with circumference
Remove Fort Rucker - it only has one soldier
Group by Installation and sex
Summarise the data and calculate the mean and sd of all the columns that end with circumference
- You can do this manually (with many lines of code)
- Or you can use the across() function inside summarise(). If you are going to work with a dataset that has many columns, I suggest you invest some time in learning about across()
Pipe the summarised table to gt() and set the rowname_col argument to sex
Add a suitable title and subtitle
Assign the table to an object called my_tbl
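
If across() is new to you, the pattern can be tried on a small built-in dataset first. The sketch below uses iris instead of soldiers, and the name width_summary is chosen for the illustration. Each selected column gets one output column per function, named column_function.

```r
library(dplyr)

width_summary <- iris |>
  group_by(Species) |>
  summarise(
    across(
      # Apply to every column whose name ends with "Width"
      .cols = ends_with("Width"),
      # A named list of functions; the names become column suffixes
      .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                  sd   = ~ sd(.x, na.rm = TRUE))
    )
  )

# One row per species, plus a mean and an sd column for each Width column
names(width_summary)
```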

Show the code

my_tbl <- soldiers |> 
  
  # Select the columns needed for the table
  select(Installation, sex, 
         ends_with("circumference")) |> 
  
  # Remove Fort Rucker
  filter(Installation != "Fort Rucker") |> 

  # Alternative: remove any Installation with only one soldier
  #group_by(Installation) |> 
  #add_count() |> 
  #filter(n > 1) |> 
  
  # Summary stats by Installation and sex
  group_by(Installation, sex) |> 
  summarise(
    across(.cols = ends_with("circumference"),
           .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                       sd = ~ sd(.x, na.rm = TRUE)))) |> 
  
  # group only by installation
  # because we want to use sex in the rowname_col argument in gt() - see below
  group_by(Installation) |> 
  
  # Send to gt and perform a few styling functions
  gt(rowname_col = "sex") |> 
  
  tab_header(
    title = md("**Overview of soldiers by sex and installation**"),
    subtitle = md("*The data is a mock up version of the soldiers dataset*")) |> 
  
  tab_footnote(
    footnote = "To preserve anonymity, observations from Fort Rucker have been removed because of a low number of observations"
  )

my_tbl

Style `my_tbl` as you like

You can try some of these functions

tab_header()
tab_source_note()
tab_stubhead()
tab_spanner()
tab_spanner_delim()
fmt_number()
fmt_percent()
sub_missing() (fmt_missing() is deprecated in recent versions of gt)
cols_merge_n_pct()
cols_label()
md()
cells_body() and tab_footnote()

Show the code

# It is always a choice how much you want to style your table in R, and what you leave for manual editing afterwards (e.g. in Word)
# You can style everything in R, but it can be code intensive
# In general - you want the basic structure of your table to be in place
# You NEVER want to manually edit values or merge columns.
# Editing column and spanner names is less labour intensive and doesn't carry the same risk of making errors
# I think the table below is an ok place to stop the styling in R.

my_tbl_styled <- my_tbl |> 
  
  tab_spanner_delim(
    delim = "_",
    columns = everything()
  ) |>
  
  fmt_number(
    columns = contains("circumference"),
    decimals = 1
  ) |> 
  
  cols_merge(
    columns = contains("thigh"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("waist"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("ankle"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("biceps"),
    pattern = "{1} ({2})"
    ) |> 
  
  cols_merge(
    columns = contains("calf"),
    pattern = "{1} ({2})"
    ) 

my_tbl_styled

Save your table using `gtsave()`

Create a folder called “tables”

Show the code

gtsave(
  data = my_tbl_styled,
  filename = here("tables", "ANSUR_fort_sex.docx")
)

# Word can also open .rtf files - it sometimes works better in this format
gtsave(
  data = my_tbl_styled,
  filename = here("tables", "ANSUR_fort_sex.rtf")
)

WANT MORE?

Give the row group labels, heading, and column labels a different background color

Use tab_options()

Show the code

my_tbl_styled |> 
  tab_options(
    row_group.background.color = "#C2B7B7",
    heading.background.color = "#C2B7B7",
    column_labels.background.color = "#C2B7B7"
  )

2.4 `gtExtras`

Install the package gtExtras and add library(gtExtras) to the code chunk where you load your libraries.

Add a cool theme to your table with a `gt_theme_*()` function

Show the code

my_tbl_styled |> 
  gt_theme_espn()

Color `thighcircumference_mean` with Hulk colors using `gt_hulk_col_numeric(thighcircumference_mean)`

Show the code

my_tbl_styled |> 
  gt_theme_espn() |> 
  gt_hulk_col_numeric(thighcircumference_mean)

