Steen Flammild Harsted & Søren O´Neill
April 25, 2025
Rendered output of `tbl_cross()` for `Component` by `sex`:

| Component | sex: Female | sex: Male | Total |
|---|---|---|---|
| Army National Guard | 877 | 1,894 | 2,771 |
| Army Reserve | 122 | 102 | 224 |
| Regular Army | 1,057 | 2,156 | 3,213 |
| Total | 2,056 | 4,152 | 6,208 |
Rendered output of `tbl_cross()` for `race` by `sex`:

| race | sex: Female | sex: Male | Total |
|---|---|---|---|
| Asian | 73 | 119 | 192 |
| Black | 684 | 654 | 1,338 |
| Hispanic | 247 | 447 | 694 |
| Native American | 21 | 31 | 52 |
| Other | 0 | 3 | 3 |
| Pacific Islander | 25 | 35 | 60 |
| White | 1,006 | 2,863 | 3,869 |
| Total | 2,056 | 4,152 | 6,208 |
Rendered output of `tbl_cross()` for `race` by `sex` with `add_p()`, using Fisher's test with a simulated p-value:

| race | sex: Female | sex: Male | Total |
|---|---|---|---|
| Asian | 73 | 119 | 192 |
| Black | 684 | 654 | 1,338 |
| Hispanic | 247 | 447 | 694 |
| Native American | 21 | 31 | 52 |
| Other | 0 | 3 | 3 |
| Pacific Islander | 25 | 35 | 60 |
| White | 1,006 | 2,863 | 3,869 |
| Total | 2,056 | 4,152 | 6,208 |

p-value: <0.001 (Fisher's Exact Test for Count Data with simulated p-value, based on 2000 replicates)
---
title: "Tables"
author: "Steen Flammild Harsted & Søren O´Neill"
date: today
format:
html:
toc: true
toc-depth: 2
number-sections: true
number-depth: 2
code-fold: true
code-summary: "Show the code"
code-tools: true
execute:
eval: true
message: false
warning: false
---
<br><br>
# Presentation
You can download the course slides for this section <a href="./presentation_tables.html" download>here</a>
<div>
```{=html}
<iframe class="slide-deck" src="presentation_tables.html" width=90% ></iframe>
```
</div>
```{r setup}
#| include: false
library(tidyverse)
library(here)
library(gt)
library(gtsummary)
library(gtExtras)
source(here("scripts", "01_import.R"))
```
```{r}
#| eval: true
#| column: margin
#| echo: false
knitr::include_graphics(here::here("img", "sherif.png"))
```
## Getting Started {.unnumbered}
* Make sure that you are working in your course project
* Create a new quarto document and name it "tables.qmd"
* Insert a code chunk and load 2 important libraries
* Insert a new code chunk
  - Write `source(here("scripts", "01_import.R"))` in the chunk
* Write a short headline to each code chunk
* Change the YAML header to style your document output.
::: {.callout-tip collapse="true"}
### The YAML header can look like this
````{verbatim}
---
title: "TITLE"
subtitle: "SUBTITLE"
author: "ME"
date: today
format:
html:
toc: true
toc-depth: 2
embed-resources: true
number-sections: true
number-depth: 2
code-fold: true
code-summary: "Show the code"
code-tools: true
execute:
message: false
warning: false
---
````
:::
<br><br>
# Tables
<br>
#### Add the `gt` and `gtsummary` packages to the code chunk where you have your library calls.
If you need to install `gt` and/or `gtsummary`:
* use `install.packages(c("gt", "gtsummary"))` to download the packages.
* This is done in the console and NOT in your script.
<br>
## `gtsummary`
<br>
#### Create a table 1 for the `soldiers` dataset
* select `sex`, `heightcm`, `weightkg`, and `race` of the soldiers
* use `tbl_summary()`
```{r}
#| output: false
soldiers |>
  # race is included to match the assignment text above
  select(sex, heightcm, weightkg, race) |>
  tbl_summary()
```
<br>
#### In `soldiers`, use `tbl_summary()` to show `sex`, `heightcm`, and `weightkg`, split by the `WritingPreference` of the soldiers
* Don't display missing values
* Add p-values with `add_p()` (read [here](https://www.danieldsjoberg.com/gtsummary/reference/add_p.tbl_summary.html) and [here](https://www.danieldsjoberg.com/gtsummary/reference/tests.html) if you want to change the default tests).
Try the following functions:
* `add_overall()`
* `add_stat_label()`
* `bold_labels()`
* `italicize_levels()`
* What statistical tests are being applied?
```{r}
#| output: false
soldiers |>
select(sex, heightcm, weightkg, WritingPreference) |>
tbl_summary(
by = WritingPreference,
missing = "no"
) |>
add_p() |>
bold_labels() |>
italicize_levels() |>
add_overall()
```
<br>
#### Improve the table further
You probably need to investigate the help file for `tbl_summary()` to solve these.
* Change the statistics to mean and sd
* Change the statistical test of the continuous variables from a "Kruskal-Wallis rank sum test" to a One-way ANOVA
* Find better names for `sex`, `heightcm`, and `weightkg`
* Save the table as a .docx file in your tables folder
```{r}
#| include: false
# This chunk checks whether my_table.docx already exists.
# If the file exists, we delete it.
# This is because of a bug where gtsave() cannot overwrite
# existing .docx files.
if (file.exists(here("tables", "my_table.docx"))) {
  file.remove(here("tables", "my_table.docx"))
}
```
```{r}
#| output: false
my_table <- soldiers |>
select(sex, heightcm, weightkg, WritingPreference) |>
tbl_summary(
by = WritingPreference,
missing = "no",
# Change labels
label = list(
sex ~ "Sex",
weightkg ~ "Weight (kg)",
heightcm ~ "Height (cm)"),
# Change statistics
statistic = list(all_continuous() ~ "{mean} ({sd})")
) |>
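# Change statistical tests: one-way ANOVA for continuous variables,
# chi-square test without continuity correction for categorical variables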
add_p(
test = list(all_continuous() ~ "oneway.test",
all_categorical() ~ "chisq.test.no.correct")
) |>
bold_labels() |>
italicize_levels() |>
add_overall()
my_table
my_table |>
as_gt() |>
gtsave(filename = here("tables", "my_table.docx"))
```
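
One detail worth knowing when you pick `"oneway.test"`: base R's `stats::oneway.test()` defaults to `var.equal = FALSE`, which is Welch's version of the one-way test and not the classic ANOVA. The sketch below illustrates this with the built-in `PlantGrowth` data, which is not part of the course material. How `add_p()` forwards arguments to the test is not shown here, so check its help file before relying on either variant.

```r
# Built-in example data: plant weight in three treatment groups
data(PlantGrowth)

# Default: Welch's one-way test (does not assume equal variances)
welch <- oneway.test(weight ~ group, data = PlantGrowth)

# Classic one-way ANOVA (assumes equal variances)
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

# The classic version gives the same F statistic as anova() on a linear model
anova_f <- anova(lm(weight ~ group, data = PlantGrowth))$`F value`[1]

welch$statistic
classic$statistic
anova_f
```

The two calls differ in the denominator degrees of freedom and therefore in the p-value, so it is worth stating in your table footnote which version you used.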
<br>
## Cross tables
<br>
::: {.panel-tabset}
## Assignment
Use `tbl_cross()` to make a cross table of `Component` and `sex`.
Click tabs to see code and results
## cross table
We use `tbl_cross()` to create a contingency table.
```{r}
soldiers |>
tbl_cross(Component, sex)
```
<br>
## `add_p()`
Add a statistical test
```{r}
soldiers |>
tbl_cross(Component, sex) |>
add_p()
```
:::
<br>
::: {.panel-tabset}
## Assignment
Use `tbl_cross()` to make a cross table of `race` and `sex`.
Click tabs to see code and results
## cross table
We use `tbl_cross()` to create a contingency table.
```{r}
soldiers |>
tbl_cross(race, sex)
```
<br>
## `add_p()`
This code is going to fail. Run it and read the error message.
```{r}
#| eval: false
#| code-fold: false
soldiers |>
tbl_cross(race, sex) |>
add_p()
```
## Why an error?
This error occurs because the `add_p()` function is trying to perform a chi-square test. This statistical test assumes that all cells have an expected count of at least 5. In this contingency table at least one cell has an expected count below 5. Which cell(s) do you think it is?
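
If you want to check your guess, the expected counts can be computed in base R. The sketch below is not part of the original solution. It re-enters the counts from the rendered `race` by `sex` cross table by hand as a matrix (the object name `race_sex` is made up for illustration) and lists the cells whose expected count is below 5.

```r
# Observed counts copied from the rendered cross table of race and sex
race_sex <- matrix(
  c(  73,  119,
     684,  654,
     247,  447,
      21,   31,
       0,    3,
      25,   35,
    1006, 2863),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    race = c("Asian", "Black", "Hispanic", "Native American",
             "Other", "Pacific Islander", "White"),
    sex  = c("Female", "Male")
  )
)

# chisq.test() warns about small expected counts; we only want the
# expected counts here, so the warning is suppressed
expected <- suppressWarnings(chisq.test(race_sex))$expected
round(expected, 1)

# Row and column index of every cell with an expected count below 5
low_cells <- which(expected < 5, arr.ind = TRUE)
low_cells
```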
## Solution
We change the test to Fisher's exact test and simulate the p-value
```{r}
soldiers |>
tbl_cross(race, sex) |>
add_p(
test = "fisher.test",
test.args = list(simulate.p.value=TRUE))
```
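
The argument name in `test.args` matches the `simulate.p.value` argument of base R's `stats::fisher.test()`. As a rough illustration outside gtsummary, the sketch below runs that function directly on the counts from the rendered cross table, re-entered by hand (the object name `race_sex` is made up for illustration). `B = 2000` is the base R default number of replicates and is written out only for clarity.

```r
# Observed counts copied from the rendered cross table of race and sex
race_sex <- matrix(
  c(  73,  119,
     684,  654,
     247,  447,
      21,   31,
       0,    3,
      25,   35,
    1006, 2863),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    race = c("Asian", "Black", "Hispanic", "Native American",
             "Other", "Pacific Islander", "White"),
    sex  = c("Female", "Male")
  )
)

# The p-value is simulated, so set a seed to make the result reproducible
set.seed(2025)
res <- fisher.test(race_sex, simulate.p.value = TRUE, B = 2000)

res$method
res$p.value
```

Because the p-value is simulated, it cannot be smaller than 1 / (B + 1). Increase `B` if you need finer resolution.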
:::
<br>
### Improve your home assignment
* Add a table 1
* Add a table 2 and include a statistical test
* Remember to change project (top right corner in RStudio)
* Using the menu in the top right corner, you can switch between your course project and your home assignment
<br><br>
## `gt`
<br><br>
#### Explain what the four main groups of functions in `gt` are and what they do
* `tab_*()`
* `fmt_*()`
* `cols_*()`
* `cells_*()`
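
To have something concrete to reason about, here is a minimal sketch that uses one function from each group. The data frame `toy` and its columns are invented for illustration and have nothing to do with `soldiers`.

```r
library(gt)

# Hypothetical toy data, made up for this example
toy <- data.frame(
  unit = c("A", "B"),
  mean_height = c(171.234, 175.876)
)

toy_tbl <- toy |>
  gt() |>
  # tab_*(): add or modify parts of the table (header, footnotes, spanners, styles)
  tab_header(title = "Toy table") |>
  # fmt_*(): format the values in the cells
  fmt_number(columns = mean_height, decimals = 1) |>
  # cols_*(): modify whole columns (labels, order, merging, hiding)
  cols_label(unit = "Unit", mean_height = "Mean height (cm)") |>
  # cells_*(): point at specific cells, used inside e.g. tab_style() or tab_footnote()
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body(columns = mean_height, rows = mean_height > 175)
  )

toy_tbl
```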
<br>
#### Find a dataset and prepare it for a table
Below is a suggestion for `soldiers`, but you are free to try with your own data if you prefer that.
Using `soldiers` and `gt()`, create a table in the following steps:
* Keep the columns `Installation`, `sex`, and all the columns that end with `circumference`
* Remove `Fort Rucker` - it only has one soldier
* Group by `Installation` and `sex`
* Summarise the data and calculate the mean and sd of all the columns that end with `circumference`
  * you can do this manually (with many lines of code)
  * or you can do this by using the `across()` function inside `summarise()`. If you are going to work with a dataset that has many columns, I suggest you invest some time in learning about `across()`
* Pipe the summarised table to `gt()` and set the `rowname_col` argument to `sex`
* add a suitable title and subtitle
* Assign the table to an object called `my_tbl`
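
To see what `across()` saves you, here is a small sketch comparing the manual approach with `across()`. The tibble `toy` and its values are invented for illustration; only the column naming pattern mimics `soldiers`.

```r
library(dplyr)

# Hypothetical toy data standing in for soldiers
toy <- tibble(
  sex = c("Female", "Female", "Male", "Male"),
  waistcircumference = c(80, 84, 90, 96),
  thighcircumference = c(58, 60, 61, 65)
)

# Manual version: one line per column and statistic
manual <- toy |>
  group_by(sex) |>
  summarise(
    waistcircumference_mean = mean(waistcircumference, na.rm = TRUE),
    waistcircumference_sd   = sd(waistcircumference, na.rm = TRUE),
    thighcircumference_mean = mean(thighcircumference, na.rm = TRUE),
    thighcircumference_sd   = sd(thighcircumference, na.rm = TRUE)
  )

# across() version: the same statistics for every matching column.
# The names of the list (mean, sd) become the suffixes of the new columns.
with_across <- toy |>
  group_by(sex) |>
  summarise(
    across(
      .cols = ends_with("circumference"),
      .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                  sd   = ~ sd(.x, na.rm = TRUE))
    )
  )

names(with_across)
```

With two columns the manual version is still manageable, but it grows by two lines for every extra column, whereas the `across()` call stays the same.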
```{r}
#| output: false
my_tbl <- soldiers |>
  # Select some columns and arrange the tibble
  select(Installation, sex,
         ends_with("circumference")) |>
  # Remove Fort Rucker (it only has one soldier)
  filter(Installation != "Fort Rucker") |>
  # Alternative: remove every Installation with only one soldier
  # group_by(Installation) |>
  # add_count() |>
  # filter(n > 1) |>
  # Summary stats by Installation and sex
  group_by(Installation, sex) |>
  summarise(
    across(.cols = ends_with("circumference"),
           .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                       sd = ~ sd(.x, na.rm = TRUE)))) |>
  # Group only by Installation
  # because we want to use sex in the rowname_col argument in gt() - see below
  group_by(Installation) |>
  # Send to gt and perform a few styling functions
  gt(rowname_col = "sex") |>
  tab_header(
    title = md("**Overview of soldiers by sex and installation**"),
    subtitle = md("*The data is a mock up version of the soldiers dataset*")) |>
  tab_footnote(
    footnote = "To preserve anonymity, observations from Fort Rucker have been removed because of a low number of observations"
  )
my_tbl
```
#### Style `my_tbl` as you like
You can try some of these functions
* `tab_header()`
* `tab_source_note()`
* `tab_stubhead()`
* `tab_spanner()`
* `tab_spanner_delim()`
* `fmt_number()`
* `fmt_percent()`
* `sub_missing()` (replaces the deprecated `fmt_missing()`)
* `cols_merge_n_pct()`
* `cols_label()`
* `md()`
* `cells_body()` and `tab_footnote()`
```{r}
#| output: false
# It is always a choice how much you want to style your table in R, and what you leave for manual editing afterwards (e.g. in Word)
# You can style everything in R, but it can be code intensive
# In general, you want the basic structure of your table to be in place
# You NEVER want to manually edit values or merge columns.
# Editing column and spanner names is less labour intensive and doesn't carry the same risk of making errors
# I think the table below is an OK place to stop the styling in R.
my_tbl_styled <- my_tbl |>
tab_spanner_delim(
delim = "_",
columns = everything()
) |>
fmt_number(
columns = contains("circumference"),
decimals = 1
) |>
cols_merge(
columns = contains("thigh"),
pattern = "{1} ({2})"
) |>
cols_merge(
columns = contains("waist"),
pattern = "{1} ({2})"
) |>
cols_merge(
columns = contains("ankle"),
pattern = "{1} ({2})"
) |>
cols_merge(
columns = contains("biceps"),
pattern = "{1} ({2})"
) |>
cols_merge(
columns = contains("calf"),
pattern = "{1} ({2})"
)
my_tbl_styled
```
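
The five `cols_merge()` calls above differ only in the measure name, so they can also be written as a loop. The sketch below shows one way to do this with base R's `Reduce()`. The data frame `toy`, the vector `measures`, and the helper `merge_mean_sd()` are all made up for illustration and are not part of the original solution.

```r
library(gt)

# Hypothetical toy data with the same naming pattern as the summarised soldiers data
toy <- data.frame(
  sex = c("Female", "Male"),
  thighcircumference_mean = c(60.1, 62.3),
  thighcircumference_sd   = c(5.2, 5.9),
  waistcircumference_mean = c(85.4, 93.8),
  waistcircumference_sd   = c(9.9, 11.1)
)

# The measures whose mean and sd columns should be merged
measures <- c("thigh", "waist")

# Hypothetical helper: merge the mean and sd column of one measure into "mean (sd)"
merge_mean_sd <- function(tbl, measure) {
  cols_merge(tbl, columns = contains(measure), pattern = "{1} ({2})")
}

# Reduce() applies the helper once per measure, passing the table along each time
toy_tbl <- Reduce(
  merge_mean_sd,
  measures,
  toy |>
    gt(rowname_col = "sex") |>
    fmt_number(columns = contains("circumference"), decimals = 1)
)

toy_tbl
```

Whether this is easier to read than five explicit calls is a matter of taste. It mainly pays off when the list of measures is long or changes often.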
<br>
#### Save your table using `gtsave()`
* Create a folder called "tables"
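
You can create the folder by hand, or from R. The sketch below uses base R and a temporary directory so that it can be run anywhere. In your course project you would presumably use `here("tables")` as the path instead; that substitution is an assumption about your project layout.

```r
# Path of the folder to create. tempdir() is used only to keep the example
# self-contained; in your project this would be here("tables")
tables_dir <- file.path(tempdir(), "tables")

# Create the folder only if it does not already exist
if (!dir.exists(tables_dir)) {
  dir.create(tables_dir, recursive = TRUE)
}

dir.exists(tables_dir)
```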
```{r}
#| eval: false
gtsave(
data = my_tbl_styled,
filename = here("tables", "ANSUR_fort_sex.docx")
)
# Word can also open .rtf files - it sometimes works better in this format
gtsave(
data = my_tbl_styled,
filename = here("tables", "ANSUR_fort_sex.rtf")
)
```
<br>
### WANT MORE? {-}
<br>
#### Give the row group labels, heading, and column labels a different background color
* Use `tab_options()`
```{r}
#| output: false
my_tbl_styled |>
tab_options(row_group.background.color = "#C2B7B7") |>
tab_options(heading.background.color = "#C2B7B7") |>
tab_options(column_labels.background.color = "#C2B7B7")
```
<br>
## `gtExtras`
Install the package `gtExtras` and add `library(gtExtras)` to the code chunk where you load your libraries.
<br>
#### Add a cool theme to your table using one of the `gt_theme_*()` functions
```{r}
#| output: false
my_tbl_styled |>
gt_theme_espn()
```
<br>
#### Color `thighcircumference_mean` with Hulk colors using `gt_hulk_col_numeric(thighcircumference_mean)`
```{r}
#| output: false
my_tbl_styled |>
gt_theme_espn() |>
gt_hulk_col_numeric(thighcircumference_mean)
```