Tables

Steen Flammild Harsted & Søren O’Neill

The Workflow

Posit's (formerly RStudio's) yearly table contest:

2021 winners
2022 winners
2024 winners

gtsummary

The gtsummary package provides an elegant and flexible way to create publication-ready analytical and summary tables.

gt

gt and gtsummary

gtsummary

library(dplyr)      # select()
library(gtsummary)  # tbl_summary() and the trial data

# make a dataset with a few
# variables to summarize
trial2 <- trial |>
  select(age, grade, response, trt)

# summarize the data with gtsummary
tbl_summary(trial2)
Characteristic            N = 200¹
Age                       47 (38, 57)
    Unknown               11
Grade
    I                     68 (34%)
    II                    68 (34%)
    III                   64 (32%)
Tumor Response            61 (32%)
    Unknown               7
Chemotherapy Treatment
    Drug A                98 (49%)
    Drug B                102 (51%)
¹ Median (Q1, Q3); n (%)

gtsummary

my_trial_table <- tbl_summary(
  trial2,
  by = trt,
  missing = "no" # don't show NA
  )  

my_trial_table
Characteristic    Drug A         Drug B
                  N = 98¹        N = 102¹
Age               46 (37, 60)    48 (39, 56)
Grade
    I             35 (36%)       33 (32%)
    II            32 (33%)       36 (35%)
    III           31 (32%)       33 (32%)
Tumor Response    28 (29%)       33 (34%)
¹ Median (Q1, Q3); n (%)

gtsummary

my_trial_table <- my_trial_table |>
  
  # add the number of non-missing observations
  add_n() |> 
  
  # test for a difference between groups
  add_p() |>
  
  # update the column header and bold the variable labels
  modify_header(label = "**Variable**") |> 
  bold_labels()

my_trial_table
Variable          N      Drug A         Drug B         p-value²
                         N = 98¹        N = 102¹
Age               189    46 (37, 60)    48 (39, 56)    0.7
Grade             200                                  0.9
    I                    35 (36%)       33 (32%)
    II                   32 (33%)       36 (35%)
    III                  31 (32%)       33 (32%)
Tumor Response    193    28 (29%)       33 (34%)       0.5
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

gtsummary

To save a gtsummary table we:

  1. Change it to a gt table using as_gt()
  2. Save it using the gtsave() function

library(gt)    # gtsave()
library(here)  # here()

my_trial_table |> 
  as_gt() |>   # 1. convert to a gt table
  gtsave(filename = here("tables", "my_trial_table.docx"))  # 2. save the table
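
The two steps can also be tried without a tables folder. The following is a minimal sketch, assuming the gtsummary and gt packages are installed; the temporary HTML file and the object names (tbl, gt_tbl, out_file) are illustrative and not part of the slides.

```r
library(gtsummary)
library(gt)

# a small gtsummary table built from the trial data shipped with gtsummary
tbl <- tbl_summary(trial, include = c(age, grade), by = trt, missing = "no")

# step 1: convert the gtsummary object to a gt table
gt_tbl <- as_gt(tbl)
class(tbl)     # includes "gtsummary"
class(gt_tbl)  # includes "gt_tbl"

# step 2: save the gt table; the file extension decides the output format
out_file <- tempfile(fileext = ".html")  # hypothetical location
gtsave(gt_tbl, filename = out_file)
file.exists(out_file)
```

The conversion matters because gtsave() works on gt tables, not on gtsummary objects.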

gtsummary

Changing descriptive statistics

tbl_summary(
  trial2,
  by = trt,
  missing = "no", # don't show NA,
  
  statistic = list(all_continuous() ~ "{mean} ({sd})")
  )  
Characteristic    Drug A      Drug B
                  N = 98¹     N = 102¹
Age               47 (15)     47 (14)
Grade
    I             35 (36%)    33 (32%)
    II            32 (33%)    36 (35%)
    III           31 (32%)    33 (32%)
Tumor Response    28 (29%)    33 (34%)
¹ Mean (SD); n (%)
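
The statistic argument takes one formula per variable type, and the placeholders in curly braces are filled in with the computed statistics. As a hedged sketch, the example below also sets a pattern for categorical variables and the number of decimals; the categorical pattern and the digits setting are illustrative choices, not taken from the slides.

```r
library(gtsummary)

tbl <- tbl_summary(
  trial,
  include = c(age, grade),
  by = trt,
  missing = "no",
  statistic = list(
    all_continuous()  ~ "{mean} ({sd})",     # as on the slide
    all_categorical() ~ "{n} / {N} ({p}%)"   # count / group size (percent)
  ),
  digits = all_continuous() ~ 1              # one decimal for mean and sd
)

tbl
```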

gtsummary

Adding a test

tbl_summary(
  trial2,
  by = trt,
  missing = "no", # don't show NA,
  statistic = list(all_continuous() ~ "{mean} ({sd})")
  )  |> 
  add_p()
Characteristic    Drug A      Drug B      p-value²
                  N = 98¹     N = 102¹
Age               47 (15)     47 (14)     0.7
Grade                                     0.9
    I             35 (36%)    33 (32%)
    II            32 (33%)    36 (35%)
    III           31 (32%)    33 (32%)
Tumor Response    28 (29%)    33 (34%)    0.5
¹ Mean (SD); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

gtsummary

Changing test

tbl_summary(
  trial2,
  by = trt,
  missing = "no", # don't show NA,
  statistic = list(all_continuous() ~ "{mean} ({sd})")
  )  |> 
  add_p(
    test = list(all_continuous() ~ "t.test")
  )
Characteristic    Drug A      Drug B      p-value²
                  N = 98¹     N = 102¹
Age               47 (15)     47 (14)     0.8
Grade                                     0.9
    I             35 (36%)    33 (32%)
    II            32 (33%)    36 (35%)
    III           31 (32%)    33 (32%)
Tumor Response    28 (29%)    33 (34%)    0.5
¹ Mean (SD); n (%)
² Welch Two Sample t-test; Pearson’s Chi-squared test

See test names here
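
To see what the test names refer to, the same comparisons can be run directly in base R. This is a rough sketch under the assumption that add_p() relies on these base R tests with default settings; the p-values should land close to the rounded values in the tables above, but that is not guaranteed for every option add_p() may set internally.

```r
library(gtsummary)  # only for the trial data

# continuous variable: the default test vs. the t-test requested above
# (wilcox.test() may warn about ties; the p-value is still returned)
p_wilcox <- wilcox.test(age ~ trt, data = trial)$p.value
p_welch  <- t.test(age ~ trt, data = trial)$p.value

# categorical variable: Pearson's chi-squared test on the cross table
p_chisq <- chisq.test(table(trial$grade, trial$trt))$p.value

round(c(wilcoxon = p_wilcox, welch = p_welch, chisq = p_chisq), 2)
```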

gtsummary

Changing test

tbl_summary(
  trial2,
  by = trt,
  missing = "no", # don't show NA,
  statistic = list(all_continuous() ~ "{mean} ({sd})")
  )  |> 
  add_p(
    test = list("response" ~ "t.test",
                "age" ~ "wilcox.test")
  )
Characteristic    Drug A      Drug B      p-value²
                  N = 98¹     N = 102¹
Age               47 (15)     47 (14)     0.7
Grade                                     0.9
    I             35 (36%)    33 (32%)
    II            32 (33%)    36 (35%)
    III           31 (32%)    33 (32%)
Tumor Response    28 (29%)    33 (34%)    0.5
¹ Mean (SD); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test; Welch Two Sample t-test

See test names here

gtsummary tbl_cross()

Cross table

trial2 |> 
  tbl_cross(grade, trt)
             Chemotherapy Treatment
             Drug A    Drug B    Total
Grade
    I        35        33        68
    II       32        36        68
    III      31        33        64
Total        98        102       200

See test names here

gtsummary tbl_cross()

Cross table

trial2 |> 
  tbl_cross(grade, trt) |> 
  add_p()
             Chemotherapy Treatment
             Drug A    Drug B    Total    p-value¹
Grade                                     0.9
    I        35        33        68
    II       32        36        68
    III      31        33        64
Total        98        102       200
¹ Pearson’s Chi-squared test

See test names here
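
As a cross-check of what tbl_cross() displays, the same counts and totals can be produced with base R. This is a sketch for comparison only; the object names (counts, with_totals) are illustrative.

```r
library(gtsummary)  # only for the trial data

# counts of grade by treatment
counts <- table(Grade = trial$grade, Treatment = trial$trt)

# add row and column totals (labelled "Sum" by addmargins())
with_totals <- addmargins(counts)
with_totals

# Pearson's chi-squared test on the counts (without the totals)
chisq.test(counts)$p.value
```

The counts should match the cross table above; tbl_cross() adds the labels, the layout and, with add_p(), the test in a single pipeline.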

Let's practice using gtsummary

gt

gt: A Grammar of Tables

[Figure: the parts of a gt table]

gt

- Table data is modified with dplyr, tidyr, etc.

- The gt_object is modified using functions from the gt package

- Default output is HTML

- Output can be changed to LaTeX, RTF, and Word format
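
A short sketch of the output formats, assuming the gt package is installed. The as_raw_html(), as_latex() and as_rtf() functions return the table as code in the respective format; the mtcars rows are placeholder data and the object names are illustrative.

```r
library(gt)

tbl <- gt(head(mtcars, 3))      # placeholder data

html_code  <- as_raw_html(tbl)  # HTML, the default output format
latex_code <- as_latex(tbl)     # LaTeX code
rtf_code   <- as_rtf(tbl)       # RTF code

# peek at the start of the generated HTML
substr(as.character(html_code), 1, 60)
```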

[Figure: gt workflow diagram, with output as HTML]

Step 1 - create a tibble that you want to transform to a table

T1 <- diamonds |> 
  group_by(cut, color) |>  
  summarise(price_median = median(price),
            price_iqr   = IQR(price),
            mm3_mean = mean(x*y*z),
            mm3_sd   = sd(x*y*z),
            n = n()) |> 
  mutate(n_freq = n / sum(n)) |> 
  ungroup() |> 
  mutate(n_freq_tot = n / sum(n)) |> 
  arrange(cut, color) 
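
The two frequency columns differ only in the grouping that is active when they are computed: after summarise() the data are still grouped by cut, so n_freq is the share within each cut, while n_freq_tot is computed after ungroup() and is the share of all diamonds. The sketch below makes this explicit with .groups = "drop_last" (an addition for clarity, not in the slide code) and checks the sums; freqs and check are illustrative names.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

freqs <- diamonds |>
  group_by(cut, color) |>
  summarise(n = n(), .groups = "drop_last") |>  # still grouped by cut
  mutate(n_freq = n / sum(n)) |>                # share within each cut
  ungroup() |>
  mutate(n_freq_tot = n / sum(n))               # share of all diamonds

# within each cut, n_freq sums to 1; n_freq_tot sums to 1 only over all rows
check <- freqs |>
  group_by(cut) |>
  summarise(within_cut = sum(n_freq), of_total = sum(n_freq_tot))

check
```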

Step 1 - create a tibble that you want to transform to a table

For a better overview on the slides, T1 is filtered to:

cut %in% c("Fair", "Ideal") & color %in% c("D", "G")

T1 <- diamonds |> 
  group_by(cut, color) |>  
  summarise(price_median = median(price),
            price_iqr   = IQR(price),
            mm3_mean = mean(x*y*z),
            mm3_sd   = sd(x*y*z),
            n = n()) |> 
  mutate(n_freq = n / sum(n)) |> 
  ungroup() |> 
  mutate(n_freq_tot = n / sum(n)) |> 
  arrange(cut, color) |> 
  filter(cut %in% c("Fair", "Ideal") &
         color %in% c("D", "G"))

T1
# A tibble: 4 × 9
  cut   color price_median price_iqr mm3_mean mm3_sd     n n_freq n_freq_tot
  <ord> <ord>        <dbl>     <dbl>    <dbl>  <dbl> <int>  <dbl>      <dbl>
1 Fair  D            3730      2592.    146.    64.3   163  0.101    0.00302
2 Fair  G            3057      3086     161.    77.1   314  0.195    0.00582
3 Ideal D            1576      2248.     93.1   48.9  2834  0.132    0.0525 
4 Ideal G            1858.     4694.    115.    67.0  4884  0.227    0.0905 

Step 2 - pipe the tibble to gt()

Notice the effect of group_by()

T1 |> 
  gt() 
cut color price_median price_iqr mm3_mean mm3_sd n n_freq n_freq_tot
Fair D 3730.0 2592.50 145.60641 64.32012 163 0.1012422 0.003021876
Fair G 3057.0 3086.00 160.87748 77.09566 314 0.1950311 0.005821283
Ideal D 1576.0 2247.75 93.05613 48.92116 2834 0.1315020 0.052539859
Ideal G 1857.5 4693.50 115.04085 66.97330 4884 0.2266252 0.090545050

  

T1 |> 
  group_by(cut) |> 
  gt()
color price_median price_iqr mm3_mean mm3_sd n n_freq n_freq_tot
Fair
D 3730.0 2592.50 145.60641 64.32012 163 0.1012422 0.003021876
G 3057.0 3086.00 160.87748 77.09566 314 0.1950311 0.005821283
Ideal
D 1576.0 2247.75 93.05613 48.92116 2834 0.1315020 0.052539859
G 1857.5 4693.50 115.04085 66.97330 4884 0.2266252 0.090545050

Step 2 - pipe the tibble to gt()

T1 |> 
  group_by(cut) |> 
  gt() 
color price_median price_iqr mm3_mean mm3_sd n n_freq n_freq_tot
Fair
D 3730.0 2592.50 145.60641 64.32012 163 0.1012422 0.003021876
G 3057.0 3086.00 160.87748 77.09566 314 0.1950311 0.005821283
Ideal
D 1576.0 2247.75 93.05613 48.92116 2834 0.1315020 0.052539859
G 1857.5 4693.50 115.04085 66.97330 4884 0.2266252 0.090545050
T1 |> 
  group_by(cut) |> 
  gt(rowname_col = "color")
price_median price_iqr mm3_mean mm3_sd n n_freq n_freq_tot
Fair
D 3730.0 2592.50 145.60641 64.32012 163 0.1012422 0.003021876
G 3057.0 3086.00 160.87748 77.09566 314 0.1950311 0.005821283
Ideal
D 1576.0 2247.75 93.05613 48.92116 2834 0.1315020 0.052539859
G 1857.5 4693.50 115.04085 66.97330 4884 0.2266252 0.090545050
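
Grouping with group_by() is one way to get row groups. As a hedged alternative, gt() also accepts a groupname_col argument, which should give the same layout without grouping the data first. The toy data frame below stands in for T1 and only copies a few values from the slides.

```r
library(gt)

# toy data standing in for T1 (counts copied from the slides)
toy <- data.frame(
  cut   = c("Fair", "Fair", "Ideal", "Ideal"),
  color = c("D", "G", "D", "G"),
  n     = c(163, 314, 2834, 4884)
)

# cut becomes the row group, color becomes the row label (the stub)
tbl <- gt(toy, groupname_col = "cut", rowname_col = "color")
tbl
```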

Steps 3, 4, 5, …: transform the gt object

The main groups of functions

 

1. tab_ Creates or modifies parts of the overall table structure


2. fmt_ Formats the data in the table

 

3. cols_ Modifies the columns

 

4. cells_ Location helpers for targeting specific cells

 

5. gtsave and as_ Export functions

tab_ functions


tab_header() Add a table header

tab_spanner() Add a spanner column label

tab_spanner_delim() Create column labels and spanners via delimited names

tab_row_group() Add a row group to a gt table

tab_stubhead() Add label text to the stubhead

tab_footnote() Add a table footnote

tab_source_note() Add a source note citation

tab_style() Add custom styles to one or more cells

tab_options() Modify the table output options

tab_header(), tab_source_note(), and tab_stubhead()

T2 <- T1 |> 
  group_by(cut) |> 
  gt(rowname_col = "color") |> 
  
  tab_header(
    title = "Price and mm3 of diamonds",
    subtitle = "Only displaying ´Fair´ and ´Ideal` cuts") |> 
  
  tab_source_note(
    source_note = "Data is from the Diamonds dataset included in the ggplot2 package") |> 
  
  tab_stubhead(
    label = "Cut Color"
  )

T2
Price and mm3 of diamonds
Only displaying ´Fair´ and ´Ideal` cuts
Cut Color price_median price_iqr mm3_mean mm3_sd n n_freq n_freq_tot
Fair
D 3730.0 2592.50 145.60641 64.32012 163 0.1012422 0.003021876
G 3057.0 3086.00 160.87748 77.09566 314 0.1950311 0.005821283
Ideal
D 1576.0 2247.75 93.05613 48.92116 2834 0.1315020 0.052539859
G 1857.5 4693.50 115.04085 66.97330 4884 0.2266252 0.090545050
Data is from the Diamonds dataset included in the ggplot2 package

tab_spanner()

T2 <- T2 |> 
  
  tab_spanner(
    label = "Counts and frequencies",
    columns = 7:9)

T2
Price and mm3 of diamonds
Only displaying ´Fair´ and ´Ideal` cuts
                                                         Counts and frequencies
Cut Color  price_median  price_iqr  mm3_mean   mm3_sd    n     n_freq     n_freq_tot
Fair
D          3730.0        2592.50    145.60641  64.32012  163   0.1012422  0.003021876
G          3057.0        3086.00    160.87748  77.09566  314   0.1950311  0.005821283
Ideal
D          1576.0        2247.75    93.05613   48.92116  2834  0.1315020  0.052539859
G          1857.5        4693.50    115.04085  66.97330  4884  0.2266252  0.090545050
Data is from the Diamonds dataset included in the ggplot2 package
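
Column positions such as 7:9 change as soon as a column is added or moved. A small sketch of the same spanner with column names instead, assuming the usual tidyselect behaviour of the columns argument; the toy data frame is illustrative and only mimics the count columns of T1.

```r
library(gt)

# toy data with the three count and frequency columns
toy <- data.frame(
  color      = c("D", "G"),
  n          = c(163, 314),
  n_freq     = c(0.101, 0.195),
  n_freq_tot = c(0.003, 0.006)
)

tbl <- gt(toy, rowname_col = "color") |>
  tab_spanner(
    label   = "Counts and frequencies",
    columns = c(n, n_freq, n_freq_tot)  # names instead of positions
  )

tbl
```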

tab_spanner_delim()

T2 <- T2 |> 
  
  tab_spanner_delim(
    delim = "_",
    columns = 3:6)

T2
Price and mm3 of diamonds
Only displaying ´Fair´ and ´Ideal` cuts
           price            mm3                  Counts and frequencies
Cut Color  median  iqr      mean       sd        n     n_freq     n_freq_tot
Fair
D          3730.0  2592.50  145.60641  64.32012  163   0.1012422  0.003021876
G          3057.0  3086.00  160.87748  77.09566  314   0.1950311  0.005821283
Ideal
D          1576.0  2247.75  93.05613   48.92116  2834  0.1315020  0.052539859
G          1857.5  4693.50  115.04085  66.97330  4884  0.2266252  0.090545050
Data is from the Diamonds dataset included in the ggplot2 package

fmt_ functions

 

fmt_number() Format numeric values

fmt_integer() Format values as integers

fmt_scientific() Format values to scientific notation

fmt_engineering() Format values to engineering notation

fmt_percent() Format values as a percentage

fmt_currency() Format values as currencies

fmt_bytes() Format values as bytes

fmt_date() Format values as dates

fmt_time() Format values as times

fmt_datetime() Format values as date-times

fmt_markdown() Format Markdown text

fmt_passthrough() Format by simply passing data through

fmt_missing() Format missing values (deprecated in newer gt versions; use sub_missing() instead)

fmt_currency(), fmt_number(), and fmt_percent()

T3 <- T2 |> 
  
  fmt_currency(
    columns = 3:4,
    currency = "USD",
    decimals = 0) |> 
  
  fmt_number(
    columns = 5:6,
    decimals = 1) |> 
  
  fmt_percent(
    columns = c(n_freq, n_freq_tot),
    decimals = 1) 

T3
Price and mm3 of diamonds
Only displaying ´Fair´ and ´Ideal` cuts
           price           mm3          Counts and frequencies
Cut Color  median  iqr     mean   sd    n     n_freq  n_freq_tot
Fair
D          $3,730  $2,592  145.6  64.3  163   10.1%   0.3%
G          $3,057  $3,086  160.9  77.1  314   19.5%   0.6%
Ideal
D          $1,576  $2,248  93.1   48.9  2834  13.2%   5.3%
G          $1,858  $4,694  115.0  67.0  4884  22.7%   9.1%
Data is from the Diamonds dataset included in the ggplot2 package
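
The n column is left unformatted in the table above. As a sketch, fmt_integer() from the list of fmt_ functions could be applied to it; with its default settings it should add thousands separators, so 2834 is expected to render as 2,834. The toy data frame is illustrative.

```r
library(gt)

toy <- data.frame(color = c("D", "G"), n = c(2834, 4884))

tbl <- gt(toy) |>
  fmt_integer(columns = n)  # no decimals, thousands separators by default

# render to HTML to inspect the formatted values
html <- as.character(as_raw_html(tbl))
```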

cols_ functions

The cols_*() functions allow for modifications that act on entire columns.

cols_align() Set the alignment of columns

cols_width() Set the widths of columns

cols_label() Relabel one or more columns

cols_move_to_start() Move one or more columns to the start

cols_move_to_end() Move one or more columns to the end

cols_move() Move one or more columns

cols_hide() Hide one or more columns

cols_unhide() Unhide one or more columns

cols_merge_range() Merge two columns to a value range column

cols_merge_uncert() Merge two columns to a value & uncertainty column

cols_merge_n_pct() Merge two columns to combine counts and percentages

cols_merge() Merge data from two or more columns to a single column

cols_merge_n_pct() and cols_label()

T4 <- T3 |> 
  
  cols_merge_n_pct(
    col_n = n,
    col_pct = n_freq) |> 
  
  cols_label(
    n = "n (% of group)") |> 
  
  cols_label(
    n_freq_tot = "% of total")

T4
Price and mm3 of diamonds
Only displaying ´Fair´ and ´Ideal` cuts
           price           mm3          Counts and frequencies
Cut Color  median  iqr     mean   sd    n (% of group)  % of total
Fair
D          $3,730  $2,592  145.6  64.3  163 (10.1%)     0.3%
G          $3,057  $3,086  160.9  77.1  314 (19.5%)     0.6%
Ideal
D          $1,576  $2,248  93.1   48.9  2834 (13.2%)    5.3%
G          $1,858  $4,694  115.0  67.0  4884 (22.7%)    9.1%
Data is from the Diamonds dataset included in the ggplot2 package
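
The other merge functions follow the same pattern. As an illustrative sketch (not part of the slides), cols_merge_uncert() could combine the mean and sd columns into a single value with uncertainty; the toy data frame and the new column label are assumptions.

```r
library(gt)

toy <- data.frame(
  color    = c("D", "G"),
  mm3_mean = c(145.6, 160.9),
  mm3_sd   = c(64.3, 77.1)
)

tbl <- gt(toy) |>
  # merge the uncertainty column into the value column
  cols_merge_uncert(col_val = mm3_mean, col_uncert = mm3_sd) |>
  # the merged column keeps the name of the value column
  cols_label(mm3_mean = "mm3, mean ± SD")

tbl
```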

cells_ and other helper functions

 

md() Interpret input text as Markdown-formatted text

 

The various cells_*() functions are used for targeting cells with the locations = argument in the tab_footnote(), tab_style(), and text_transform() functions.

 

cells_title() Location helper for targeting the table title and subtitle

cells_stubhead() Location helper for targeting the table stubhead cell

cells_column_spanners() Location helper for targeting the column spanners

cells_column_labels() Location helper for targeting the column labels

cells_row_groups() Location helper for targeting row groups

cells_stub() Location helper for targeting cells in the table stub

cells_body() Location helper for targeting data cells in the table body

cells_summary() Location helper for targeting group summary cells

cells_grand_summary() Location helper for targeting cells in a grand summary

cells_stub_summary() Location helper for targeting the stub cells in a summary

cells_stub_grand_summary() Location helper for targeting the stub cells in a grand summary

cells_footnotes() Location helper for targeting the footnotes

cells_source_notes() Location helper for targeting the source notes

md()

T5 <- T4 |> 
  
  tab_header(
    title = md("**Price and mm<sup>3</sup> of diamonds**"),
    subtitle = md("Only displaying *Fair* and *Ideal* cuts"))

T5
Price and mm³ of diamonds
Only displaying Fair and Ideal cuts
           price           mm3          Counts and frequencies
Cut Color  median  iqr     mean   sd    n (% of group)  % of total
Fair
D          $3,730  $2,592  145.6  64.3  163 (10.1%)     0.3%
G          $3,057  $3,086  160.9  77.1  314 (19.5%)     0.6%
Ideal
D          $1,576  $2,248  93.1   48.9  2834 (13.2%)    5.3%
G          $1,858  $4,694  115.0  67.0  4884 (22.7%)    9.1%
Data is from the Diamonds dataset included in the ggplot2 package

tab_footnote() and cells_body()

T6 <- T5 |> 
  
  tab_footnote(
    footnote = md("**OMG** this is a ***CHEAP*** *median price*!!!!"),
    locations = cells_body(
      columns = price_median,
      rows = cut == "Ideal" & price_median < 3000))

T6
Price and mm³ of diamonds
Only displaying Fair and Ideal cuts
           price             mm3          Counts and frequencies
Cut Color  median    iqr     mean   sd    n (% of group)  % of total
Fair
D          $3,730    $2,592  145.6  64.3  163 (10.1%)     0.3%
G          $3,057    $3,086  160.9  77.1  314 (19.5%)     0.6%
Ideal
D          ¹ $1,576  $2,248  93.1   48.9  2834 (13.2%)    5.3%
G          ¹ $1,858  $4,694  115.0  67.0  4884 (22.7%)    9.1%
Data is from the Diamonds dataset included in the ggplot2 package
¹ OMG this is a CHEAP median price!!!!
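
The same location helper works with tab_style(). A minimal sketch, assuming a toy data frame in place of T5, that puts the cheap median prices in bold instead of attaching a footnote:

```r
library(gt)

toy <- data.frame(
  cut          = c("Fair", "Ideal"),
  price_median = c(3730, 1576)
)

tbl <- gt(toy) |>
  tab_style(
    style     = cell_text(weight = "bold"),
    locations = cells_body(
      columns = price_median,
      rows    = cut == "Ideal" & price_median < 3000  # same row filter as in the slide
    )
  )

tbl
```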

tab_footnote() and cells_column_labels()

T7 <- T6 |> 
  
  tab_footnote(
    footnote = md("*Interquartile range*"),
    locations = cells_column_labels(
      columns = price_iqr))

T7
Price and mm³ of diamonds
Only displaying Fair and Ideal cuts
           price             mm3          Counts and frequencies
Cut Color  median    iqr¹    mean   sd    n (% of group)  % of total
Fair
D          $3,730    $2,592  145.6  64.3  163 (10.1%)     0.3%
G          $3,057    $3,086  160.9  77.1  314 (19.5%)     0.6%
Ideal
D          ² $1,576  $2,248  93.1   48.9  2834 (13.2%)    5.3%
G          ² $1,858  $4,694  115.0  67.0  4884 (22.7%)    9.1%
Data is from the Diamonds dataset included in the ggplot2 package
¹ Interquartile range
² OMG this is a CHEAP median price!!!!

gtExtras themes

The package gtExtras comes with 8 different themes (HTML only)

T7 |> 
  gt_theme_538()
Price and mm³ of diamonds
Only displaying Fair and Ideal cuts
           price             mm3          Counts and frequencies
Cut Color  median    iqr¹    mean   sd    n (% of group)  % of total
Fair
D          $3,730    $2,592  145.6  64.3  163 (10.1%)     0.3%
G          $3,057    $3,086  160.9  77.1  314 (19.5%)     0.6%
Ideal
D          ² $1,576  $2,248  93.1   48.9  2834 (13.2%)    5.3%
G          ² $1,858  $4,694  115.0  67.0  4884 (22.7%)    9.1%
Data is from the Diamonds dataset included in the ggplot2 package
¹ Interquartile range
² OMG this is a CHEAP median price!!!!

gtsave

HTML

gtsave(T6, 
       filename = here("tables", "table.html")) # HTML

Word

gtsave(T6, 
       filename = here("tables", "table.docx"))  # Word

PDF

# PDF via webshot(); read the documentation for the zoom = and expand = arguments
gtsave(T6, 
       filename = here("tables", "table.pdf"))  

As image

# Save the table as an image via webshot()
gtsave(T6, 
       filename = here("tables", "table.png"))  

Thanks!