RMarkdown Tables - Data Summary and Presentation

Why you want Tables

Base R can create simple tables with the table() function, but most of the time you will want to present your results in a more polished table format. This could be for any of the following:

viewing your data in a table format
presenting summary statistics of the variables in your dataset
presenting your models or analysis results in a table format
and even more…

Get Inspiration

The formatting behind appealing, well-organized tables can be something of an art form. Getting the code to work along with the formatting for various final formats (like HTML, PDF, DOC, PPT, etc.) can be extremely challenging. However, the good news is that this has recently been an area of rapid development in the R/RMarkdown world.

In fact, in the past few years there have been contests for the best tables and for the packages and code used to build them. See:

Let’s try a simple table to get started

Here is an example of basic output to view the “top” of the built-in mtcars dataset, using the code head(mtcars):

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

OK, so this is just text on the page - not really a nice table.

To turn this into a table, let’s use the kable() function from the knitr package. We’ll also load the dplyr package so we can use the %>% pipe.

library(knitr)
library(dplyr)
mtcars %>%
  head() %>%
  knitr::kable()

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1
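knitr::kable() also has arguments for rounding, relabeling and aligning columns. A minimal sketch using the digits, col.names and align arguments; it uses base R column subsetting instead of dplyr so that it only needs knitr, and the column labels are illustrative choices:

```r
library(knitr)

# Keep four columns from the first 6 rows using base R subsetting
cars_top <- head(mtcars[, c("mpg", "cyl", "hp", "wt")])

# digits rounds the numeric columns, col.names relabels the header,
# and a single align value ("c" = centered) is applied to every column
cars_kable <- knitr::kable(
  cars_top,
  digits    = 1,
  col.names = c("MPG", "Cylinders", "Horsepower", "Weight (1000 lbs)"),
  align     = "c",
  caption   = "Selected columns from the top 6 rows of mtcars"
)
cars_kable
```

col.names must have one entry per data column (four here); the row names (car models) are still printed as the first column.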

Let’s add a caption for our table.

NOTE: The way the caption shows up will vary depending on whether you “knit” to HTML, DOCX, PDF or other formats…

mtcars %>%
  head() %>%
  knitr::kable(caption = "Top 6 rows of the mtcars dataset")

Top 6 rows of the mtcars dataset
	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1

Try customization with the `gt` package

You can add headers, footers and more with the gt package. See https://gt.rstudio.com/index.html.

library(gt)
mtcars %>%
  head() %>%
  gt()

mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
18.1	6	225	105	2.76	3.460	20.22	1	0	3	1
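Notice that the gt() output above no longer shows the car names, which are stored as row names in mtcars. A minimal sketch of one way to keep them, using the rownames_to_stub argument of gt(), which moves the row names into the stub (the row-label column on the left):

```r
library(gt)

# rownames_to_stub = TRUE keeps the mtcars row names (the car models)
# as row labels in the table stub instead of dropping them
cars_gt <- gt(head(mtcars), rownames_to_stub = TRUE)
cars_gt
```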

Add a header.

mtcars %>%
  head() %>%
  gt() %>%
  tab_header(
    title = "The mtcars dataset",
    subtitle = "The top 6 rows are presented"
  )

The mtcars dataset
The top 6 rows are presented
mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
18.1	6	225	105	2.76	3.460	20.22	1	0	3	1

Add a footer.

mtcars %>%
  head() %>%
  gt() %>%
  tab_header(
    title = "The mtcars dataset",
    subtitle = "The top 6 rows are presented"
  ) %>%
  tab_source_note(
    source_note = "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)."
  )

The mtcars dataset
The top 6 rows are presented
mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
18.1	6	225	105	2.76	3.460	20.22	1	0	3	1
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
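gt has many more functions for polishing a table. A minimal sketch building on the header example with tab_spanner(), cols_label() and fmt_number() from gt; the spanner name “Engine” and the new column labels are illustrative choices, and magrittr is loaded only for the %>% pipe:

```r
library(gt)
library(magrittr)  # provides the %>% pipe

cars_gt <- head(mtcars) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(
    title = "The mtcars dataset",
    subtitle = "The top 6 rows are presented"
  ) %>%
  # Group three adjacent engine-related columns under one spanner label
  tab_spanner(label = "Engine", columns = c(cyl, disp, hp)) %>%
  # Replace terse column names with more readable labels
  cols_label(mpg = "Miles/gallon", wt = "Weight (1000 lbs)") %>%
  # Show weight and quarter-mile time with one decimal place
  fmt_number(columns = c(wt, qsec), decimals = 1)
cars_gt
```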

What about summary statistics?

A really simple approach is to use the summary() function in base R. But the results, while useful, are less than inspiring.

mtcars %>%
  summary() %>%
  knitr::kable()

mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Min. :10.40	Min. :4.000	Min. : 71.1	Min. : 52.0	Min. :2.760	Min. :1.513	Min. :14.50	Min. :0.0000	Min. :0.0000	Min. :3.000	Min. :1.000
1st Qu.:15.43	1st Qu.:4.000	1st Qu.:120.8	1st Qu.: 96.5	1st Qu.:3.080	1st Qu.:2.581	1st Qu.:16.89	1st Qu.:0.0000	1st Qu.:0.0000	1st Qu.:3.000	1st Qu.:2.000
Median :19.20	Median :6.000	Median :196.3	Median :123.0	Median :3.695	Median :3.325	Median :17.71	Median :0.0000	Median :0.0000	Median :4.000	Median :2.000
Mean :20.09	Mean :6.188	Mean :230.7	Mean :146.7	Mean :3.597	Mean :3.217	Mean :17.85	Mean :0.4375	Mean :0.4062	Mean :3.688	Mean :2.812
3rd Qu.:22.80	3rd Qu.:8.000	3rd Qu.:326.0	3rd Qu.:180.0	3rd Qu.:3.920	3rd Qu.:3.610	3rd Qu.:18.90	3rd Qu.:1.0000	3rd Qu.:1.0000	3rd Qu.:4.000	3rd Qu.:4.000
Max. :33.90	Max. :8.000	Max. :472.0	Max. :335.0	Max. :4.930	Max. :5.424	Max. :22.90	Max. :1.0000	Max. :1.0000	Max. :5.000	Max. :8.000
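Before reaching for a dedicated package, one hand-rolled alternative (not part of any package) is to compute the statistics you want with base R and pass the resulting matrix to kable(). A minimal sketch; the choice of statistics is illustrative:

```r
library(knitr)

# Compute a few statistics for every column of mtcars;
# sapply() returns one column per variable, so t() flips it to one row per variable
summary_stats <- t(sapply(mtcars, function(x) {
  c(Mean = mean(x), SD = sd(x), Median = median(x), Min = min(x), Max = max(x))
}))

knitr::kable(summary_stats, digits = 2,
             caption = "Summary statistics for the mtcars variables")
```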

Try the `gtsummary` package

Learn more about the gtsummary package at https://www.danieldsjoberg.com/gtsummary/index.html, and get inspiration from its gallery at https://www.danieldsjoberg.com/gtsummary/articles/gallery.html.

library(gtsummary)
mtcars %>%
  tbl_summary()

Characteristic	N = 32¹
mpg	19.2 (15.4, 22.8)
cyl
4	11 (34%)
6	7 (22%)
8	14 (44%)
disp	196 (121, 326)
hp	123 (97, 180)
drat	3.70 (3.08, 3.92)
wt	3.33 (2.58, 3.61)
qsec	17.71 (16.89, 18.90)
vs	14 (44%)
am	13 (41%)
gear
3	15 (47%)
4	12 (38%)
5	5 (16%)
carb
1	7 (22%)
2	10 (31%)
3	3 (9.4%)
4	10 (31%)
6	1 (3.1%)
8	1 (3.1%)
¹ Median (IQR); n (%)

Look at statistics by group.

mtcars %>%
  tbl_summary(by = cyl)

Characteristic	4, N = 11¹	6, N = 7¹	8, N = 14¹
mpg	26.0 (22.8, 30.4)	19.7 (18.7, 21.0)	15.2 (14.4, 16.3)
disp	108 (79, 121)	168 (160, 196)	351 (302, 390)
hp	91 (66, 96)	110 (110, 123)	193 (176, 241)
drat	4.08 (3.81, 4.17)	3.90 (3.35, 3.91)	3.12 (3.07, 3.23)
wt	2.20 (1.89, 2.62)	3.22 (2.82, 3.44)	3.76 (3.53, 4.01)
qsec	18.90 (18.56, 19.95)	18.30 (16.74, 19.17)	17.18 (16.10, 17.56)
vs	10 (91%)	4 (57%)	0 (0%)
am	8 (73%)	3 (43%)	2 (14%)
gear
3	1 (9.1%)	2 (29%)	12 (86%)
4	8 (73%)	4 (57%)	0 (0%)
5	2 (18%)	1 (14%)	2 (14%)
carb
1	5 (45%)	2 (29%)	0 (0%)
2	6 (55%)	0 (0%)	4 (29%)
3	0 (0%)	0 (0%)	3 (21%)
4	0 (0%)	4 (57%)	6 (43%)
6	0 (0%)	1 (14%)	0 (0%)
8	0 (0%)	0 (0%)	1 (7.1%)
¹ Median (IQR); n (%)
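tbl_summary() can also be customized. A minimal sketch, assuming the include, statistic and label arguments of recent gtsummary versions, that summarizes three variables by cylinder count and reports mean (SD) instead of the default median (IQR); the row labels are illustrative:

```r
library(gtsummary)
library(magrittr)  # provides the %>% pipe

cars_summary <- mtcars %>%
  tbl_summary(
    by = cyl,
    # Summarize only these variables (the by variable defines the columns)
    include = c(mpg, hp, wt),
    # Report mean (SD) instead of median (IQR) for continuous variables
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    # Friendlier row labels
    label = list(mpg ~ "Miles/gallon", hp ~ "Gross horsepower",
                 wt ~ "Weight (1000 lbs)")
  )
cars_summary
```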

Add statistical comparison tests.

mtcars %>%
  tbl_summary(by = cyl) %>% 
  add_p()

Characteristic	4, N = 11¹	6, N = 7¹	8, N = 14¹	p-value²
mpg	26.0 (22.8, 30.4)	19.7 (18.7, 21.0)	15.2 (14.4, 16.3)	<0.001
disp	108 (79, 121)	168 (160, 196)	351 (302, 390)	<0.001
hp	91 (66, 96)	110 (110, 123)	193 (176, 241)	<0.001
drat	4.08 (3.81, 4.17)	3.90 (3.35, 3.91)	3.12 (3.07, 3.23)	<0.001
wt	2.20 (1.89, 2.62)	3.22 (2.82, 3.44)	3.76 (3.53, 4.01)	<0.001
qsec	18.90 (18.56, 19.95)	18.30 (16.74, 19.17)	17.18 (16.10, 17.56)	0.006
vs	10 (91%)	4 (57%)	0 (0%)	<0.001
am	8 (73%)	3 (43%)	2 (14%)	0.009
gear				<0.001
3	1 (9.1%)	2 (29%)	12 (86%)
4	8 (73%)	4 (57%)	0 (0%)
5	2 (18%)	1 (14%)	2 (14%)
carb				<0.001
1	5 (45%)	2 (29%)	0 (0%)
2	6 (55%)	0 (0%)	4 (29%)
3	0 (0%)	0 (0%)	3 (21%)
4	0 (0%)	4 (57%)	6 (43%)
6	0 (0%)	1 (14%)	0 (0%)
8	0 (0%)	0 (0%)	1 (7.1%)
¹ Median (IQR); n (%)
² Kruskal-Wallis rank sum test; Fisher’s exact test
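The default tests can be swapped and extra columns added. A sketch that adds an overall column with add_overall(), requests one-way ANOVA ("aov") instead of Kruskal-Wallis for the continuous variables, and bolds the variable labels with bold_labels(); check the add_p() help page for the test names your installed gtsummary version accepts:

```r
library(gtsummary)
library(magrittr)  # provides the %>% pipe

cars_compare <- mtcars %>%
  tbl_summary(by = cyl, include = c(mpg, hp, am)) %>%
  # Add a column summarizing all 32 cars together
  add_overall() %>%
  # Use one-way ANOVA for continuous variables instead of Kruskal-Wallis
  add_p(test = list(all_continuous() ~ "aov")) %>%
  # Bold the variable labels
  bold_labels()
cars_compare
```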

Also try the arsenal package

Learn more about the arsenal package at https://mayoverse.github.io/arsenal/ and about its tableby() function at https://mayoverse.github.io/arsenal/articles/tableby.html.

This time, let’s look at the penguins dataset from the palmerpenguins package.

We’ll use the tableby() function from the arsenal package to get some summary stats.

NOTE: IMPORTANT - when using the arsenal package, you need to add results = "asis" to your R chunk options so that the table looks correct when you “knit” your RMarkdown file.

library(palmerpenguins)
library(arsenal)

tab1 <- tableby(~ bill_length_mm + bill_depth_mm +
                  flipper_length_mm + body_mass_g, 
                data = penguins)
summary(tab1)

	Overall (N=344)
bill_length_mm
N-Miss	2
Mean (SD)	43.922 (5.460)
Range	32.100 - 59.600
bill_depth_mm
N-Miss	2
Mean (SD)	17.151 (1.975)
Range	13.100 - 21.500
flipper_length_mm
N-Miss	2
Mean (SD)	200.915 (14.062)
Range	172.000 - 231.000
body_mass_g
N-Miss	2
Mean (SD)	4201.754 (801.955)
Range	2700.000 - 6300.000

We can also get comparison statistics by group with associated statistical tests. Let’s look at these summary stats by the 3 species of penguins.

tab1 <- tableby(species ~ bill_length_mm + bill_depth_mm +
                  flipper_length_mm + body_mass_g, 
                data = penguins)
summary(tab1)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Total (N=344)	p value
bill_length_mm					< 0.001
N-Miss	1	0	1	2
Mean (SD)	38.791 (2.663)	48.834 (3.339)	47.505 (3.082)	43.922 (5.460)
Range	32.100 - 46.000	40.900 - 58.000	40.900 - 59.600	32.100 - 59.600
bill_depth_mm					< 0.001
N-Miss	1	0	1	2
Mean (SD)	18.346 (1.217)	18.421 (1.135)	14.982 (0.981)	17.151 (1.975)
Range	15.500 - 21.500	16.400 - 20.800	13.100 - 17.300	13.100 - 21.500
flipper_length_mm					< 0.001
N-Miss	1	0	1	2
Mean (SD)	189.954 (6.539)	195.824 (7.132)	217.187 (6.485)	200.915 (14.062)
Range	172.000 - 210.000	178.000 - 212.000	203.000 - 231.000	172.000 - 231.000
body_mass_g					< 0.001
N-Miss	1	0	1	2
Mean (SD)	3700.662 (458.566)	3733.088 (384.335)	5076.016 (504.116)	4201.754 (801.955)
Range	2850.000 - 4775.000	2700.000 - 4800.000	3950.000 - 6300.000	2700.000 - 6300.000
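tableby() output can be tuned through tableby.control() and the summary() method. A minimal sketch, assuming the numeric.stats and digits options of tableby.control() and the labelTranslations and text arguments of summary(); the statistics and labels chosen here are illustrative:

```r
library(arsenal)
library(palmerpenguins)

tab2 <- tableby(
  species ~ bill_length_mm + body_mass_g,
  data = penguins,
  # Choose which statistics to report for numeric variables, rounded to 1 decimal
  control = tableby.control(
    numeric.stats = c("Nmiss", "meansd", "medianq1q3"),
    digits = 1
  )
)

# labelTranslations swaps variable names for readable labels;
# text = TRUE prints a plain-text version (handy in the console)
summary(
  tab2,
  labelTranslations = list(bill_length_mm = "Bill length (mm)",
                           body_mass_g = "Body mass (g)"),
  text = TRUE
)
```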

Another COOL package, `summarytools`

summarytools is another really cool package, useful for getting a quick overview of what is in your dataset along with some quick summary stats and tiny charts.

Learn more at:

Let’s look at the penguins dataset again.

As with the arsenal package, when you use the summarytools package you need to add results = "asis" to the R chunk options.

library(summarytools)
dfSummary(penguins, 
          plain.ascii  = FALSE, 
          style        = "grid", 
          graph.magnif = 0.75, 
          valid.col    = FALSE,
          tmp.img.dir  = "/tmp")

Data Frame Summary

penguins

Dimensions: 344 x 8
Duplicates: 0

No	Variable	Stats / Values	Freqs (% of Valid)	Missing
1	species [factor]	1. Adelie 2. Chinstrap 3. Gentoo	152 (44.2%) 68 (19.8%) 124 (36.0%)	0 (0.0%)
2	island [factor]	1. Biscoe 2. Dream 3. Torgersen	168 (48.8%) 124 (36.0%) 52 (15.1%)	0 (0.0%)
3	bill_length_mm [numeric]	Mean (sd) : 43.9 (5.5) min < med < max: 32.1 < 44.5 < 59.6 IQR (CV) : 9.3 (0.1)	164 distinct values	2 (0.6%)
4	bill_depth_mm [numeric]	Mean (sd) : 17.2 (2) min < med < max: 13.1 < 17.3 < 21.5 IQR (CV) : 3.1 (0.1)	80 distinct values	2 (0.6%)
5	flipper_length_mm [integer]	Mean (sd) : 200.9 (14.1) min < med < max: 172 < 197 < 231 IQR (CV) : 23 (0.1)	55 distinct values	2 (0.6%)
6	body_mass_g [integer]	Mean (sd) : 4201.8 (802) min < med < max: 2700 < 4050 < 6300 IQR (CV) : 1200 (0.2)	94 distinct values	2 (0.6%)
7	sex [factor]	1. female 2. male	165 (49.5%) 168 (50.5%)	11 (3.2%)
8	year [integer]	Mean (sd) : 2008 (0.8) min < med < max: 2007 < 2008 < 2009 IQR (CV) : 2 (0)	2007 : 110 (32.0%) 2008 : 114 (33.1%) 2009 : 120 (34.9%)	0 (0.0%)
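Besides dfSummary(), summarytools has descr() for descriptive statistics of numeric variables and freq() for one-way frequency tables. A minimal sketch; the statistics requested are an illustrative subset, and the results = "asis" chunk option applies here too when knitting:

```r
library(summarytools)
library(palmerpenguins)

# descr() summarizes the numeric columns (non-numeric ones are skipped);
# transpose = TRUE puts one variable per row
penguin_descr <- descr(
  penguins,
  stats     = c("mean", "sd", "min", "med", "max"),
  transpose = TRUE,
  headings  = FALSE
)
penguin_descr

# freq() gives a frequency table for a single categorical variable
island_freq <- freq(penguins$island, report.nas = FALSE, headings = FALSE)
island_freq
```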

summarytools::ctable()

Get a nice crosstable for 2 categorical variables using the ctable() function. Let’s look at species and sex in the penguins dataset.

NOTE: At the moment, ctable() only works for HTML output; it does not work for DOC or PDF formats.

library(magrittr)
penguins %$%  # Acts like with(penguins, ...)
  ctable(x = species, y = sex,
         useNA = "no",
         chisq = TRUE,
         OR    = TRUE,
         RR    = TRUE,
         headings = FALSE) %>%
  print(method = "render")

	sex
species	female	male	Total
Adelie	73 (50.0%)	73 (50.0%)	146 (100.0%)
Chinstrap	34 (50.0%)	34 (50.0%)	68 (100.0%)
Gentoo	58 (48.7%)	61 (51.3%)	119 (100.0%)
Total	165 (49.5%)	168 (50.5%)	333 (100.0%)
Χ² = 0.0486   df = 2   p = .9760

Generated by summarytools 1.0.1 (R version 4.3.2)
2024-01-23
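The odds ratio and risk ratio requested with OR = TRUE and RR = TRUE do not appear in the output above. The ctable() documentation describes them as applying to 2 x 2 tables, and species has 3 levels. A minimal sketch that keeps two species so that species by sex is a 2 x 2 table; the choice of Adelie and Gentoo is illustrative:

```r
library(summarytools)
library(palmerpenguins)

# Keep two species so that species x sex is a 2 x 2 table,
# then drop the now-unused "Chinstrap" factor level
pen2 <- droplevels(subset(penguins, species %in% c("Adelie", "Gentoo")))

species_sex <- with(pen2, ctable(
  x = species, y = sex,
  useNA    = "no",   # leave out penguins with missing sex
  chisq    = TRUE,
  OR       = TRUE,   # odds ratio with a 95% confidence interval
  RR       = TRUE,   # risk ratio with a 95% confidence interval
  headings = FALSE
))
species_sex
```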

More fun packages to try out

These can all be fun to play with, but “with great power comes great responsibility”: the key is to look for examples to adapt and to read the documentation.

For all of these packages, getting the formatting to work across multiple output formats is really challenging. Typically, the developers get HTML and/or PDF (through LaTeX) working first, and MS Word DOCX formats are the hardest to adapt. If all else fails, you can sometimes simply copy and paste the HTML output into a Word document; see the short kableExtra video at http://haozhu233.github.io/kableExtra/kableExtra_and_word.html.

reactablefmtr https://kcuilla.github.io/reactablefmtr/index.html
gtExtras
- https://jthomasmock.github.io/gtExtras/index.html
- https://themockup.blog/posts/2022-06-13-gtextras-cran/
flextable https://ardata-fr.github.io/flextable-book/ and gallery examples at https://ardata-fr.github.io/flextable-gallery/gallery/
kableExtra for added functionality for knitr::kable(), see https://cran.r-project.org/web/packages/kableExtra/
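Since DOCX is described above as the hardest format, here is a minimal sketch of the flextable route, which writes a table directly to a Word file with save_as_docx(). It assumes flextable (and the officer package it relies on) is installed; the output file name is hypothetical:

```r
library(flextable)

# flextable displays columns only, so copy the car names (row names)
# into a regular column first
cars_top <- head(mtcars)
cars_top <- cbind(car = rownames(cars_top), cars_top)

cars_ft <- flextable(cars_top)
cars_ft <- autofit(cars_ft)  # size the columns to fit their contents

# Write the table straight to a Word document in a temporary folder
docx_path <- file.path(tempdir(), "mtcars_table.docx")
save_as_docx(cars_ft, path = docx_path)
```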

RMarkdown Tables - Data Summary and Presentation

Melinda Higgins

1/23/2024
