Summary Statistics Tables for Cleaned Abalone Dataset

Summary stats using the arsenal package

This is a quick summary statistics table for the cleaned abalone dataset, made with the `arsenal` package.

NOTE: You need to add `results = "asis"` to the R chunk options for this table to render correctly when the document is knitted.
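In the .Rmd source, the option can go in the chunk header as `{r, results = "asis"}`. Recent versions of knitr also accept chunk options written as `#|` lines at the top of the chunk body. Below is a minimal sketch of that second form. It assumes `tab1` has already been created by the `tableby()` call in the next section.

```r
#| results: asis
# knitr reads "#|" lines at the top of a chunk as chunk options.
# results = "asis" passes the markdown table printed by summary()
# straight into the document instead of wrapping it as console output.
summary(tab1, pfootnote = TRUE)
```

Without this option, knitr treats the printed table as ordinary console output. The table then shows up as raw markdown text inside a code block.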

Learn more at https://cran.r-project.org/web/packages/arsenal/vignettes/tableby.html.

Look at sex, length, height, diameter by adult category

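Note: By default, tableby() runs Pearson's chi-square test for categorical variables and a one-way ANOVA (linear model F-test) for continuous variables. When there are only 2 groups, as here, the ANOVA gives the same p-value as a two-sample t-test that assumes equal variances.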

library(arsenal)
tab1 <- tableby(adult ~ sex + length + diameter + height,
                data = abalone_clean)
summary(tab1, pfootnote = TRUE)
| | adult (N=2834) | immature (N=1335) | Total (N=4169) | p value |
|---|---|---|---|---|
| sex | | | | < 0.001¹ |
| F | 1306 (46.1%) | 0 (0.0%) | 1306 (31.3%) | |
| I | 0 (0.0%) | 1335 (100.0%) | 1335 (32.0%) | |
| M | 1528 (53.9%) | 0 (0.0%) | 1528 (36.7%) | |
| length | | | | < 0.001² |
| Mean (SD) | 0.570 (0.096) | 0.428 (0.109) | 0.524 (0.120) | |
| Range | 0.155 - 0.815 | 0.075 - 0.725 | 0.075 - 0.815 | |
| diameter | | | | < 0.001² |
| Mean (SD) | 0.446 (0.079) | 0.327 (0.088) | 0.408 (0.099) | |
| Range | 0.110 - 0.650 | 0.055 - 0.550 | 0.055 - 0.650 | |
| height | | | | < 0.001² |
| Mean (SD) | 0.154 (0.033) | 0.108 (0.032) | 0.139 (0.039) | |
| Range | 0.015 - 0.515 | 0.010 - 0.220 | 0.010 - 0.515 | |

  1. Pearson's Chi-squared test
  2. Linear Model ANOVA
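The sketch below shows in base R what these two default tests do. It uses simulated toy data: the `toy` data frame and its values are made up for illustration and are not the abalone data. The code runs a Pearson chi-square test on a sex-by-group table and a one-way ANOVA from a linear model. It then checks that, with two groups, the ANOVA p-value matches an equal-variance t-test. This mirrors the kinds of tests arsenal reports; it is not arsenal's internal code.

```r
# Simulated toy data: NOT the abalone data, just two groups of 30
set.seed(42)
n <- 60
toy <- data.frame(
  adult  = factor(rep(c("adult", "immature"), each = n / 2)),
  sex    = factor(sample(c("F", "I", "M"), n, replace = TRUE)),
  length = c(rnorm(n / 2, mean = 0.57, sd = 0.10),
             rnorm(n / 2, mean = 0.43, sd = 0.11))
)

# Categorical variable: Pearson's chi-squared test on the sex x group counts
chi <- chisq.test(table(toy$sex, toy$adult))

# Continuous variable: one-way ANOVA (F-test) from a linear model
p_anova <- anova(lm(length ~ adult, data = toy))[["Pr(>F)"]][1]

# With exactly two groups, the F-test equals the pooled-variance t-test (F = t^2)
p_pooled <- t.test(length ~ adult, data = toy, var.equal = TRUE)$p.value

print(chi$p.value)
print(c(anova = p_anova, pooled_t = p_pooled))
```

The equal-variance assumption is the key detail here. An ANOVA corresponds to Student's pooled t-test, not to the Welch t-test that R's t.test() runs by default.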

Look at whole, shucked, viscera, shell weights by adult category

tab2 <- tableby(adult ~ wholeWeight + shuckedWeight +
                  visceraWeight + shellWeight,
                data = abalone_clean)
summary(tab2, pfootnote = TRUE)
| | adult (N=2834) | immature (N=1335) | Total (N=4169) | p value |
|---|---|---|---|---|
| wholeWeight | | | | < 0.001¹ |
| Mean (SD) | 1.017 (0.453) | 0.432 (0.286) | 0.830 (0.490) | |
| Range | 0.015 - 2.825 | 0.002 - 2.050 | 0.002 - 2.825 | |
| shuckedWeight | | | | < 0.001¹ |
| Mean (SD) | 0.439 (0.212) | 0.191 (0.128) | 0.360 (0.222) | |
| Range | 0.006 - 1.488 | 0.001 - 0.773 | 0.001 - 1.488 | |
| visceraWeight | | | | < 0.001¹ |
| Mean (SD) | 0.223 (0.102) | 0.092 (0.063) | 0.181 (0.110) | |
| Range | 0.003 - 0.760 | 0.000 - 0.440 | 0.000 - 0.760 | |
| shellWeight | | | | < 0.001¹ |
| Mean (SD) | 0.291 (0.129) | 0.128 (0.085) | 0.239 (0.139) | |
| Range | 0.005 - 1.005 | 0.002 - 0.655 | 0.002 - 1.005 | |

  1. Linear Model ANOVA

Another option with the gtsummary package

I really like the arsenal package since it works really well when knitting to Word. The gtsummary package also works OK with Word, HTML, and other formats. There also seems to be more active development on gtsummary, so it may be the better long-term choice.

Learn more at https://www.danieldsjoberg.com/gtsummary/index.html.

Look at sex, length, height, diameter by adult category

Note: In the code below, I specified a chi-square test for all categorical variables and a t-test for all continuous variables; see https://www.danieldsjoberg.com/gtsummary/reference/add_p.tbl_summary.html. R's t.test() uses the Welch (unequal variances) version by default, which is why the table footnote reads "Welch Two Sample t-test".

I also changed the default statistics. By default, tbl_summary() reports the median and IQR (shown as the 25th and 75th percentiles) for continuous variables. I changed this to the mean and SD; see https://www.danieldsjoberg.com/gtsummary/reference/tbl_summary.html.
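The difference between the two summaries is easy to see in base R. The sketch below uses made-up, right-skewed toy data, not the abalone measurements. For each group it computes the mean (SD) that `"{mean} ({sd})"` requests. It also computes the median with the 25th and 75th percentiles, which matches gtsummary's default continuous statistic, `"{median} ({p25}, {p75})"`.

```r
# Simulated, right-skewed toy data: NOT the abalone measurements
set.seed(1)
toy <- data.frame(
  adult       = factor(rep(c("adult", "immature"), each = 50)),
  wholeWeight = c(rgamma(50, shape = 5, rate = 5),
                  rgamma(50, shape = 2, rate = 5))
)

# Mean (SD) per group, the summary requested by statistic = "{mean} ({sd})"
mean_sd <- aggregate(wholeWeight ~ adult, data = toy,
                     FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Median with 25th and 75th percentiles per group, the tbl_summary() default
med_q <- aggregate(wholeWeight ~ adult, data = toy,
                   FUN = function(x) quantile(x, probs = c(0.25, 0.5, 0.75)))

print(mean_sd)
print(med_q)
```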

library(gtsummary)
library(dplyr) # for select() and the %>% pipe

# make dataset with a few variables to summarize
ab2 <- abalone_clean %>% 
  select(sex, adult, length, diameter, height)

table2 <- 
  tbl_summary(
    ab2,
    by = adult, 
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no" 
  ) %>%
  add_n() %>% 
  add_p(test = list(all_continuous() ~ "t.test",
                    all_categorical() ~ "chisq.test")) %>% 
  modify_header(label = "**Variable**") %>% 
  bold_labels() 

table2
| Variable | N | adult, N = 2,834¹ | immature, N = 1,335¹ | p-value² |
|---|---|---|---|---|
| sex | 4,169 | | | <0.001 |
| F | | 1,306 (46%) | 0 (0%) | |
| I | | 0 (0%) | 1,335 (100%) | |
| M | | 1,528 (54%) | 0 (0%) | |
| length | 4,169 | 0.57 (0.10) | 0.43 (0.11) | <0.001 |
| diameter | 4,169 | 0.45 (0.08) | 0.33 (0.09) | <0.001 |
| height | 4,169 | 0.15 (0.03) | 0.11 (0.03) | <0.001 |

¹ n (%); Mean (SD)

² Pearson's Chi-squared test; Welch Two Sample t-test
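The Welch version does not assume the two groups have equal variances; it adjusts the degrees of freedom instead. The base-R sketch below compares it with Student's pooled t-test and recomputes the Welch-Satterthwaite degrees of freedom by hand. It uses simulated toy data: two made-up groups with deliberately unequal spreads, not the abalone data.

```r
# Simulated toy data with unequal spreads: NOT the abalone data
set.seed(7)
adult    <- rnorm(40, mean = 0.57, sd = 0.10)
immature <- rnorm(20, mean = 0.43, sd = 0.03)

welch  <- t.test(adult, immature)                    # default: var.equal = FALSE
pooled <- t.test(adult, immature, var.equal = TRUE)  # Student's pooled t-test

# Welch-Satterthwaite degrees of freedom computed by hand
v1 <- var(adult) / length(adult)
v2 <- var(immature) / length(immature)
df_ws <- (v1 + v2)^2 /
  (v1^2 / (length(adult) - 1) + v2^2 / (length(immature) - 1))

print(c(welch_df  = unname(welch$parameter),
        by_hand   = df_ws,
        pooled_df = unname(pooled$parameter)))
```

You may prefer the pooled version in the gtsummary table. The add_p() documentation describes passing extra test arguments through `test.args`, for example `test.args = all_tests("t.test") ~ list(var.equal = TRUE)`. Check that page for the form that matches your installed version.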

Look at whole, shucked, viscera, shell weights by adult category

In the code below, I changed the test to the non-parametric Wilcoxon rank sum test (wilcox.test). I kept the default summary statistics, median and IQR.

# make dataset with a few variables to summarize
ab3 <- abalone_clean %>% 
  select(adult, wholeWeight, shuckedWeight, 
         visceraWeight, shellWeight)

table3 <- 
  tbl_summary(
    ab3,
    by = adult, 
    missing = "no" 
  ) %>%
  add_n() %>% 
  add_p(test = list(all_continuous() ~ "wilcox.test")) %>% 
  modify_header(label = "**Variable**") %>% 
  bold_labels() 

table3
| Variable | N | adult, N = 2,834¹ | immature, N = 1,335¹ | p-value² |
|---|---|---|---|---|
| wholeWeight | 4,169 | 1.00 (0.70, 1.29) | 0.38 (0.21, 0.60) | <0.001 |
| shuckedWeight | 4,169 | 0.43 (0.29, 0.57) | 0.17 (0.09, 0.27) | <0.001 |
| visceraWeight | 4,169 | 0.22 (0.15, 0.29) | 0.08 (0.04, 0.13) | <0.001 |
| shellWeight | 4,169 | 0.28 (0.20, 0.36) | 0.11 (0.06, 0.18) | <0.001 |

¹ Median (IQR)

² Wilcoxon rank sum test
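The Wilcoxon rank sum test compares the groups through the ranks of the pooled observations rather than their raw values. The base-R sketch below recomputes the W statistic that wilcox.test() reports: the rank sum of the first group minus n1(n1 + 1)/2. It uses simulated toy data, not the abalone weights.

```r
# Simulated, right-skewed toy data: NOT the abalone weights
set.seed(3)
adult    <- rgamma(30, shape = 5, rate = 5)
immature <- rgamma(25, shape = 2, rate = 5)

res <- wilcox.test(adult, immature)  # two-sided Wilcoxon rank sum test

# W = rank sum of the first group in the pooled sample - n1 * (n1 + 1) / 2
ranks <- rank(c(adult, immature))
n1 <- length(adult)
W_by_hand <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

print(c(W = unname(res$statistic), by_hand = W_by_hand))
```

Only ranks enter the statistic. As a result, a few very large weights cannot dominate the comparison the way they can in a mean-based test.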