Introduction to Bootstrapping

Bootstrapping

Bootstrapping is a non-parametric approach to statistical estimation that uses resampling to estimate the variability of a statistic. The idea is to treat your sample data as the “population” and draw repeated samples from it. You then compute the statistic you’re interested in from each resample and look at the distribution of that statistic across all of the resamples. Here is the basic workflow:

1. Let’s say the sample has BMIs (body mass indexes) for 100 people. This is the ORIGINAL sample.
2. Take a random sample of 100 (yes, the same sample size) WITH REPLACEMENT from your original sample.
3. Compute the statistic you are interested in, for example the median, and save this estimate of the median as MEDIAN Estimate 1.
4. Repeat steps 2 and 3 many times, such as 1000 (typically you do this 500, 1000, or 2000+ times), and save all of the resulting estimates of the statistic you are interested in.
5. You now have 1000 sample medians.
6. Find the median of these 1000 sample medians and a “95% confidence interval” for this estimate. Use this “confidence interval” for the sample median to do inference: to answer the question of whether the sample median is significantly different from 0, just check whether the confidence interval for these bootstrapped sample medians contains 0.
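
The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BMI values are simulated rather than taken from the HELP dataset, the names bootstrap_medians and bmi are made up for this sketch, and the interval uses the simple percentile method (one of several options). The lesson's own examples are in the R and SAS files listed further down.

```python
import random
import statistics


def bootstrap_medians(sample, n_boot=1000, seed=None):
    """Return n_boot medians, each computed from a resample of the
    same size as `sample`, drawn WITH replacement (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    medians = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # step 2: resample with replacement
        medians.append(statistics.median(resample))  # step 3: save the estimate
    return medians


# Step 1: the ORIGINAL sample. These are made-up BMI values, NOT the HELP data.
data_rng = random.Random(736)
bmi = [round(data_rng.gauss(27.0, 5.0), 1) for _ in range(100)]

# Steps 4 and 5: repeat 1000 times and keep every bootstrapped median.
medians = bootstrap_medians(bmi, n_boot=1000, seed=9)

# Step 6: percentile-method 95% interval. Splitting into 40 equal groups
# puts the first and last cut points at the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(medians, n=40)
low, high = cuts[0], cuts[-1]

print("median of the bootstrapped medians:", statistics.median(medians))
print("95% percentile interval:", (low, high))
print("interval contains 0:", low <= 0 <= high)
```

Fixing the seed makes the run reproducible; without it, each run gives a slightly different interval, which is expected for a resampling method.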

NOTE: There are multiple approaches for computing this 95% confidence interval for the bootstrapped estimate, so be sure to read the documentation for the method and software you choose.
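
To see why the choice of method matters, here is a rough Python sketch that computes three common interval types from the same set of bootstrapped medians: the percentile interval, the basic (reflected) interval, and a normal-approximation interval. The data are simulated and deliberately skewed, the variable names are made up, and the formulas are the textbook versions; a given software package may differ in its details, so treat this as a sketch rather than a description of any particular package.

```python
import random
import statistics

rng = random.Random(2017)

# Made-up, right-skewed values for illustration only; NOT the HELP data.
sample = [rng.lognormvariate(3.3, 0.2) for _ in range(100)]
observed = statistics.median(sample)  # the median of the original sample

# 2000 bootstrapped medians, each from a same-size resample with replacement.
boot = [
    statistics.median(rng.choices(sample, k=len(sample)))
    for _ in range(2000)
]

# 2.5th and 97.5th percentiles of the bootstrapped medians.
cuts = statistics.quantiles(boot, n=40)
q_low, q_high = cuts[0], cuts[-1]

# Percentile method: use the percentiles directly.
percentile_ci = (q_low, q_high)

# Basic method: reflect the percentiles around the observed median.
basic_ci = (2 * observed - q_high, 2 * observed - q_low)

# Normal approximation: observed median +/- 1.96 bootstrap standard errors.
se = statistics.stdev(boot)
normal_ci = (observed - 1.96 * se, observed + 1.96 * se)

for name, ci in [("percentile", percentile_ci),
                 ("basic", basic_ci),
                 ("normal", normal_ci)]:
    print(f"{name:>10}: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The percentile and basic intervals always have the same width but can sit in different places when the bootstrap distribution is skewed, which is one reason to check which method your software reports.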

Useful links:

The HELP Dataset

We are still working with the HELP (Health Evaluation and Linkage to Primary Care) dataset. See details at https://melindahiggins2000.github.io/N736Fall2017_lesson07/lesson07_univStats.html.

R Code

Run examples shown in lesson09_Rcode.R

SAS Code

Run examples shown in lesson09_SAScode.sas