This lesson introduces you to missing data:
| Y1 | Y1miss |
|---|---|
| 12 | 0 |
| 13 | 0 |
| | 1 |
| 34 | 0 |
| | 1 |
| | 1 |
| 10 | 0 |
Use the newly created indicator variables (Y1miss, Y2miss, etc.) to run association tests, such as correlations, t-tests, and chi-square tests, to see whether the missing data is associated with any of your other predictors or outcomes (i.e., with anything in the rest of your dataset).
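As a minimal sketch of this idea, the Python snippet below builds a missingness indicator for Y1 and compares a second variable between the rows where Y1 is missing and the rows where it is observed. The `age` values are invented purely for illustration, and `welch_t` is a hypothetical helper written for this sketch; in practice you would use the t-test routine of your statistics package, which also reports a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Y1 values mirroring the Y1/Y1miss table above; None marks a missing value.
y1 = [12, 13, None, 34, None, None, 10]

# Hypothetical, fully observed second variable (values made up for this sketch).
age = [25, 31, 52, 28, 47, 58, 30]

# Indicator variable: 1 = Y1 is missing, 0 = Y1 is observed.
y1miss = [1 if value is None else 0 for value in y1]

# Split the second variable by the missingness indicator.
age_when_missing = [a for a, m in zip(age, y1miss) if m == 1]
age_when_observed = [a for a, m in zip(age, y1miss) if m == 0]


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


t_stat = welch_t(age_when_missing, age_when_observed)

print("Y1miss:", y1miss)
print("mean age when Y1 is missing: ", round(mean(age_when_missing), 2))
print("mean age when Y1 is observed:", round(mean(age_when_observed), 2))
print("Welch t statistic:", round(t_stat, 2))
```

A large t statistic here would suggest that missingness in Y1 is related to the other variable, which is the kind of evidence this step is looking for.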
Depending on what you find in step 3, you can decide if you want to proceed with:
| var1 | var2 | var3 | var4 | var5 | nmiss | missingYN |
|---|---|---|---|---|---|---|
| 12 | w | 3 | 1 | 1 | 1 | |
| 13 | b | 55 | 2 | 2 | 0 | 0 |
| | b | 56 | 1 | 2 | 1 | 1 |
| 34 | | 87 | 2 | 1 | 1 | 1 |
| | w | 88 | 1 | 2 | 1 | 1 |
| 15 | w | 3 | 2 | 1 | | |
| 10 | b | 90 | 2 | 1 | 0 | 0 |
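The two summary columns in the table above can be sketched in a few lines of Python. This assumes that `nmiss` counts the missing values in each row and that `missingYN` flags rows with at least one missing value. The records below are hypothetical, loosely modeled on the table rather than copied from it.

```python
# Hypothetical records for illustration; None marks a missing value.
rows = [
    {"var1": 13, "var2": "b", "var3": 55, "var4": 2, "var5": 2},
    {"var1": None, "var2": "b", "var3": 56, "var4": 1, "var5": 2},
    {"var1": 34, "var2": None, "var3": 87, "var4": 2, "var5": 1},
    {"var1": 15, "var2": "w", "var3": None, "var4": None, "var5": 1},
]

variables = ["var1", "var2", "var3", "var4", "var5"]

for row in rows:
    # nmiss: number of missing values across the variables in this row.
    row["nmiss"] = sum(row[name] is None for name in variables)
    # missingYN: 1 if the row has any missing value, otherwise 0.
    row["missingYN"] = 1 if row["nmiss"] > 0 else 0

# Share of rows with at least one missing value.
n_incomplete = sum(row["missingYN"] for row in rows)
pct_incomplete = 100 * n_incomplete / len(rows)

for row in rows:
    print(row)
print(f"{n_incomplete} of {len(rows)} rows ({pct_incomplete:.0f}%) have missing data")
```

The share of incomplete rows is a quick indication of how much data a complete-case analysis would discard.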
SIDE NOTE: Later this semester, we will briefly discuss mean substitution as a possible option when people skip items on a given survey instrument. This is a common practice for some instruments and has been built into the underlying psychometric properties of those measurement tools. However, mean substitution is NOT RECOMMENDED in general because it is a BIASED method: it understates the variability in the data and distorts associations among variables.
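One way to see the problem with mean substitution is to compare the sample variance before and after filling in the mean. The sketch below is only an illustration, using Y1 values that mirror the first table in this lesson and Python's standard `statistics` module.

```python
from statistics import mean, variance

# Y1 values mirroring the first table; None marks a missing value.
y1 = [12, 13, None, 34, None, None, 10]

# Keep only the observed values and compute their mean.
observed = [value for value in y1 if value is not None]
fill_value = mean(observed)

# Mean substitution: replace every missing value with the observed mean.
imputed = [fill_value if value is None else value for value in y1]

var_observed = variance(observed)  # sample variance of the observed values
var_imputed = variance(imputed)    # sample variance after mean substitution

print("observed mean:", fill_value)
print("variance of observed values:     ", var_observed)
print("variance after mean substitution:", var_imputed)
```

The filled-in values sit exactly at the mean, so they add nothing to the spread while increasing the sample size. As a result, the variance after substitution is smaller than the variance of the observed values.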
KEY BOOK: “Statistical Analysis with Missing Data, 2nd Edition” by Roderick J. A. Little, Donald B. Rubin http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471183865.html
ALSO:
Check out what is provided on the “Quick-R” website http://www.statmethods.net/input/missingdata.html
The Manning Book website for “R in Action” https://www.manning.com/books/r-in-action-second-edition?a_bid=5c2b1e1d&a_aid=RiA2ed Chapter 18 deals with Missing Data. NOTE: if you purchase the book you can then access the entire book content online.
Visualizing Missing Data in SAS https://blogs.sas.com/content/iml/2016/04/20/visualize-missing-data-sas.html
Examine patterns of missing data in SAS https://blogs.sas.com/content/iml/2016/04/18/patterns-of-missing-data-in-sas.html
More on Visualizing Missing Data (based on John Fox’s Applied Regression book) http://scs.math.yorku.ca/index.php/Visualizing_missing_data - includes SAS code and macros
Potential “solutions” for missing data - Multiple Imputation (MI) and Maximum Likelihood (ML) http://www.theanalysisfactor.com/missing-data-two-recommended-solutions/
Paul Allison’s website (you might get a warning about an unsafe website; not sure why): discussion of MI versus ML https://m.statisticalhorizons.com/?url=https%3A%2F%2Fstatisticalhorizons.com%2Fml-better-than-mi&width=412