class: center, middle, inverse, title-slide

# R Objects - Factors
## How to work with numbers and text together
### Melinda Higgins
### Emory University 02/11/2021

---

# Numeric Data in R

.pull-left[
<img src="img/continuous_discrete.png" width="100%" />
]

.pull-right[
### CONTINUOUS - numeric (double)

```r
x <- c(1.12, 1.13, 4.23)
class(x)
```

```
## [1] "numeric"
```

### DISCRETE - integer

```r
y <- as.integer(c(1,2,3,4))
class(y)
```

```
## [1] "integer"
```
]

---

# Non-numeric Data in R

<img src="img/nominal_ordinal_binary.png" width="80%" />

---

# Nominal Data

.left-column[
<img src="img/nominal.jpg" width="100%" />
]

--

### From character (text strings)

```r
animal <- c("turtle","snail","butterfly",
            "turtle","butterfly")
class(animal)
```

```
## [1] "character"
```

--

```r
table(animal) %>%
  knitr::kable()
```

|animal    | Freq|
|:---------|----:|
|butterfly |    2|
|snail     |    1|
|turtle    |    2|

Notice that `table()` lists the values in alphabetical order.

---

# Nominal Data as "Factor"

.pull-left[
Suppose on a survey each nurse selects the hospital where they work:

1. Emory Hospital (on Clifton Road)
2. Emory Midtown
3. Emory Decatur
4. Saint Joseph's

And we get these 7 responses: 1, 1, 2, 3, 2, 2, 4

```r
hosp_choice <- c(1,1,2,3,2,2,4)
hospital <- factor(hosp_choice,
                   levels = c(1,2,3,4),
                   labels = c("EHospital",
                              "EMidtown",
                              "EDecatur",
                              "St. Joe"))
```
]

--

.pull-right[
Notice that

* `levels` correspond to the response (choice) numbers and
* `labels` correspond to the text of the response options.

```r
table(hospital) %>%
  knitr::kable()
```

|hospital  | Freq|
|:---------|----:|
|EHospital |    2|
|EMidtown  |    3|
|EDecatur  |    1|
|St. Joe   |    1|

Notice the responses are **in order by the response number**, NOT in alphabetical order.
]

---

# Ordinal Data as "Factor"

.left-column[
<img src="img/ordinal.jpg" width="100%" />
]

.right-column[
### Alphabetical Order is a Problem

Suppose we have "awful", "ok" and "awesome" as the options for the question "How are you feeling today?"
```r
feeling <- c("ok","ok","awesome","awful")
```

In a table, the default is alphabetical order:

```r
table(feeling) %>%
  knitr::kable()
```

|feeling | Freq|
|:-------|----:|
|awesome |    1|
|awful   |    1|
|ok      |    2|
]

---

# Create Ordered Factor

List the text responses (levels) in the order you want and specify `ordered = TRUE`.

```r
feeling_ordf <- factor(feeling,
                       levels = c("awesome", "ok", "awful"),
*                      ordered = TRUE)
```

--

Now the table is in a logical order:

```r
table(feeling_ordf) %>%
  knitr::kable()
```

|feeling_ordf | Freq|
|:------------|----:|
|awesome      |    1|
|ok           |    2|
|awful        |    1|

---

# Binary Data

.left-column[
<img src="img/binary.jpg" width="100%" />
]

.right-column[
A variable is "binary" if it has only 2 possible values.

```r
animals <- c("shark","shark","dino","dino","shark")
table(animals) %>%
  knitr::kable()
```

|animals | Freq|
|:-------|----:|
|dino    |    2|
|shark   |    3|

Logical variables are also "binary".

```r
extinct <- animals == "dino"
table(extinct) %>%
  knitr::kable()
```

|extinct | Freq|
|:-------|----:|
|FALSE   |    3|
|TRUE    |    2|
]

---

# Binary Data - Use TRUE=1, FALSE=0

A neat feature of **TRUE/FALSE** logical variables is that, because they convert to `0=FALSE, 1=TRUE`, you can use:

* the `sum()` function to get the number (count) of **TRUE** values and
* the `mean()` function to get the proportion of **TRUE** values.

```r
sum(extinct)
```

```
## [1] 2
```

```r
mean(extinct)
```

```
## [1] 0.4
```

```r
# multiply by 100 to get percent TRUE
mean(extinct)*100
```

```
## [1] 40
```

Write a sentence in R Markdown with inline code using `` `r mean(extinct)*100` `` to write: "The percentage of extinct animals was 40 %."
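---

# Binary Data - Checking the Coercion

As a small sketch of why `sum()` and `mean()` work on a logical vector, the code below converts `extinct` to integers explicitly and then builds the sentence with `paste0()`. The names `extinct_int` and `pct_extinct` are made up for this illustration and are not part of the original slides.

```r
animals <- c("shark","shark","dino","dino","shark")
extinct <- animals == "dino"

# explicit coercion: FALSE becomes 0, TRUE becomes 1
extinct_int <- as.integer(extinct)
extinct_int

# summing the 0/1 values gives the same count as sum(extinct)
sum(extinct_int)

# percent TRUE, stored under a hypothetical name for reuse
pct_extinct <- mean(extinct) * 100
paste0("The percentage of extinct animals was ", pct_extinct, " %.")
```

Storing the percentage in a variable first is only one option; the inline expression `mean(extinct)*100` can also be used directly in the sentence.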