
Using R to Datamine Twitter - Intro


First - set-up a developer account with Twitter

In order to submit data requests to Twitter, you need a developer account. Go to Twitter Developers at https://dev.twitter.com/, also known as Fabric.

[Screenshot of the Twitter Developers site]

  • Click “More” and “Get Started with Fabric”
  • Fill out your name and email to request an account
  • Mine took a week to get an approval email
  • Once you get the email invitation, follow the instructions to activate your account.
  • Go to the Twitter Applications manager at https://apps.twitter.com/
  • Create a new app - fill out the required fields
  • Generate a consumer key - save the consumer key and consumer secret
  • Generate the access token and secret

Learn more at the Credera Blog

  • go to http://blog.credera.com/business-intelligence/twitter-analytics-using-r-part-1-extract-tweets/
  • see more posts at http://blog.credera.com/?s=twitter

Getting started in R

You will need to install the following packages in R: twitteR and ROAuth for “talking” to Twitter. You also need the tm and wordcloud packages to create the wordcloud figure.

install.packages("twitteR")
install.packages("ROAuth")
library("twitteR")
library("ROAuth")
install.packages("tm")
library("tm")
install.packages("wordcloud")
library("wordcloud")

Get your authenticated credentials (CAcert and cURL)

For Windows users, you need to get the cacert.pem file. It gets stored in your local directory, so be sure that your working directory is set how you want it. Run getwd() to check what R thinks is your current working directory, and use setwd("c:/xxxx/xxxx") to set the path to what you want.
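For example (the path below is just a placeholder; substitute your own directory):

# check what R thinks the current working directory is
getwd()

# set the working directory so cacert.pem is saved where you want it
# (placeholder path - replace with your own)
setwd("C:/Users/yourname/twitter")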

Some Important Notes

IMPORTANT NOTE: on my Twitter developer site I had to update the permissions for the app to read, write and direct messages (see https://apps.twitter.com/app/7358826); only then did the steps below work. With just read permissions I kept getting an “Authorization Required” error:

[1] "Authorization Required"  
     Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) : 
     Error: Authorization Required

Now that you have your Twitter application configured with read, write and direct messages permissions, download the CA certificate file (cacert.pem).

download.file(url="http://curl.haxx.se/ca/cacert.pem",destfile="cacert.pem")

Next, create an object with the authentication details for later sessions. You will need the consumer key and consumer secret from your Twitter app as input here.

# create an object "cred" that will save the authenticated object that we can use for later sessions
# input your own consumerKey and Secret below
cred <- OAuthFactory$new(consumerKey='xxxxxxxxxxxxxxxxxxxxxxxxx',
                         consumerSecret='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
                         requestURL='https://api.twitter.com/oauth/request_token',
                         accessURL='https://api.twitter.com/oauth/access_token',
                         authURL='https://api.twitter.com/oauth/authorize')

# Executing the next step prints a message: "To enable the connection, please
# direct your web browser to: <hyperlink>". Note: you only need to do this part once.
cred$handshake(cainfo="cacert.pem")

#save for later use for Windows
save(cred, file="twitter_authentication.Rdata")

# Load "twitter authentication.Rdata" file in your session and then run registerTwitterOAuth. 
# This should return "TRUE" indicating that all is good and we can proceed. 

load("twitter_authentication.Rdata")
registerTwitterOAuth(cred)

Next, do a search on Twitter, parse through the tweets, and create a wordcloud.

search.string <- "#nursing"
no.of.tweets <- 1499
tweets <- searchTwitter(search.string, n=no.of.tweets, cainfo="cacert.pem", lang="en")

This may take a few minutes to run, depending on the number of tweets being extracted.

Here are some of the tweets extracted; head(tweets) pulls the first few:
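
head(tweets)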

[[1]]
[1] "stjoehealthjobs: #Nursing #Job in #FULLERTON, CA: Patient Care Tech, Oncology, FT, Nights, 12hr at St. Joseph's Health http://t.co/gL1puLnlxS"

[[2]]
[1] "ILnursejobs: #nursing #jobs Nurse Practitioner at Healthstat (IL): Nurse Practitioner (5 hrs/wk) Needed at On-Site Employer...  http://t.co/ngP4ykfbFZ"

[[3]]
[1] "ILnursejobs: #nursing #jobs PRN Endoscopy Nurse at Advocate Health Care Network (Barrington): For more than 35 years, Good ...  http://t.co/ngP4ykfbFZ"

[[4]]
[1] "ILnursejobs: #nursing #jobs On the search for another quality NP for Psych just NW of Chicago at http://t.co/KXqT9XjNNF (IL)...  http://t.co/ngP4ykfbFZ"

[[5]]
[1] "ILnursejobs: #nursing #jobs STAFF NURSE I at Provena Saint Joseph Medical Center (Evanston, IL): STAFF NURSE I Facility Pre...  http://t.co/ngP4ykfbFZ"

[[6]]
[1] "dreytoledo: God is so GOOD! í ½í¸,í ½í²Tí ½í²sí ½í²< #life #financial #problems #nursing http://t.co/1OS0EV2MeA"

The next set of commands will parse through these tweets and extract the key words we will use in the final wordcloud.

# extract the text from each tweet object
tweets.text <- sapply(tweets, function(x) x$getText())

#convert all text to lower case
tweets.text <- tolower(tweets.text)

# remove the retweet marker "rt"
# (note: this also strips "rt" inside other words, e.g. "fullerton" becomes "fulleon")
tweets.text <- gsub("rt", "", tweets.text)

# Replace @UserName
tweets.text <- gsub("@\\w+", "", tweets.text)

# Remove punctuation
tweets.text <- gsub("[[:punct:]]", "", tweets.text)

# Remove links
tweets.text <- gsub("http\\w+", "", tweets.text)

# Remove tabs
tweets.text <- gsub("[ |\t]{2,}", "", tweets.text)

# Remove blank spaces at the beginning
tweets.text <- gsub("^ ", "", tweets.text)

# Remove blank spaces at the end
tweets.text <- gsub(" $", "", tweets.text)

#create corpus
tweets.text.corpus <- Corpus(VectorSource(tweets.text))

#clean up by removing stop words
tweets.text.corpus <- tm_map(tweets.text.corpus, function(x)removeWords(x,stopwords()))

Here is what the cleaned-up text now looks like for the first tweet extracted above.

tweets.text.corpus[1]$content

[[1]]
<<PlainTextDocument (metadata: 7)>>
nursing job  fulleon ca patient care tech oncology ft nights 12hr  st josephs health

Finally, generate the wordcloud for all of the extracted content from these 1499 tweets.

#generate wordcloud
wordcloud(tweets.text.corpus, min.freq = 2, scale = c(7, 0.5),
          colors = brewer.pal(8, "Dark2"), random.color = TRUE,
          random.order = FALSE, max.words = 150)

[Wordcloud generated from the #nursing tweets]



Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 08 December 2014.
Category: R
Tags: r


Test Post to RPubs


add link to publication or post to RPubs

RPubs Test 1


Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 29 November 2014.


My website launched


I just completed the setup of my GitHub website and a project site on SAS University.


Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 23 November 2014.
Category: Website
Tags: website


Setting up Twitter


I just set up my Twitter account. Find me at @mhiggins2000.


Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 23 November 2014.
Category: Twitter
Tags: twitter


Test Post Reproducible Research


test post for Reproducible Research


Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 23 November 2014.


Melinda's First Post

place for subtitle


My First Post

Here is some text for my first post


Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 22 November 2014.
Category: Melinda
Tags: melinda


Test R Markdown file - converted


Based on KnitPost; see http://jfisher-usgs.github.io/r/2012/07/03/knitr-jekyll/

The knitr package provides an easy way to embed R code in a Jekyll-Bootstrap blog post. The only required input is an R Markdown source file. The name of the source file used to generate this post is 2012-07-03-knitr-jekyll.Rmd, available here. Steps taken to build this post are as follows:

Step 1

Create a Jekyll-Bootstrap blog if you don’t already have one. A brief tutorial on building this blog is available here.

Step 2

Open the R Console and process the source file:

KnitPost <- function(input, base.url = "/") {
  require(knitr)
  # prepend the site's base URL to figure links in the generated Markdown
  opts_knit$set(base.url = base.url)
  # write figures to figs/<post-name>/
  fig.path <- paste0("figs/", sub(".Rmd$", "", basename(input)), "/")
  opts_chunk$set(fig.path = fig.path)
  # use "center" as the figure caption so the CSS rule in Step 4 centers images
  opts_chunk$set(fig.cap = "center")
  # switch knitr's output hooks to Jekyll-style highlight blocks
  render_jekyll()
  knit(input, envir = parent.frame())
}
KnitPost("2012-07-03-knitr-jekyll.Rmd")

Step 3

Move the resulting image folder 2012-07-03-knitr-jekyll and Markdown file 2012-07-03-knitr-jekyll.md to the local jfisher-usgs.github.com git repository. The KnitPost function assumes that the image folder will be placed in a figs folder located at the root of the repository.
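As a rough sketch of that step in R (the repository path and the _posts destination below are assumptions; adjust them to your own layout):

# hypothetical local path to the jfisher-usgs.github.com clone - adjust as needed
repo <- "~/jfisher-usgs.github.com"

# copy the generated Markdown post into the repository's _posts folder
file.copy("2012-07-03-knitr-jekyll.md", file.path(repo, "_posts"), overwrite = TRUE)

# copy the image folder into a figs folder at the root of the repository
dir.create(file.path(repo, "figs"), showWarnings = FALSE)
file.copy("figs/2012-07-03-knitr-jekyll", file.path(repo, "figs"), recursive = TRUE)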

Step 4

Add the following CSS code to the /assets/themes/twitter-2.0/css/bootstrap.min.css file to center images:

[alt=center] {
  display: block;
  margin: auto;
}

That’s it.


Here are a few examples of embedding R code:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
par(mar = c(4, 4, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot(cars)


Figure 1-1: Caption
par(mar = c(2.5, 2.5, 0.5, 0.1), omi = c(0, 0, 0, 0))
filled.contour(volcano)


Figure 2-1: Caption

And don’t forget your session information for proper reproducible research.

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.8
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.4     evaluate_0.5.5   formatR_1.0      htmltools_0.2.6 
## [5] rmarkdown_0.3.11 stringr_0.6.2    tools_3.1.2      yaml_2.1.13

Authored by Melinda Higgins, Ph.D. Biostatistician and Chemometrician.
Published on 03 July 2012.
Category: R
Tags: r



MELINDA HIGGINS, Ph.D.

Biostatistician and Chemometrician



© Copyright 2012-2014