All Things R: January 2012

Monday, January 30, 2012

Updated Sentiment Analysis and a Word Cloud for Netflix - The R Way!

Netflix investors must be happy and cheerful: the stock is up more than 78% since the beginning of the year (yes, 78%; source: Yahoo Finance). I am not going to talk about what turned the stock around after the much-hyped Netflix debacle of late 2011, which earned Reed Hastings quite a few unwanted titles and had everyone demanding his resignation from the top post. Not so fast, Mr. Bear! Reed Hastings must be smiling. After a stellar performance this year, including carefully released stats on viewership and streaming hours as well as solid Q4'11 earnings, Netflix is back and, most importantly, viewers are back!

Well, it is no coincidence that the sentiment for Netflix is also improving: 68% of the tweets that scored as either positive or negative now have positive sentiment. See the table below:

Total Tweets Fetched	Positive Tweets	Negative Tweets	Average Score	Total Tweets	Sentiment
499	171	80	0.281	251	68%

Note: Make sure you understand and interpret this analysis correctly. It is not based on NLP; it simply counts matches against lists of positive and negative words.
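For readers who want to check the arithmetic: as the code later in this post computes it, the Sentiment figure is the share of positive tweets among the tweets with a non-zero score (positive plus negative), not among all 499 tweets fetched. A quick check in R, using only the numbers from the table (the variable names are just for this example):

```r
# Figures taken from the table above
totalFetched   <- 499
positiveTweets <- 171
negativeTweets <- 80

# Tweets with a non-zero score (the second "Total Tweets" column)
scoredTweets <- positiveTweets + negativeTweets

# Share of positive tweets among the scored tweets
sentiment <- round(positiveTweets / scoredTweets, 2)

# Tweets that scored zero and are left out of the ratio
neutralTweets <- totalFetched - scoredTweets

print(c(scored = scoredTweets, sentiment = sentiment, neutral = neutralTweets))
```

So roughly half of the fetched tweets carried no sentiment words at all, which is worth keeping in mind when reading the 68%.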

I updated the sentiment analysis that I did last year (http://goo.gl/fkfPy), when I was just beginning to play with the Twitter and text mining packages in R, and this time used more advanced packages such as "tm" and "wordcloud". The new analysis is based on a list of more than 6,800 words that is commonly recommended in sentiment analysis blogs and books (see Hu and Liu, http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html).

I came across this excellent blog by Jeffrey Breen, @JeffreyBreen (http://goo.gl/RPkFX), and his tutorial. Thank you, Mr. Breen! Please follow the instructions in Breen's slides and the R code listed there, as well as the R code here:

Here are the updated R code snippets:
# Load the packages used below
library(twitteR)  # searchTwitter()
library(plyr)     # laply(), ddply()

# Populate the lists of sentiment words from Hu and Liu (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
huliu.pwords <- scan('opinion-lexicon/positive-words.txt', what='character', comment.char=';')
huliu.nwords <- scan('opinion-lexicon/negative-words.txt', what='character', comment.char=';')

# Add some words
huliu.nwords <- c(huliu.nwords,'wtf','wait','waiting','epicfail', 'crash', 'bug', 'bugy', 'bugs', 'slow', 'lie')

# Remove some words
huliu.nwords <- huliu.nwords[huliu.nwords != 'sap']
huliu.nwords <- huliu.nwords[huliu.nwords != 'cloud']
#which('sap' %in% huliu.nwords)

twitterTag <- "@Netflix"

# Get 1500 tweets - an individual is only allowed to get 1500 tweets
tweets <- searchTwitter(twitterTag, n=1500)
tweets.text <- laply(tweets, function(t) t$getText())

# getSentimentScore() is not defined in this post; it must return a data frame
# with a SentimentScore column (see the tutorial linked above)
sentimentScoreDF <- getSentimentScore(tweets.text)
sentimentScoreDF$TwitterTag <- twitterTag

# Flag positive and negative tweets; tweets with a zero score get 0 in both columns
sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >= 1)
sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <= -1)

# Summarize findings
summaryDF <- ddply(sentimentScoreDF, "TwitterTag", summarise,
                   TotalTweetsFetched=length(SentimentScore),
                   PositiveTweets=sum(posTweets), NegativeTweets=sum(negTweets),
                   AverageScore=round(mean(SentimentScore),3))

# Only tweets with a non-zero score count towards the total
summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets

# Get Sentiment Score: share of positive tweets among the scored tweets
summaryDF$Sentiment <- round(summaryDF$PositiveTweets/summaryDF$TotalTweets, 2)
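The snippet above calls getSentimentScore(), which is not defined in this post. As a rough illustration of what a word-list scorer of this kind could look like, here is a minimal sketch in base R. The function name scoreSentiment, the toy word lists and the sample texts are all made up for this example; it is not the author's function and it does not load the Hu and Liu files.

```r
# Hypothetical stand-in for getSentimentScore(); not the author's code.
# Score = (number of positive-word matches) - (number of negative-word matches).
scoreSentiment <- function(texts, posWords, negWords) {
  scores <- vapply(texts, function(txt) {
    # Normalise: drop punctuation, control characters and digits, lower-case
    txt <- gsub("[[:punct:]]", "", txt)
    txt <- gsub("[[:cntrl:]]", "", txt)
    txt <- gsub("[[:digit:]]+", "", txt)
    txt <- tolower(txt)
    # Split on whitespace into individual words
    words <- unlist(strsplit(txt, "\\s+"))
    # Every occurrence of a lexicon word counts once
    sum(words %in% posWords) - sum(words %in% negWords)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(Text = texts, SentimentScore = scores, stringsAsFactors = FALSE)
}

# Toy lexicons and made-up texts, for illustration only
posWords <- c("love", "great", "awesome")
negWords <- c("slow", "crash", "bug")
texts <- c("I love the new Netflix app, great picks!",
           "Streaming is slow and the app keeps hitting a bug",
           "Watching a movie tonight")

scored <- scoreSentiment(texts, posWords, negWords)
print(scored)
```

A data frame shaped like this, with one SentimentScore per tweet, is what the posTweets, negTweets and ddply() steps above expect as input.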

Saving the best for last, here is a word cloud (also called a tag cloud) for Netflix, built in R:

I will post the R code for building the word cloud here after cleaning it up.
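Until that code is posted, here is a rough sketch of the usual first step, counting word frequencies, in base R. The sample texts and the tiny stop-word list are invented for illustration. The final call assumes the "wordcloud" package is installed and is skipped otherwise; this is one possible approach, not the code behind the cloud above.

```r
# Made-up tweet texts, for illustration only
tweets.sample <- c("Netflix streaming is great",
                   "Great shows on Netflix tonight",
                   "Netflix streaming all weekend")

# Lower-case, strip punctuation, and split into words
cleaned <- tolower(gsub("[[:punct:]]", "", tweets.sample))
words <- unlist(strsplit(cleaned, "\\s+"))

# Drop a few very common words (a tiny hand-picked stop list, an assumption)
stopWords <- c("is", "on", "all", "the", "a")
words <- words[!(words %in% stopWords) & nzchar(words)]

# Word frequencies, most frequent first
wordFreq <- sort(table(words), decreasing = TRUE)
print(wordFreq)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(wordFreq), as.numeric(wordFreq), min.freq = 1)
}
```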

Happy Analyzing!

Sentiment Analysis, the R way, on Netflix's September 18th Announcement

Re-posting this from my other blog on analytics (http://allthingsbusinessanalytics.blogspot.com/).

Did Netflix make a bad move or a bold move? Only time will tell, but for now here is a simple sentiment analysis, using R and the twitteR package, of tweets involving Netflix for you to consume...

So, in the aftermath of #netflix's supposedly bad strategic move, I thought it would be fun to do a little sentiment analysis using a sample of tweets from the past few days. I turned to my favorite, R, discovered a new package called "twitteR", and 4 lines of code later I had the following outcome:

788 of the 1500 tweets, that is 52.5% of the tweets, over the last three days had the words bad, suck, terrible, disaster or :( with #netflix...

You be the judge whether Netflix customers are unhappy and whether it was a bad (or bold) strategic move...

> library("twitteR")
> searchNF <- searchTwitter("#netflix bad OR suck OR terrible OR disaster OR :(", n=1500, since=as.character(Sys.Date()-3))
> negativeTweets <- length(searchNF)
> negativeSentiment <- negativeTweets/1500
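Because the search above only returns tweets that already match the negative keywords, another way to estimate a share is to fetch tweets for #netflix alone and then count how many of them contain a negative term. A minimal sketch in base R, with made-up texts standing in for a live Twitter search (the variable names are hypothetical):

```r
# Made-up tweet texts standing in for the result of a plain "#netflix" search
sampleTexts <- c("#netflix this price change is terrible",
                 "watching #netflix all night",
                 "#netflix what a disaster :(",
                 "new season on #netflix!")

# Negative markers from the search query above
negativeTerms <- c("bad", "suck", "terrible", "disaster", ":(")

# TRUE for each text that contains at least one negative marker.
# fixed = TRUE treats ":(" literally instead of as a regular expression.
# Note this is plain substring matching, so "bad" would also match "badge".
hasNegative <- vapply(sampleTexts, function(txt) {
  any(vapply(negativeTerms, function(term) {
    grepl(term, tolower(txt), fixed = TRUE)
  }, logical(1)))
}, logical(1), USE.NAMES = FALSE)

# Share of sampled tweets with at least one negative marker
negativeShare <- mean(hasNegative)
print(negativeShare)
```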

Tuesday, January 24, 2012

Geocode your data using R, JSON and Google Maps' Geocoding APIs

Over the last year and a half, I have faced numerous challenges with geocoding the data that I have used to showcase my passion for location analytics. In 2012, I decided to take things into my own hands and turned to R. Here, I am sharing a simple R script that I wrote to geocode my data whenever I need it, even BIG Data.

To geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. I recommend that you register with the Google Maps API and get a key if you have a large amount of data and will be geocoding repeatedly.
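A request to this service is a plain URL with the address passed as a query parameter, so the address has to be percent-encoded first. The function further down only replaces spaces; the sketch below uses base R's URLencode() instead, which also handles commas and other reserved characters. The helper name buildGeoCodeUrl is made up for this example, and the endpoint is simply the one used in this post.

```r
# Hypothetical helper: builds the request URL with the address percent-encoded.
# The base URL is the one used in getGeoCode() in this post.
buildGeoCodeUrl <- function(address) {
  encoded <- utils::URLencode(address, reserved = TRUE)
  paste0("http://maps.google.com/maps/api/geocode/json?sensor=false&address=",
         encoded)
}

requestUrl <- buildGeoCodeUrl("Palo Alto,California")
print(requestUrl)
```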

Here is a function that can be called repeatedly by other functions:

getGeoCode <- function(gcStr)
{
  library("RJSONIO") #Load Library
  gcStr <- gsub(' ','%20',gcStr) #Encode URL Parameters
  #Open Connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',gcStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  lat <- data.json["results.geometry.location.lat"]
  lng <- data.json["results.geometry.location.lng"]
  gcodes <- c(lat, lng)
  names(gcodes) <- c("Lat", "Lng")
  return (gcodes)
}
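To see what the "flatten" step does without calling the service, the sketch below runs unlist() on a small hand-built list. The list only mimics the part of the parsed reply that getGeoCode() reads; its shape is an assumption based on the names used in the function, and the real reply contains many more fields.

```r
# Hand-built list mimicking a small part of the parsed JSON reply
# (an assumption about its shape, not an actual API response)
parsed <- list(
  results = list(
    list(geometry = list(location = list(lat = 37.4418834, lng = -122.1430195)))
  ),
  status = "OK"
)

# unlist() flattens the nesting and joins the names with dots. Because the
# list mixes numbers and text, every value is coerced to character.
flat <- unlist(parsed)
print(names(flat))

# Convert the two coordinates back to numbers
coords <- as.numeric(flat[c("results.geometry.location.lat",
                            "results.geometry.location.lng")])
names(coords) <- c("Lat", "Lng")
print(coords)
```

This coercion is why getGeoCode() returns the coordinates as character strings; wrapping the result in as.numeric() is one way to get numbers back.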

Let's put this function to the test:

> geoCodes <- getGeoCode("Palo Alto,California")
> geoCodes
           Lat            Lng
  "37.4418834" "-122.1430195"

You can run this on the entire column of a data frame or a data table:

Here is my sample data frame with three columns - Opposition, Ground.Country and Toss. Two of the columns, you guessed it right, need geocoding.

> head(shortDS,10)
    Opposition           Ground.Country Toss
1     Pakistan         Karachi,Pakistan  won
2     Pakistan      Faisalabad,Pakistan lost
3     Pakistan          Lahore,Pakistan  won
4     Pakistan         Sialkot,Pakistan lost
5  New Zealand Christchurch,New Zealand lost
6  New Zealand       Napier,New Zealand  won
7  New Zealand     Auckland,New Zealand  won
8      England           Lord's,England  won
9      England       Manchester,England lost
10     England         The Oval,England  won

To geocode this, here is the simple one-liner I run:

library(plyr) # for laply()
shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,
                laply(Ground.Country, function(val){getGeoCode(val)})))

> head(shortDS, 10)
    Opposition           Ground.Country Toss  Ground.Lat  Ground.Lng
1     Pakistan         Karachi,Pakistan  won   24.893379   67.028061
2     Pakistan      Faisalabad,Pakistan lost   31.408951   73.083458
3     Pakistan          Lahore,Pakistan  won    31.54505   74.340683
4     Pakistan         Sialkot,Pakistan lost  32.4972222  74.5361111
5  New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6  New Zealand       Napier,New Zealand  won -39.4928444 176.9120178
7  New Zealand     Auckland,New Zealand  won -36.8484597 174.7633315
8      England           Lord's,England  won     51.5294     -0.1727
9      England       Manchester,England lost   53.479251   -2.247926
10     England         The Oval,England  won   51.369037   -2.378269
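When the same place appears in many rows, it may be worth geocoding each distinct value once and mapping the results back, which keeps the number of requests down. A minimal sketch in base R follows. The stubGeoCode function is a made-up stand-in for getGeoCode() so the example runs offline; its coordinates are copied from the output above.

```r
# Hypothetical stub standing in for getGeoCode(), so the example runs offline;
# the coordinates are copied from the output above
stubGeoCode <- function(place) {
  lookup <- list("Karachi,Pakistan" = c(24.893379, 67.028061),
                 "Lahore,Pakistan"  = c(31.54505, 74.340683))
  coords <- lookup[[place]]
  if (is.null(coords)) coords <- c(NA_real_, NA_real_)
  coords
}

# Small sample with a repeated ground
grounds <- data.frame(Opposition = c("Pakistan", "Pakistan", "Pakistan"),
                      Ground.Country = c("Karachi,Pakistan", "Lahore,Pakistan",
                                         "Karachi,Pakistan"),
                      Toss = c("won", "won", "lost"),
                      stringsAsFactors = FALSE)

# Geocode each distinct place once; rows of 'codes' are named by place
places <- unique(grounds$Ground.Country)
codes <- t(vapply(places, stubGeoCode, numeric(2)))

# Map the coordinates back to every row by place name
grounds$Ground.Lat <- unname(codes[grounds$Ground.Country, 1])
grounds$Ground.Lng <- unname(codes[grounds$Ground.Country, 2])
print(grounds)
```

Swapping stubGeoCode for the real function would also need an as.numeric() conversion, since getGeoCode() returns character values.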

Happy Coding!
