Wednesday, April 11, 2012

Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps

Technologies: SAP HANA, R, HTML5, D3, Google Maps, jQuery and JSON
For this fun exercise, I analyzed more than 200 million data points using SAP HANA and R, and then brought the aggregated results into HTML5 using D3, JSON and the Google Maps APIs.  The 2008 airline data comes from the Data Expo, and I have been using this entire data set (123 million rows and 29 columns) for quite some time. See my other blogs.

The results look beautiful:

Each airport icon is clickable; when clicked, it displays an info window with the key stats for the selected airport:
I then used D3 to display the aggregated result set in a modal window (lightbox):
D3 makes it ridiculously simple to generate a table from a JSON file.
Unfortunately, I can't provide a live example because of the restrictions the Google Maps APIs impose, and I am approaching my free API limits.

Fun fact:  The Atlanta airport was the largest US airport in 2008 on several dimensions: Total Flights Departed, Total Miles Flown and Total Destinations.  It also had a lower average departure delay in 2008 than Chicago O'Hare. I had always thought Chicago O'Hare was the largest US airport.
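The ranking behind this fun fact is just a per-origin aggregation. Here is a minimal base-R sketch on made-up toy numbers (the real analysis below uses data.table over the full data set):

```r
# Toy flight records (made-up numbers; two airports, a handful of flights)
flights <- data.frame(
  Origin   = c("ATL", "ATL", "ATL", "ORD", "ORD"),
  Dest     = c("JFK", "LAX", "JFK", "JFK", "SFO"),
  DepDelay = c(5, 12, NA, 30, 18),
  stringsAsFactors = FALSE
)

# Per-origin totals and average departure delay (NA delays excluded)
stats <- do.call(rbind, lapply(split(flights, flights$Origin), function(g) {
  data.frame(Origin            = g$Origin[1],
             TotalFlights      = nrow(g),
             TotalDestinations = length(unique(g$Dest)),
             AvgDepDelay       = round(mean(g$DepDelay, na.rm = TRUE), 2),
             stringsAsFactors  = FALSE)
}))
stats <- stats[order(-stats$TotalFlights), ]  # largest airport first
```

On these toy rows, ATL comes out first on total flights while ORD has the higher average delay, which is exactly the kind of comparison made above.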

As always, I needed just six lines of R code, including the two lines that write the data out to JSON and CSV files:
################################################################################
library(data.table)

# Summarize the 2008 flights by origin, restricted to the major airports
airports.2008.hp.summary <- airports.2008.hp[major.airports,
    list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),
         TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),
         TotalFlights=length(Month),
         TotalDestinations=length(unique(Dest)),
         URL=paste("http://www.fly", Origin, ".com", sep="")),
    by=list(Origin)][order(-TotalFlights)]
setkey(airports.2008.hp.summary, Origin)

# Merge the summary with the airport master table (name, address, coordinates)
airports.2008.hp.summary <- major.airports[airports.2008.hp.summary,
    list(Airport=airport,
         AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations,
         Address=paste(airport, city, state, sep=", "),
         Lat=lat, Lng=long, URL)][order(-TotalFlights)]

# Write the aggregated results as JSON (for D3 and Google Maps) and CSV
airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)
writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json")
write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)
################################################################################
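The getRowWiseJson call above is a helper, not part of base R; it serializes a data frame (or data.table) row by row into a JSON array of objects. A minimal, dependency-free sketch of such a helper (a hypothetical implementation; the original may differ, e.g. by building on the rjson package):

```r
# Hypothetical row-wise JSON serializer: emits one JSON object per row.
getRowWiseJson <- function(df) {
  quote.str <- function(x) paste0('"', gsub('"', '\\\\"', x), '"')
  rows <- vapply(seq_len(nrow(df)), function(i) {
    fields <- vapply(names(df), function(col) {
      val <- df[[col]][i]
      v <- if (is.numeric(val)) as.character(val) else quote.str(as.character(val))
      paste0(quote.str(col), ":", v)
    }, character(1))
    paste0("{", paste(fields, collapse = ","), "}")
  }, character(1))
  paste0("[", paste(rows, collapse = ","), "]")
}

# Example on a two-row data frame (made-up numbers)
summary.df <- data.frame(Origin = c("ATL", "ORD"), TotalFlights = c(10L, 5L),
                         stringsAsFactors = FALSE)
json <- getRowWiseJson(summary.df)
```

Output of this shape drops straight into d3.json() or a Google Maps marker loop on the page side.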

Happy Coding and remember the possibilities are endless!

11 comments:

  1. Thanks for your post. I've been learning R for the past few weeks and finding it wonderful, especially the data.table package (which you are using here).

    You are joining the major.airports and airports.2008.hp.summary tables and then overwriting airports.2008.hp.summary with the result. If you instead keep the existing airports.2008.hp.summary table and just add the matching columns from major.airports by reference with the ":=" operator, you should see a speed improvement. I tried an example and found it roughly 50 times faster (less than 2 seconds for a table with 90M rows, versus about 85 seconds using the overwrite method). Here is my example code:

    # http://stackoverflow.com/questions/11308754/add-multiple-columns-to-r-data-table-in-one-function-call

    library(data.table)

    fDT1 <- function(n) data.table(x = rep(rep(c("a","b","c"), each = 3), n),
                                   y = rep(c(1L,3L,6L), n), v = rep(1L:9L, n), key = "x")
    DT2 <- data.table(x = letters, z1 = sample(1L:26L), z2 = sample(27L:52L), key = "x")

    n <- 1e7L
    DT1 <- fDT1(n)
    res1 <- system.time(DT1 <- DT2[DT1])[3]

    DT1 <- fDT1(n)
    res2 <- system.time(DT1[DT2, c("z1","z2") := list(z1,z2), nomatch=0])[3]

    list(method_1 = res1, method_2 = res2, improvement = paste0(round(res1/res2, 1), "X"))
