Dygraphs With Bigger but Not Quite Big Data

How big is big? I'm not sure, but with "big" data d3/SVG can get sluggish. dygraphs, an older, robust, canvas-based HTML5 library, claims:

Handles huge data sets: dygraphs plots millions of points without getting bogged down.

Let's test it with some "almost big" data in the form of US Industry daily return data since 1926 from the Kenneth French data library. This is 23,027 rows and 48 columns for 1,105,296 tuples.

Also, note the nice closest-series highlighting that dygraphs provides.

click/drag to zoom; shift+click/drag to pan; double-click to unzoom


Get the Data

library(rCharts)
# get the very helpful Ken French data; for this project we will look at the
# 48 Industry Portfolios (daily)
# http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/48_Industry_Portfolios_daily.zip

library(quantmod)  # attaches xts, which provides as.xts()
# my.url will be the location of the zip file with the data
my.url = "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/48_Industry_Portfolios_daily.zip"
# this will be the temp file set up for the zip file
# (file.path() builds the path portably instead of hard-coding "\\")
my.tempfile <- file.path(tempdir(), "frenchindustry.zip")
# my.usefile is the name of the txt file with the data
my.usefile <- file.path(tempdir(), "48_Industry_Portfolios_daily.txt")
download.file(my.url, my.tempfile, method = "auto", quiet = FALSE, mode = "wb", 
    cacheOK = TRUE)
unzip(my.tempfile, exdir = tempdir(), junkpaths = TRUE)
# read space delimited text file extracted from zip
french_industry <- read.table(file = my.usefile, header = TRUE, sep = "", as.is = TRUE, 
    skip = 9, nrows = 23027)

# get dates ready for xts index
datestoformat <- rownames(french_industry)
datestoformat <- paste(substr(datestoformat, 1, 4), substr(datestoformat, 5, 
    6), substr(datestoformat, 7, 8), sep = "-")

# get xts for analysis
french_industry_xts <- as.xts(french_industry, order.by = as.Date(datestoformat))

# the file reports returns in percent; divide by 100 to get decimal returns
french_industry_xts <- french_industry_xts/100

# missing data is denoted by -99.99 in the file (-0.9999 after dividing by 100);
# set only those cells to 0 (indexing rows and columns separately would also
# zero valid returns in every flagged row/column combination)
french_industry_xts[french_industry_xts < -0.99] <- 0

# get a price series: the log of the cumulative growth of 1
french_industry_price <- log(cumprod(french_industry_xts + 1))
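The cleaning and compounding steps can be checked on a toy example. This is only a sketch in base R: the matrix `toy_pct` and its values are made up for illustration, and `apply()` is used so that compounding runs down each column of a plain matrix.

```r
# Toy daily returns in percent; -99.99 marks a missing value (made-up data)
toy_pct <- matrix(c( 1.00, -99.99,
                    -0.50,   2.00,
                     0.25,  -1.00),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(NULL, c("A", "B")))

# percent -> decimal returns
toy <- toy_pct / 100

# zero out only the cells flagged as missing
toy[toy < -0.99] <- 0

# log of the cumulative growth of 1, computed column by column
toy_price <- log(apply(toy + 1, 2, cumprod))

# equivalent and numerically gentler: cumulative sum of log(1 + r)
toy_price_alt <- apply(log1p(toy), 2, cumsum)

print(toy_price)
```

Both versions should agree up to floating-point error, and the missing cell contributes zero growth on its day.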

Write the Data to a CSV to Demonstrate Loading Data from a URL

# write to a csv that dygraphs will read from a url
write.csv(data.frame(french_industry_price), "french_industry.csv", quote = FALSE)
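To see roughly what ends up in the file, here is a small round-trip sketch. The data frame `toy_df`, its values, and the temporary file are hypothetical; the point is only that the dates are written as row names in the first column, followed by one column per series.

```r
# Dates parsed from the yyyymmdd form used in the French data file
toy_dates <- as.Date(c("19260701", "19260702"), format = "%Y%m%d")

# Made-up log cumulative returns for two hypothetical series
toy_df <- data.frame(A = c(0.01, 0.005), B = c(0, 0.02),
                     row.names = format(toy_dates))

# Write without quotes, as in the post, but to a temp file
toy_csv <- tempfile(fileext = ".csv")
write.csv(toy_df, toy_csv, quote = FALSE)

# Inspect the raw lines, then read them back with dates as row names
print(readLines(toy_csv))
toy_back <- read.csv(toy_csv, row.names = 1)
```

Reading the file back with `row.names = 1` should recover the same dates and values.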

rCharts Magic

dy1 <- rCharts$new()
dy1$setLib(".")
dy1$templates$script = "chart_csv.html"
dy1$set(data = "./french_industry.csv", chart = list(title = "US Industries Since 1926 | source: Kenneth French", 
    ylabel = "Cumulative Return (log)", labelsDiv = "#!document.getElementById('status')!#", 
    labelsDivStyles = list(background = "none"), strokeWidth = 0.75, showLabelsOnHighlight = TRUE, 
    highlightCircleSize = 2, highlightSeriesOpts = list(strokeWidth = 1, highlightCircleSize = 5), 
    width = 550))
cat(noquote(dy1$html(chartId = "dygraphIndustry")))

Thanks