The information available from the Flickr API is incredibly rich. The Atlantic article "This is the World on Flickr" motivated me to open up R and do some analysis on the Flickr Explore list. As you might expect, I'll be using my new favorite tools, rCharts and slidify, and I will add one I have not mentioned before, Rflickr.
I have always wondered what ISO speeds occur most frequently on Explore. I never imagined that I could answer my question with R. As usual, we will start by loading all the necessary packages.
# analyze EXIF data for interesting list
library(lubridate)
#if you have not installed Rflickr
#install.packages("Rflickr", repos = "http://www.omegahat.org/R", type="source")
library(Rflickr)
data(FlickrFunctions)
If you do not have a free noncommercial API key, apply for one here. Trust me, it is very easy, so don't let this be an excuse not to try it out. I put mine in a little secrets.Rdata file that I will load with the following code before starting a session.
load("secrets.Rdata")
tok = authenticate(api_key, secret)
s <- flickrSession(secret, tok, api_key)
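If you are wondering how to build that secrets.Rdata file, here is a minimal sketch. It assumes the file only needs two character objects named api_key and secret, since those are the names the code above uses after load(). The placeholder strings and the temporary path are mine, so swap in your own values.

```r
# Placeholder credentials: replace with the key and secret Flickr issues you.
api_key <- "YOUR_API_KEY"
secret  <- "YOUR_API_SECRET"

# Save both objects under the names the session code expects.
# A temporary path is used here; in practice write "secrets.Rdata"
# into your working directory.
secrets_file <- file.path(tempdir(), "secrets.Rdata")
save(api_key, secret, file = secrets_file)

# Check the round trip in a clean environment.
restored <- new.env()
load(secrets_file, envir = restored)
ls(restored)
```

Keeping the key in a separate file means the script itself can be shared without exposing the credentials.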
Since this is more a proof of concept than an ambitious scientific study, I'll just look back three days.
#use this to specify how many days to analyze
daysAnalyze = 3
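To see which dates a given setting will request, the date strings can be previewed without touching the API. This sketch uses only base R (Sys.Date() instead of lubridate's today()), and I would expect it to give the same strings the loop passes to Flickr, though I have not compared them across time zones.

```r
daysAnalyze <- 3

# One YYYY-MM-DD string for each of the previous daysAnalyze days,
# most recent first.
dates <- format(Sys.Date() - seq_len(daysAnalyze), "%Y-%m-%d")
dates
```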
My code gets a little sloppy here, but it does work. Sorry for all the lapply calls. I hope my comments will help you understand each of the steps.
#initialize a data frame to collect the ISO counts for each day
df <- data.frame()
for (i in 1:daysAnalyze) {
  interesting <- s$flickr.interestingness.getList(date = as.character(today() - ddays(i)))
  print(today() - ddays(i))    #debug print what day we are getting
  print(length(interesting))   #debug print the count of photos
  #for each photo try to get the exif information
  #Flickr allows users to block EXIF
  #so use try to bypass error
  exifData <- lapply(
    seq_along(interesting),
    function(x) {
      exif <- try(s$flickr.photos.getExif(interesting[[x]]["id"]))
      if (inherits(exif, "try-error")) exif <- NA
      return(exif)
    }
  )
  #now that we have a list of EXIF for each photo
  #that allows it
  #use another lapply
  #to extract the useful information
  exifData.df <- lapply(
    exifData,
    function(x) {
      #x is NA when EXIF was blocked
      #all() keeps the condition to a single TRUE/FALSE, which newer R requires
      if (!all(is.na(x))) {
        exif.df <- do.call(rbind, lapply(
          1:(length(x) - 1),
          function(y) {
            #one row per EXIF tag: its attributes, raw value, and clean value
            tag.df <- data.frame(
              t(
                data.frame(
                  x[[y]][".attrs"]
                )
              ),
              x[[y]]["raw"],
              stringsAsFactors = FALSE
            )
            rownames(tag.df) <- y
            if ("clean" %in% names(x[[y]])) {
              tag.df$clean = x[[y]]["clean"]
            } else tag.df$clean = NA
            return(as.vector(tag.df))
          })
        )
      } else exif.df <- rep(NA, 5)
      return(exif.df)
    }
  )
  #one more lapply to just get the ISO speed if available
  isospeeds <- unlist(lapply(
    exifData.df,
    function(x) {
      if (!all(is.na(x))) {
        iso = x[which(x[, "label"] == "ISO Speed"), "raw"]
      } else iso = NA
      return(as.numeric(iso))
    }
  ))
  #make one data.frame with a Frequency(count) of ISO speeds
  df <- rbind(
    df,
    data.frame(
      as.character(today() - ddays(i)),
      table(isospeeds)
    )
  )
}
[1] "2013-10-22"
[1] 101
[1] "2013-10-21"
[1] 101
[1] "2013-10-20"
[1] 101
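The extraction step is easier to follow on a small stand-in that needs no API call. The sketch below rests on an assumption inferred from the loop above: that each photo's EXIF result is a list of tags, each holding a raw value, an optional clean value, and a .attrs vector with a label, followed by one trailing element of photo-level attributes. The mock_exif object and the get_iso helper are hypothetical names of mine and all values are invented.

```r
# Hypothetical stand-in for one photo's EXIF result; every value is invented.
mock_exif <- list(
  exif = list(raw = "Canon EOS 5D Mark II",
              .attrs = c(tagspace = "IFD0", tag = "Model", label = "Model")),
  exif = list(raw = "400",
              .attrs = c(tagspace = "ExifIFD", tag = "ISO", label = "ISO Speed")),
  exif = list(raw = "1/250", clean = "0.004 sec (1/250)",
              .attrs = c(tagspace = "ExifIFD", tag = "ExposureTime", label = "Exposure")),
  .attrs = c(id = "0000000000")
)

# Hypothetical helper: return the ISO speed as a number, or NA if unavailable.
get_iso <- function(x) {
  if (!is.list(x)) return(NA_real_)   # blocked EXIF was stored as NA
  tags <- x[-length(x)]               # drop the trailing photo-level attributes
  labels <- vapply(tags, function(tag) unname(tag[[".attrs"]]["label"]), character(1))
  raw <- vapply(tags, function(tag) as.character(tag[["raw"]]), character(1))
  iso <- raw[labels == "ISO Speed"]
  if (length(iso) == 0) NA_real_ else as.numeric(iso[1])
}

get_iso(mock_exif)
get_iso(NA)
```

Pulling only the two fields that matter with vapply avoids building a full data.frame per tag, which might be a simpler route if ISO is all you want.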
#name columns for our df data.frame
colnames(df) <- c("date","iso","Freq")
#get rid of factors
#thanks http://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-an-integer-numeric-without-a-loss-of-information
df$iso <- as.character(levels(df$iso))[df$iso]
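The factor conversion deserves a quick illustration, because the obvious as.numeric() call silently returns the wrong numbers. This sketch uses a made-up factor rather than the real data. It also shows why the character version of iso sorts differently from the numeric one, which is worth keeping in mind when ordering an axis.

```r
# Made-up ISO values stored as a factor, as table() would produce them.
iso_factor <- factor(c("100", "3200", "400", "100"))

# as.numeric() on a factor returns the internal level codes, not the ISO values.
codes <- as.numeric(iso_factor)

# Indexing the levels by the factor recovers the original values.
values <- as.numeric(levels(iso_factor))[iso_factor]

# Character values sort lexicographically, so "3200" lands before "400".
sorted_as_text <- sort(as.character(iso_factor))
sorted_as_number <- sort(values)

codes
values
sorted_as_text
sorted_as_number
```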
Now that we have a data.frame with ISO speeds, let's use rCharts to analyze it. I will use dimplejs.
# Thanks to http://tradeblotter.wordpress.com/
# Qualitative color schemes by Paul Tol
tol4qualitative=c("#4477AA", "#117733", "#DDCC77", "#CC6677")
require(rCharts)
#bar chart of counts by ISO speed, with date as the series
dIso <- dPlot(
  y = "Freq",
  x = "iso",
  groups = "date",
  data = df,
  type = "bar",
  height = 400,
  width = 600
)
dIso$xAxis( orderRule = "iso" )
dIso$defaultColors(
  #"#! d3.scale.category10() !#",
  tol4qualitative,
  replace = TRUE
)
dIso
#bar chart with x = c("iso","date"), which nests date within each ISO speed
dIso <- dPlot(
  y = "Freq",
  x = c("iso","date"),
  groups = "date",
  data = df,
  type = "bar",
  height = 400,
  width = 600
)
dIso$xAxis( orderRule = "iso" )
dIso$defaultColors(
  #"#! d3.scale.category10() !#",
  tol4qualitative,
  replace = TRUE
)
dIso
#bar chart with x = c("date","iso"), which nests ISO speed within each date
dIso <- dPlot(
  y = "Freq",
  x = c("date","iso"),
  groups = "date",
  data = df,
  type = "bar",
  height = 400,
  width = 600
)
dIso$xAxis( grouporderRule = "iso" )
dIso$defaultColors(
  #"#! d3.scale.category10() !#",
  tol4qualitative,
  replace = TRUE
)
dIso
#line chart of counts by ISO speed, one line per date
dIso <- dPlot(
  y = "Freq",
  x = "iso",
  groups = "date",
  data = df,
  type = "line",
  height = 400,
  width = 600
)
dIso$xAxis( orderRule = "iso" )
dIso$defaultColors(
  #"#! d3.scale.category10() !#",
  tol4qualitative,
  replace = TRUE
)
dIso
#area chart with x = c("date","iso"), which nests ISO speed within each date
dIso <- dPlot(
  y = "Freq",
  x = c("date","iso"),
  groups = "date",
  data = df,
  type = "area",
  height = 400,
  width = 600
)
dIso$xAxis( grouporderRule = "iso" )
dIso$defaultColors(
  #"#! d3.scale.category10() !#",
  tol4qualitative,
  replace = TRUE
)
dIso
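For a plain-text answer to the original question, the counts can also be summed across days and ranked in base R. This is only a sketch: example_df is a hypothetical data frame with the same three columns as df (date, iso, Freq), and its counts are invented placeholders, not results from Flickr.

```r
# Placeholder rows in the same shape as df; the counts are invented.
example_df <- data.frame(
  date = rep(c("2013-10-20", "2013-10-21"), each = 3),
  iso  = rep(c("100", "200", "400"), times = 2),
  Freq = c(30, 12, 8, 25, 15, 10),
  stringsAsFactors = FALSE
)

# Sum the counts for each ISO speed across all dates,
# then rank from most to least frequent.
totals <- aggregate(Freq ~ iso, data = example_df, FUN = sum)
totals <- totals[order(-totals$Freq), ]
totals
```

Running the same two lines on the real df would list the ISO speeds from most to least common over the days analyzed.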
As you might already know, I love R, especially with rCharts and slidify.