getting it wrong with R

the journal of Michael Werneburg

Toronto, 2017.07.23

I'm taking a "MOOC" on Coursera in data science. There's an R programming element to it, and that class, the second in the series, is the one I'm currently taking.

Today I spent a few hours on a twenty-minute assignment because I misread it. But if anyone's interested in a way to fairly quickly read a raft of (similarly formatted) CSV files into one matrix, here's one. It finishes by computing the correlation between the two columns it collects.


library(plyr)

corr <- function(directory, threshold = 0) {
  # 'directory' is the name of a valid subdirectory
  # 'threshold' is an optional cut-off: a file is kept only if it
  # has at least this many complete records

  # step zero, set up an empty matrix with the two critical
  # fields from the files
  dat <- matrix(data = NA, nrow = 0, ncol = 2)
  colnames(dat) <- c("sulfate", "nitrate")

  files <- list.files(directory, all.files = TRUE, full.names = TRUE, recursive = TRUE)
  for (filename in files) {
    # skip anything that isn't a CSV; the pattern is a regular
    # expression, so escape the dot and anchor it to the end
    if (!grepl("\\.csv$", filename)) {
      next
    }
    # e.g. poldata <- read.csv(file = "specdata/002.csv", header = TRUE, sep = ",", as.is = TRUE)
    poldata <- read.csv(file = filename, header = TRUE, sep = ",", as.is = TRUE)
    # remove any incomplete records
    poldata <- poldata[complete.cases(poldata), ]
    # get a count of good records in the file
    rowsGood <- nrow(poldata)
    if (rowsGood >= threshold) {
      # this was by far the fastest route I could find
      # 1. cast the just-loaded data.frame as a matrix
      #    (named 'filedata' so it doesn't mask the built-in matrix())
      filedata <- as.matrix(poldata[c("sulfate", "nitrate")])
      # 2. bulk-copy the records (using the plyr library)
      dat <- rbind.fill.matrix(dat, filedata)
    }
  }
  # 2x2 correlation matrix of the pooled sulfate and nitrate values
  cor(data.frame(dat[, 1], dat[, 2]))
}
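
For anyone who'd rather not depend on plyr, here's a minimal sketch of the same read-and-stack idea in base R alone. I haven't timed it against the version above. The function name read_all_csv, its 'columns' argument, and the two tiny demo files are made up for illustration, and it assumes every file contains the named columns.

```r
# Hypothetical helper: read every CSV under 'directory', drop incomplete
# records, and stack the chosen columns into one matrix.
read_all_csv <- function(directory, columns = c("sulfate", "nitrate"), threshold = 0) {
  # let list.files() do the filtering; the pattern is a regular expression
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.csv(f, header = TRUE)
    d <- d[complete.cases(d), , drop = FALSE]
    # keep the file only if it has enough complete records
    if (nrow(d) >= threshold) as.matrix(d[columns]) else NULL
  })
  # rbind() ignores the NULL entries, so rejected files simply vanish
  do.call(rbind, pieces)
}

# Demo on two small made-up files written to a fresh temporary directory
demo_dir <- tempfile("specdata_demo")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, 2, NA), nitrate = c(2, 4, 6)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(3, 4), nitrate = c(6, NA)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

combined <- read_all_csv(demo_dir)
print(dim(combined))
print(cor(combined[, "sulfate"], combined[, "nitrate"]))
```

Collecting the per-file matrices in a list and binding them once at the end avoids re-copying the growing matrix on every pass through the loop.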

Again, this is not the assignment from the Coursera course; it's something more difficult. I misread the assignment because I was pushing on against a deadline in the middle of one of my damn headaches. I probably would have been better served by resting for that time, then reading the assignment correctly.