I noticed that if you create a question that allows multiple responses, each responder's answer to that question is stored as a single concatenated string of the selected responses. That is not very useful for data analysis.

Suppose two items on the survey ask the responder to select which Apple and Microsoft products they have used in the past 6 months. When you import the responses into R using that importer code, you might see responses 1. “iPad, iPhone” and 2. “iPod Touch, iPhone, iMac” in the Apple column, and 1. “Xbox, Surface” and 2. “Zune” in the Microsoft column.

So we run

x <- separate(survey.data, vars = c("Apple", "Microsoft"))


This returns a list with two components, each of which is a data frame with an indicator variable for each possible response (named using the first 5 characters of the response). If we combine these two components into one data frame using cbind we might see:

Apple.iPad Apple.iPod Apple.iPhon Apple.iMac Apple.MacBo Microsoft.Xbox Microsoft.Surfa Microsoft.Zune
         1          0           1          0           0              1               1              0
         0          1           1          1           0              0               0              1

The code for the separator function is:

separate <- function(x, vars) {
  # x : data frame   vars : character vector of column names to split
  # Split every response string on ", " into a vector of selections.
  temp <- list()
  for (i in seq_along(vars)) {
    temp[[i]] <- strsplit(as.character(x[[vars[i]]]), ", ")
  }
  # Possible responses per variable, truncated to their first 5 characters;
  # a missing response becomes the placeholder level "NA".
  lvls <- lapply(temp, function(y) {
    z <- substr(unique(unlist(y)), 1, 5)
    z[is.na(z)] <- "NA"
    z
  })
  n.lvls <- sapply(lvls, length)
  VARS <- list()
  for (i in seq_along(vars)) {
    # One row per responder, one 0/1 column per possible response.
    OBS <- as.data.frame(matrix(0, nrow = length(temp[[i]]), ncol = n.lvls[i]))
    names(OBS) <- lvls[[i]]
    for (j in seq_along(temp[[i]])) {
      if (!(is.na(temp[[i]][[j]])[1])) {
        for (k in seq_along(temp[[i]][[j]])) {
          OBS[j, substr(temp[[i]][[j]][k], 1, 5)] <- 1
        }
      } else {
        # No response at all: the whole row is missing.
        OBS[j, ] <- rep(NA, n.lvls[[i]])
      }
    }
    names(OBS) <- paste(vars[[i]], lvls[[i]], sep = ".")
    # Drop the placeholder "NA" column with a logical mask. Negative indexing
    # with which() would drop every column when no response is missing.
    VARS[[i]] <- OBS[, lvls[[i]] != "NA", drop = FALSE]
  }
  names(VARS) <- vars
  return(VARS)
}


I admit that the approach I've taken above is inefficient but it works.
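
For comparison, a more compact sketch of the same idea for a single column might look like the following. This is not the function above: `separate_one` and its `prefix` and `width` arguments are hypothetical names used only for illustration, and the sketch assumes responses are separated by ", " exactly as before. It builds each responder's row of indicators in one step instead of filling a data frame cell by cell.

```r
# Hypothetical helper (not part of the original function): splits one
# multiple-response column into 0/1 indicator columns.
separate_one <- function(responses, prefix, width = 5) {
  # Split each response on ", "; a missing response stays a single NA.
  choices <- strsplit(as.character(responses), ", ")
  # Possible responses, truncated to `width` characters, with NA excluded.
  lvls <- unique(substr(unlist(choices), 1, width))
  lvls <- lvls[!is.na(lvls)]
  # One row per responder: all NA for a missing response, else 0/1 flags.
  rows <- lapply(choices, function(ch) {
    if (anyNA(ch)) {
      rep(NA_real_, length(lvls))
    } else {
      as.numeric(lvls %in% substr(ch, 1, width))
    }
  })
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- paste(prefix, lvls, sep = ".")
  out
}

# Toy data mirroring the two example responders.
survey.data <- data.frame(
  Apple = c("iPad, iPhone", "iPod Touch, iPhone, iMac"),
  Microsoft = c("Xbox, Surface", "Zune"),
  stringsAsFactors = FALSE
)
x <- cbind(separate_one(survey.data$Apple, "Apple"),
           separate_one(survey.data$Microsoft, "Microsoft"))
print(x)
```

With only these two responders there is no MacBook column, because a column is created only for responses that actually occur in the data.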

NOTE: If one of the possible responses itself contains a comma, then multiple columns will be created for that response. So, for example, if the responder can check “Online (Amazon, eBay)” along with “In-store (Best Buy, Frys)” then we will see the following columns (which have the same number of 1s and 0s): “Onlin”, “eBay)”, “In-st”, “Frys)”. This is because the function uses “, ” to split a response into its individual selections. This is unavoidable with this approach, so be careful.
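
The splitting behaviour behind this note is easy to reproduce directly with `strsplit`. The snippet below is only an illustration using the example responses from the note; the variable names are made up for the demonstration. Notice that the closing parenthesis stays attached to the second piece of each response.

```r
# One responder who checked both options, stored as a single string.
response <- "Online (Amazon, eBay), In-store (Best Buy, Frys)"

# Splitting on ", " also cuts inside the parentheses, giving four pieces.
pieces <- strsplit(response, ", ")[[1]]
print(pieces)

# The first 5 characters of each piece become the column labels.
labels <- substr(pieces, 1, 5)
print(labels)
```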