Pls see this Session 4 HW guidance blog-post before submitting your HW.
I have re-formatted parts of this blog-post. Inline images have been added for each tool's graphical output. Pls let me know if you're having any trouble viewing the images. Use the comments box below for fastest response.
Hi all,
I hope you had the chance to run the R demo of session 3 at home just to see first-hand R's ease-of-use in MKTR.
Before jumping into the R code for Session 3, first, a small admin step - you need to know how to download and install packages in R. Say, you need the package 'MASS' installed. Go to the Packages Menu in the Menu bar, click on 'Install Package(s)...', a window will open asking which server to download the package from. Be patriotic and choose 'India'.
A second window will open listing all the packages in R (at present, this list grows every month) in alphabetical order. Click on thepackage you want and sit back. R will automatically download and install the package for you. Might take a minute or two at most.
All the data for the below examples are stored as tables in this google spreadsheet.
OK, let's get started with the R codes then.
1. Simple Data Visualization using biplots. We use USArrests data (inbuilt R dataset) to see how it can be visualized in 2 dimensions. Just copy-paste the code below onto the R console [Hit 'enter' after the last line].
pc.cr <-princomp(USArrests, cor = TRUE)# scale data summary(pc.cr) biplot(pc.cr) |
2. Code for making Joint Space maps I use as my example the OfficeStar dataset that I also demo in class from MEXL's built-in examples database. The code is generic and can be applied to any dataset in the right format you want to make joint space maps from. To facilitate comparison, I use as input format in R the same tables that you would otherwise use in MEXL
For data input, read in only the cells with numbers, no headers. The headers will need to be read in separately.
Step 2a: Read in the attribute table into 'mydata'. Only number cells, no headers.
# Read in the 5x4 mean attributes table # mydata = read.table(file.choose()) mydata = t(mydata)#transposing to ease analysis mydata #view the table read |
Step 2b: Read the preferences table into 'pref'. Only the number cells, no headers.
# Read in preferences table# pref=read.table(file.choose()) dim(pref) #check table dimensions pref[1:10,]#view first 10 rows |
Step 2c: Now read in table headers for 'mydata'. The below code is specific to the current example. If you're working on another dataset, pls edit the headers below (on a notepad) and then copy-paste into R.
# Read in headers separately # attribnames = c("Large choice","Low prices","Service quality","Product Quality","Convenience") brdnames = c("Office star", "Paper & co.","Office Eqpmt", "supermkt") rownames(mydata) = brdnames colnames(mydata) = attribnames mydata # view table with headers |
Data reading is done. Can start analysis now. Finally.
Step 2d: Plot the Joint space maps. The following code is general and can be used directly for any joint space mapping you may want to do once you have correctly read-in the data.
par(pty="s")#square plotting region fit = prcomp(mydata, scale.=TRUE) plot(fit$rotation[,1:2], type="n",xlim=c(-1.5,1.5), ylim=c(-1.5,1.5),main="Joint Space map - Home-brew on R") abline(h=0); abline(v=0) for (i1 in 1:nrow(fit$rotation)){ arrows(0,0, x1=fit$rotation[i1,1]*fit$sdev[1],y1=fit$rotation[i1,2]*fit$sdev[2], col="blue", lwd=1.5); text(x=fit$rotation[i1,1]*fit$sdev[1],y=fit$rotation[i1,2]*fit$sdev[2], labels=attribnames[i1],col="blue", cex=1.1)} # make co-ords within (-1,1) frame # fit1=fit fit1$x[,1]=fit$x[,1]/apply(abs(fit$x),2,sum)[1] fit1$x[,2]=fit$x[,2]/apply(abs(fit$x),2,sum)[2] points(x=fit1$x[,1], y=fit1$x[,2], pch=19, col="red") text(x=fit1$x[,1], y=fit1$x[,2], labels=brdnames,col="black", cex=1.1) # --- add preferences to map ---# k1 = 2; #scale-down factor pref=data.matrix(pref)# make data compatible pref1 = pref %*% fit1$x[,1:2] for (i1 in 1:nrow(pref1)){segments(0,0, x1=pref1[i1,1]/k1,y1=pref1[i1,2]/k1, col="maroon2", lwd=1.25)} # voila, we're done! # |
3. Code for Joint Space maps on your Session 2 Homework dataset (term 4 courses).
For data input, read in only the cells with numbers, no headers. The headers will need to be read in separately. Following steps mirror steps 2a, 2b and 2c above.
Step 3a: Read in the attribute table into 'mydata'. Only number cells, no headers.
# Read in the 4x4 mean attributes table # mydata = read.table(file.choose()) mydata = t(mydata)#transposing to ease analysis mydata #view the table read |
Step 3b: Read the preferences table into 'pref'. Only the number cells, no headers.
# Read in preferences table# pref=read.table(file.choose()) dim(pref) #check table dimensions pref[1:10,]#view first 10 rows |
Step 3c: Now read in table headers for 'mydata'. The below code is specific to the current example. If you're working on another dataset, pls edit the headers below (on a notepad) and then copy-paste into R.
# Read in headers separately # attribnames = c("Conceptual & theoretical value","Practical relevance","Interest sustained","Difficulty level") brdnames = c("GSB", "INVA","MGTO", "SAIT") rownames(mydata) = brdnames colnames(mydata) = attribnames mydata # view table with headers |
The rest of the analysis uses exactly the same block of code as in step 2d after the data input part.
4. MDS code
MDS or Multi-dimensional scaling is the way we analyze overall-similarity (OS) data. Recall that you rated your impression of overall similarity among 9 car brands in a series of paired comparisons. We use that data for MDS here. The input is required in a particular format and there's some data cleaning and aggregation required. Luckily, my RA Shri Ankit Anand was around and a big help. Pls find the data for MDS input in the above google spreadsheet starting at cell P1.
First, we do metric MDS. Metric MDS uses metric data as its input. Since you rated similarity-dissimilarity on a 1-7 interval scale, metric MDS will work just fine. Here's the code for metric MDS, just copy-paste onto the R console.
Read in data (with the headers).
# read in the dissimilarities matrix # mydata=read.table(file.choose(),header=TRUE) d = as.dist(mydata) # Classical MDS into k dimensions# fit <- cmdscale(d,eig=TRUE, k=2) fit # view results # plot solution # x <- fit$points[,1] y <- fit$points[,2] plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="Metric MDS", type="p",pch=19, col="red") text(x, y, labels = rownames(fit$points), cex=1.1, pos=1) abline(h=0); abline(v=0) |
Suppose the similarity-dissimilarity judgments were in yes/no terms rather than in metric ratings. Then metric MDS becomes dicey to use as it relies on interval sclaing assumptions. The more robust (but somewhat less efficient) nonmetric MDS then becomes the way to go.
Nonmetric MDS is just as easy to run. However, make sure you have the MASS package installed before running it.
library(MASS) d <- as.dist(mydata) fit <- isoMDS(d, k=2) fit # view results # plot solution $ x <- fit$points[,1] y <- fit$points[,2] plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="Nonmetric MDS", type="p", pch=19, col="red") text(x, y, labels = rownames(fit$points), cex=1.1, pos=1) abline(h=0); abline(v=0) |
5. Factor analysis code
We will use exploratory (or 'common') factor analysis first on the toothpaste survey dataset. This dataset can be found starting cell P22 in the google spreadsheet mentioned above. Need to install package 'nFactors' (R is case sensitive, always) for running the scree plot.
# read in the data # mydata=read.table(file.choose(),header=TRUE) mydata[1:5,]#view first 5 rows # determine optimal no. of factors# library(nFactors) ev <- eigen(cor(mydata)) # get eigenvalues ap <- parallel(subject=nrow(mydata),var=ncol(mydata),rep=100,cent=.05) nS <- nScree(ev$values, ap$eigen$qevpea) plotnScree(nS) |
k1=2 # set optimal no. of factors |
# extracting k1 factors # # with varimax rotation # fit <- factanal(mydata, k1, scores="Bartlett",rotation="varimax") print(fit, digits=2, cutoff=.3, sort=TRUE) # plot factor 1 by factor 2 # load <- fit$loadings[,1:2] par(col="black")#black lines in plots plot(load,type="p",pch=19,col="red") # set up plot abline(h=0);abline(v=0)#draw axes text(load,labels=names(mydata),cex=1,pos=1) # view & save factor scores # fit$scores[1:4,]#view factor scores #save factor scores (if needed) write.table(fit$scores, file.choose()) |
In case you are wondering how the variable loading onto factors looks like in R (after factor rotation), here is the relevant R console snapshot.
Clearly, the top 3 variables load onto factor 1 and the bottom 3 onto factor 2. That is all for now. See you soon.Sudhir
P.S.
This survey based exercise we had done in class. I've sent you its data in an excel file. For practice sake, pls run it through a joint space map and see if you get the results I did (pasted below).
This one is using Female students' inputs
And this one is using inputs from the male students:
Had a question.
ReplyDeleteIn the slide "The Perceptual Map from HW Data" from Session 4, in order to understand the relative perceptions of GSB and SAIT along the "Conceptual and Theoretical Value gained" dimension, can we draw perpendiculars from GSB and SAIT to the extension of the line "Conceptual and Theoretical value gained" to the other side of the Origin?
Hi Samik,
ReplyDeleteYes. Each dimension extends to both sides of the origin. In the direction of the arrow, it increases and in the opposite direction behind the origin it decreases.
The origin can also be considered a standardized "mean" value and brands behind it are "below the mean".
Hope that helped.
Sudhir
Thanks.
ReplyDeleteI am looking at the code for the Factor analysis and had a question -
ReplyDelete"On the scree plot that appears, THe green horizontal line represents the Eigenvalue=1 level. Simply count how many green triangles (in the figure above) lie before the black line cuts the green line. That is the optimal no. of factors. Here, it is 2.
k1=2 # set optimal no. of factors
# extracting k1 factors #
# with varimax rotation #
fit <- factanal(mydata, k1, scores="Bartlett",rotation="varimax")
print(fit, digits=2, cutoff=.3, sort=TRUE)
# plot factor 1 by factor 2 #
load <- fit$loadings[,1:2]
par(col="black")#black lines in plots
plot(load,type="p",pch=19,col="red") # set up plot
abline(h=0);abline(v=0)#draw axes
text(load,labels=names(mydata),cex=1,pos=1)
# view & save factor scores #
fit$scores[1:4,]#view factor scores
#save factor scores (if needed)
write.table(fit$scores, file.choose())"
In this code piece I realise I need to change the value of the no. of factors based on the graph I get. But do I need to make adjustments for Eigenvalue also and how? Also should I make changes to the value of cut off?
Thanks
Hi Shakun,
ReplyDeleteJust change the value of k1 in:
k1=2 # set optimal no. of factors
Change it to
k1=6 # set optimal no. of factors
Rest of the code remains unchanged.
Let me know if there're any more queries.
Sudhir
Dear Professor,
ReplyDeleteIn Factor analysis code, is there any specific reason why you chose to read 5 rows? What happens if we take all the rows? Is there any impact of rows on the Scree Chart?
# read in the data #
mydata=read.table(file.choose(),header=TRUE)
mydata[1:5,]#view first 5 rows
Thanks!
Best Regards
Abhishek
Hi Abhishek,
ReplyDeleteThe line:
mydata[1:5,]#view first 5 rows
only 'views' the first 5 rows. The entire data is in R and is used as is for analysis purposes. But for convenience, just to see what the data look like, its good practice to have R display a few lines of each dataset every now and then.
Hope that clarifies.
Sudhir