Your 'Session 5 HW' is out, in a folder of the same name on LMS. The R code I used to test the HW has been put up as a notepad file. Feel free to use blocks of that R code directly in your HW.
Important: Pls read the short caselet in the PDF file 'Conglomerate PDA' in the HW folder *before* attempting the exercise. As the instructor, I assure you that if you try interpreting the analyses without reading the caselet, it will show.
Recommended:
- Pls try the classwork examples on R before trying the HW examples. The classwork blog-post has explanations for each block of code.
- Ensure you have the required packages loaded before you start. These are (IIRC): nFactors, cluster, mclust and rpart.
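A minimal R sketch for checking that these packages are in place before you start (a sketch only; run install.packages() yourself for any it reports missing):

```r
# Check that the packages the HW needs are installed, then load them.
pkgs <- c("nFactors", "cluster", "mclust", "rpart")
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) {
  message("Please install first: ", paste(missing, collapse = ", "))
} else {
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```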
HW Questions:
- Q1. Are there any 'constructs' underlying the basis variables used for segmentation in the PDA caselet? What might they be? Name and interpret these factors/constructs.
- Q2. Segment the respondent data using hclust, mclust and k-means (with a scree plot). Record how many segments you find. Draw clusplots for the k-means and mclust outputs.
- Q3. Characterize or 'profile' the clusters obtained from mclust. Name the segments (similar to the China digital-consumer segments reading).
- Q4. Read in the discriminant data for the PDA case. Make a dataset consisting of the interval-scaled discriminant variables only. Now plot a decision tree to see which variables best explain membership in the largest segment (segment 1). List the variables in order of importance.
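For the Q2 clustering steps, a sketch of the workflow is below. It uses the four numeric columns of R's built-in iris data as stand-in (the PDA basis variables are not reproduced here); substitute the data frame you read in for the caselet, and your own choice of k from the elbow.

```r
library(cluster)  # for clusplot()

# Stand-in data: substitute the PDA basis variables here.
x <- scale(iris[, 1:4])

# Scree plot: total within-cluster SS for k = 1..10
wss <- sapply(1:10, function(k) {
  set.seed(42)  # fixed seed so the plot is reproducible
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS", main = "K-means scree plot")

# Pick k at the 'elbow' (say k = 3 for this stand-in), cluster, visualize
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)
clusplot(x, km$cluster, color = TRUE, shade = TRUE, lines = 0)

# hclust on the same data: cut the dendrogram at the chosen k
hc <- hclust(dist(x), method = "ward.D2")
memb <- cutree(hc, k = 3)
```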
- Ensure your name and PGID are written on the title slide.
- Ensure all your plots (and the important tables) are pasted as images on the slides. Typically metafile images are best.
- Pls give each slide an informative title and mention question number on it.
- Pls be aware that while you are free to consult peers on the R part (making the plots), interpretation and writing up are solely individual activities.
- Submission deadline is midnight, Monday 24-Dec-2012, in the LMS dropbox.
For any Qs on this HW, pls contact me directly; the blog comments pages are the fastest way to reach me. Feedback on any aspect of the HW or the course is most welcome.
Sudhir
Prof - saw your comments at http://marketing-yogi.blogspot.in/2012/11/interpreting-factor-analysis-results.html.
Why is there overlap in the clusters? Is it because we have projected the 4-dimensional space onto 2 principal components? If yes, how do I make sense of the clustering that happened?
Thanks
Hi Anon,
Yes. The clusters do not overlap in 4 dimensions, but seem like they do in 2 dimensions. The clusplot is merely a way to see visually, in 2D, what we are getting; nothing more.
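If you want to check how much of the variability the 2-D clusplot actually retains, you can compute it from the principal components. A sketch, again using scaled iris columns as stand-in data:

```r
library(cluster)

x <- scale(iris[, 1:4])  # stand-in for the 4 basis variables
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)

# clusplot projects onto the first two principal components and
# reports the share of point variability they explain in its subtitle.
clusplot(x, km$cluster, color = TRUE, lines = 0)

# The same number, computed from PCA directly:
pc <- prcomp(x)
var2 <- sum(pc$sdev[1:2]^2) / sum(pc$sdev^2)
round(var2, 3)  # share of variance shown in the 2-D plot (~0.96 here)
```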
Hope that clarifies.
Sudhir
Yes, it does thanks
"similar to China-digital consumer segments"
Where is this?
Thanks
Look at Session 5's readings handout.
Sudhir
Dear Professor,
You said "Now plot a decision tree to see which variables best explain the membership to the largest segment (segment 1)".
Are we making the assumption here that we will target the largest segment? If yes, is this the right strategy? For example, my largest segment might be highly price-conscious, in which case I would prefer to target a smaller, less price-sensitive segment. Thanks
True.
This is just to demo the use of decision trees in a targeting context.
There are 4 segments in the exercise. We could use multivariate decision trees for this problem (refer to the classwork example), but I have simplified things by asking only for a binary Y variable: whether a respondent belongs to segment 1 or not.
Sudhir
Dear Professor
Every time one runs the code for plotting k-means clusters, a different clustering is generated. Is that the expected outcome? (ref Q2)
Also, in Q4, when you plot the decision tree, what do Number1/Number2 signify in each box?
And what is the difference between a rectangular box and an oval?
Thanks
Hi Sayan,
Re k-means scree plots: Yes. K-means depends on the starting values given. Usually it converges to the same solution regardless of starting values, but sometimes different starting values get stuck at local optima. Because R chooses a different set of random starting values each time, you get different scree plots.
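A sketch of one way to make your k-means runs reproducible: fix the RNG seed before each call, and use nstart to try several random starts and keep the best, which reduces the local-optima risk. Stand-in data again:

```r
# Fixing the seed makes the random starting centers, and hence the
# solution, identical on every run.
x <- scale(iris[, 1:4])  # stand-in data
set.seed(123)
km1 <- kmeans(x, centers = 3, nstart = 25)
set.seed(123)
km2 <- kmeans(x, centers = 3, nstart = 25)
identical(km1$cluster, km2$cluster)  # TRUE
```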
Re decision trees: Ovals are intermediate (decision) nodes, whereas rectangular boxes are terminal nodes. The 102/58 in the top oval says there are 102 non-As and 58 As. For a clearer understanding of the decision tree, I recommend going line by line through the output R gives for summary(fit).
Note that the terminal nodes all give heavily skewed results in favor of one or the other segment.
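For reference, a small rpart sketch on its bundled kyphosis data (a stand-in with a binary Y, like "in segment 1 or not"; your Q4 data will differ), showing where the n1/n2 counts and the per-node detail come from:

```r
library(rpart)

# kyphosis ships with rpart: binary outcome (Kyphosis absent/present)
# predicted from three interval-scaled variables.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE)  # use.n = TRUE prints the n1/n2 counts per node

# Node-by-node detail: splits, counts, class probabilities.
summary(fit)

# Variables ranked by importance, highest first.
fit$variable.importance
```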
Hope that clarifies.
Again, I wish more folks had taken the time to attend the tutorial and discuss these Qs.
Sudhir