Showing posts with label HW. Show all posts

Tuesday, December 18, 2012

Session 6 HW and other Notes

Article Update:
Found this really neat article: Different ways in which you can become a data scientist. Given that data science is going to become an element of competitive advantage in the medium term, I'd say that article is of interest to all MKTR students (regardless of background).

**********************************

Update: Here's an interesting article on job trends in the coming year. 6 startup trends in 2013: bootstrapping, marketing, B2B. Thought it relevant to highlight this part of the article for our MKTR course:

5) Marketing becomes as hot as tech. “By 2017, CMOs will be spending more on IT than CIOs. Driving this massive shift is the customer data that simply did not exist a decade ago.” -Ajay Agarwal, Bain Capital Ventures

6) Service marketplaces — not individual suppliers — will become the “brand.” Just as Amazon has become a leading brand for books (versus individual publishers), consumers will look to branded marketplaces for various services, such as teaching, cleaning, or construction. -Eric Chin, general partner, Crosslink Capital

Point? What looks arcane and abstract right now (e.g., the decision trees or targeting algorithms we used) is the future.

**********************************

Hi all,

Your Session 6 HW (on some targeting algorithms we covered in class) requires you to go through the short caselet 'Kirin' (PDF uploaded). This HW has 4 Qs as detailed below and will use a PPT submission format (the same as for the Session 5 HW). The submission deadline is Wednesday 26-Dec midnight, into a dropbox.

Some background to the Qs first. Before targeting, we need to do segmentation. Q2 deals with segmentation and the interpretation of segments. However, prior to segmentation, we need to know what constructs may underlie the basis variables. Q1 deals with this aspect. You did factor analysis and segmentation already in the Session 5 HW. Both occur again in this HW, in Q1 and Q2, but with less emphasis. The focus is more on Q3 and Q4, where we apply the randomForest and neural net algorithms respectively.
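For those who want a feel for the Q1 step before opening the data, here is a minimal sketch in base R. Note the assumptions: the data are simulated (they are not the Kirin basis variables), the names `basis`, `f1`, `f2` and `v1` to `v7` are made up for illustration, and the choice of 2 factors is simply how the toy data were built. Only `eigen()` and `factanal()` from base R are used; the HW code on LMS may well do this differently.

```r
# Simulated stand-in for the basis variables (NOT the Kirin data).
set.seed(42)
n  <- 300
f1 <- rnorm(n)   # hypothetical latent construct 1
f2 <- rnorm(n)   # hypothetical latent construct 2
basis <- data.frame(
  v1 = f1 + rnorm(n, sd = 0.5),
  v2 = f1 + rnorm(n, sd = 0.5),
  v3 = f1 + rnorm(n, sd = 0.5),
  v4 = f2 + rnorm(n, sd = 0.5),
  v5 = f2 + rnorm(n, sd = 0.5),
  v6 = f2 + rnorm(n, sd = 0.5),
  v7 = rnorm(n)  # pure noise, unrelated to either construct
)

# Eigenvalues of the correlation matrix, plotted as a scree plot.
ev <- eigen(cor(basis))$values
plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue",
     main = "Scree plot")

# Two-factor solution with varimax rotation.
fit <- factanal(basis, factors = 2, rotation = "varimax")

# Loadings table; loadings below 0.30 in absolute value are blanked out.
print(fit$loadings, cutoff = 0.30)

# Uniqueness = 1 - communality, so uniqueness >= 0.70 means the
# variable loads less than 30% onto the factor solution.
weak <- names(fit$uniquenesses)[fit$uniquenesses >= 0.70]
print(weak)
```

Variables listed in `weak` are the ones the factor solution explains poorly; the rest are the candidates for labeling as constructs.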

This HW also demonstrates the importance of selecting good discriminant variables. As it turns out, the discriminant variables used here are lousy and yield remarkably low predictive accuracy rates even with such sophisticated algos. The takeaway? Methods cannot alleviate deficiencies in the data beyond a point. OK, without further ado, here we go:

Questions for Session 6 HW:

  • Q1. Find what constructs may underlie the basis variables. Use factor analysis, and report the eigenvalue scree plot & factor loadings table. Answer the following Q sub-parts:
  • (i) Which variables load less than 30% onto the factor solution? (Hint: Look for Uniqueness threshold of 1-0.30 = 0.70 or above)
  • (ii) ID and label the constructs you find among the variables that do load well onto the factor solution.
  • Q2. Use mclust to segment the respondents. Answer the following Q parts.
  • (i) What is the optimal no. of clusters?
  • (ii) Report a clusplot. What is the % of variance explained by the top 2 principal components in the cluster solution?
  • (iii) ID and label the segments you find.
  • Q3. Split the kirin datasets into training sample (first 212 rows) and test sample (the remaining 105 rows). Train the randomForest algorithm on the training sample. Predict test sample's segment membership. Answer the following Qs:
  • (i) Record predictive accuracy in both training and test samples
  • (ii) Which segment appears to have the highest error rate?
  • (iii) List the top 3 variables that best discriminate among the segments (use Mean Decrease in Accuracy metric)
  • Q4. Use the split kirin datasets to try multinomial logit with neural nets on the training and test samples. Predict test sample's segment membership.
  • (i) Record predictive accuracy in both training and test samples
  • (ii) Which segment appears to have the highest error rate?
  • (iii) List the top 3 variables that best discriminate among the segments (use significance as a metric here)
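The Q3 workflow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `dat` and its columns `segment`, `x1` to `x4` are simulated and hypothetical (not the Kirin data), and `accuracy()` is a helper defined here, not part of any package. The calls `randomForest()`, `predict()` and `importance()` come from the randomForest package, which needs to be installed.

```r
library(randomForest)

# Simulated stand-in data (NOT the Kirin data): 317 rows = 212 + 105.
set.seed(123)
n   <- 317
seg <- factor(sample(1:3, n, replace = TRUE))   # hypothetical segment labels
dat <- data.frame(
  segment = seg,
  x1 = rnorm(n, mean = as.integer(seg)),    # informative by construction
  x2 = rnorm(n, mean = -as.integer(seg)),   # informative by construction
  x3 = rnorm(n),                            # noise
  x4 = rnorm(n)                             # noise
)

# Split: first 212 rows for training, remaining 105 rows for testing.
train <- dat[1:212, ]
test  <- dat[213:n, ]

# importance = TRUE is needed to get the Mean Decrease in Accuracy metric.
rf <- randomForest(segment ~ ., data = train, importance = TRUE)

# Helper (defined here): share of correctly predicted rows.
accuracy <- function(actual, predicted) mean(actual == predicted)

# Note: predicting back on the training rows gives an optimistic figure.
train_pred <- predict(rf, newdata = train)
test_pred  <- predict(rf, newdata = test)
train_acc  <- accuracy(train$segment, train_pred)
test_acc   <- accuracy(test$segment, test_pred)

# Confusion matrix on the test sample and per-segment error rate.
conf      <- table(actual = test$segment, predicted = test_pred)
class_err <- 1 - diag(conf) / rowSums(conf)

# type = 1 selects Mean Decrease in Accuracy.
imp  <- importance(rf, type = 1)
top3 <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:3]

print(c(train = train_acc, test = test_acc))
print(class_err)
print(top3)
```

The split and the `accuracy()` helper should carry over to Q4 unchanged; only the model-fitting call differs there.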

Some Notes on why use R:

In case you are wondering why you are being asked to bother with R, some points to note:

  • It's imperative you learn how to run MKTR analyses at least once. The reason is that you won't be able to effectively lead a team of people who do what you've never done. Sure, your analytics team will run the analysis, but you need to have an idea of what that entails, what to expect, etc.
  • Folks who have any programming experience at school or at work can probably vouch for the fact that the R language is about as straightforward as, well, English. Pls implement the R code *yourself* to get a feel of the platform. Pls help your peers out who may not be as comfortable with R.
  • To those who are ambivalent or undecided about using R, I say pls give it a sincere try. It's a worthwhile investment. Learning in this course is best leveraged with a basic understanding of what a versatile analytics platform does.
  • Pls share problems in the code or the data that you found, workarounds that you figured out, other packages you discovered that do the same thing better/faster etc on the blog as well. R is a community based platform and it draws its strength from a distributed user and developer base.
  • Those who are determined to not touch R are free to do so. No harm, no foul. Pls borrow the plots and tables from your peers, but interpret them yourself.
Dassit for now. Gotta get to work on tomorrow's session slides and associated blog-code.

Sudhir

Monday, December 17, 2012

Session 5 HW

Hi all,

Your 'Session 5 HW' is out, in a folder of the same name on LMS. The R code I used to test the HW is put up as a notepad. Feel free to use blocks of that R code directly for your HW.

Important: Pls read the short caselet in a PDF file 'Conglomerate PDA' in the HW folder *before* attempting the exercise. As an instructor, I assure you that if you try interpreting the analyses without reading the caselet, it will show.

Recommended:

  • Pls try the classwork examples on R before trying the HW examples. The classwork blog-post has explanations for each block of code.
  • Ensure you have the required packages loaded before you start. These are (IIRC): nFactors, cluster, mclust and rpart.
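A quick way to check the package list above before starting is sketched here. The package names are the ones listed above; the variable names `pkgs`, `installed` and `missing_pkgs` are just illustrative. The install line is left commented out on purpose, since it needs an internet connection.

```r
# Packages named in the list above.
pkgs <- c("nFactors", "cluster", "mclust", "rpart")

# TRUE for each package that is already installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)   # uncomment to install from CRAN
}

# Load whatever is available.
for (p in pkgs[installed]) library(p, character.only = TRUE)
```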

HW Questions:

  • Q1. Are there any 'constructs' underlying the basis variables used for segmentation in the PDA caselet? What might they be? Give a name to and interpret these factors/ constructs.
  • Q2. Segment respondent data using hclust, mclust and k-means (with scree-plot). Record how many segments you find. Draw clusplots for k-means and mclust outputs.
  • Q3. Characterize or 'profile' the clusters obtained from mclust. Name the segments (similar to the China-digital consumer segments reading)
  • Q4. Read-in discriminant data for the PDA case. Make a dataset consisting of the interval scaled discriminant variables only. Now plot a decision tree to see which variables best explain the membership to the largest segment (segment 1). List the variables by order of importance.
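For the k-means and hclust parts of Q2, a minimal base-R sketch follows. The assumptions: the matrix `x` is simulated with three groups built in (it is not the PDA data), and cutting at 3 clusters is a choice made only because of how the toy data were generated. The mclust fit and the clusplots are not shown here; for those, the classwork blog-post code is the place to look.

```r
# Simulated stand-in data (NOT the PDA data): three groups in 2 dimensions.
set.seed(7)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
)
x <- scale(x)   # standardize the variables before clustering

# Scree plot for k-means: total within-cluster sum of squares for k = 1..8.
wss <- numeric(8)
wss[1] <- (nrow(x) - 1) * sum(apply(x, 2, var))   # k = 1: total sum of squares
for (k in 2:8) {
  wss[k] <- kmeans(x, centers = k, nstart = 20)$tot.withinss
}
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")

# Hierarchical clustering, cut into 3 segments.
hc     <- hclust(dist(x), method = "ward.D2")
hc_seg <- cutree(hc, k = 3)

# k-means with 3 segments, and a cross-tab of the two solutions.
km <- kmeans(x, centers = 3, nstart = 20)
print(table(hclust = hc_seg, kmeans = km$cluster))
```

The "elbow" in the scree plot is where adding another cluster stops reducing the within-cluster sum of squares by much; that is the usual cue for how many segments to record.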
Submission format is a set of PPT slides:
  • Ensure your name and PGID are written on the title slide.
  • Ensure all your plots (and the important tables) are pasted as images on the slides. Typically metafile images are best.
  • Pls give each slide an informative title and mention question number on it.
  • Pls be aware that while you are free to consult peers on the R part of the plots making, interpretation and writing up is solely an individual activity.
  • Submission deadline is Monday Midnight 24-Dec-2012 in a LMS dropbox.
Session 6 HW is in the making. Will happen shortly and its deadline will be close to this one's. FYI.

Any Qs on this HW, pls contact me directly. Pls use the blog comments pages to reach me fastest. Feedback on any aspect of the HW or the course is most welcome.

Sudhir

Saturday, December 15, 2012

Session 4 HW Help

Hi all,

Thanks to the tireless efforts of one Mr Shaurya Singh, I've put up what is (almost) the solution to your Session 4 HW on LMS.

The data and code required to run all your session 4 HW exercises (due Wednesday 19-Dec midnight) are up in a folder imaginatively named 'session 4 HW' on LMS.

As you know, session 4 HW has three parts to it. The data and code for each part are placed in separate subfolders.

Instead of copy-pasting code from the blog, I urge you to copy-paste blocks of code directly from the R code.txt files in each of the subfolders. The data required to be read-in are available directly as .txt files in the same subfolders.

I now estimate no more than 30 minutes for you to run through the entire R portion of your session 4 HW. Pls ensure you have a good interpretation (or 'story') ready to be written as bullet points onto your HW slides.

My colleagues tell me there is hope that, with Day 1 of placements over, the students may come back to normal for the rest of the term. Well, we'll see. Cheers

Sudhir