Tuesday, September 9, 2014

Session 3 Homework

Hi all,

Session 3, spanning Questionnaire Design and [Exploratory] factor analysis, got done today.

Ran into time trouble and some teething RStudio issues in Section A. Sorry about that.

About Shiny:

Will try to make up for it in Section A by going over the Shiny app for factor analysis again in the opening minutes of Session 4.

Shiny gives you easy, web-based, cloud-hosted analysis capability. And though it has an R back-end, you won't see any R code while using it.

Here is the link to the Factor analysis shiny app.

P.S. Some of you might find this old blog post useful (from 2 years ago) on how to interpret factor analysis output.

There are two homework assignments for session 3: one group submission and one individual.

Session 3 Homework 1:

Update: Am getting quite a few Qs asking whether a scale other than Likert can be used. Sure, it can. Likert is important in the context of behavioral constructs. For regular, descriptive Qs, use other scales by all means. *Not* every Q has to be a Likert.

Read the following recent Businessweek article:

Coke's big fat problem.

Imagine you are in the shoes of Sandy Douglas. Now, do the following...

(i) From his 'messy reality', extract a relevant and pressing R.O. (stated clearly in words).

(ii) Map that R.O. onto 'information requirements' (see session 2 slides) that are built around some critical constructs of interest. Give these constructs a descriptive name.

In real life, we'd use exploratory/qualitative work extensively at this stage. Assume you have done so already.

(iii) Now, further break down the construct(s) you identified above into one-dimensional aspects that can be captured using Likerts.

(iv) Define your target audience/ target segment as teenagers. Develop a questionnaire for this target audience that can be taken in under 12 minutes.

Use of SKIP logic and any other Qualtrics features is welcome.

(v) Program your questionnaire as a web survey in Qualtrics. The survey URL (obtained upon launching) is the deliverable and should be pasted along with your group name in this Google form.

(vi) The first page of your survey should be descriptive text only, meant for me and the AAs. Pls write cogently the answers to parts (i) to (iv) above in that space.

Session 3 Homework 2:

Pls find in LMS, in the folder titled 'session 3 materials', code and data for your VALS survey.

Student names have been anonymized using random strings in both the class work and homework data files.

Pls replicate class work results for the Big5 dataset either using RStudio (line by line) or using Shiny.

Complete your HW analysis on the VALS dataset, similarly.

Your HW deliverable will be a 4-slide PPT by the name of your_name_here.pptx

In the first slide, pls write your name, PGID and MKTR section.

In the second slide, answer the following Qs

Q1. What is the size of the data matrix?

Q2. What is the optimal number of factors based on the screeplot?

Q3. How much of cumulative variance in the data matrix is explained by the factor solution?

Q4. Which variables have the highest and lowest uniqueness?
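For orientation, here is a minimal sketch of where the quantities in Q1, Q3 and Q4 live in R's `factanal` output. It runs on simulated toy data, not on the Big5 or VALS files; the variable names `q1` to `q6`, the seed and the choice of two factors are all assumptions made purely for illustration.

```r
# Toy data: 200 respondents, 6 items driven by 2 latent factors (all assumed).
set.seed(42)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  q1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  q2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  q3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  q4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  q5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  q6 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# Q1-style: size of the data matrix (rows = respondents, columns = variables).
data_size <- dim(toy)

# Fit a 2-factor model with base R's factanal().
fit <- factanal(toy, factors = 2)

# Q3-style: cumulative proportion of variance explained by the factors.
# Sum of squared loadings per factor, divided by the number of variables.
ss_loadings <- colSums(unclass(fit$loadings)^2)
cum_var     <- cumsum(ss_loadings) / ncol(toy)

# Q4-style: variables with the highest and lowest uniqueness.
u            <- fit$uniquenesses
most_unique  <- names(which.max(u))
least_unique <- names(which.min(u))

print(data_size)
print(round(cum_var, 3))
print(c(highest = most_unique, lowest = least_unique))
```

Printing `fit` itself shows the same numbers in the "Uniquenesses" and "Cumulative Var" parts of the output.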

In the third slide, answer the following question (preferably in tabular form).

Q5. Interpret the factors. List the variables loading onto each factor. Give each factor a descriptive name.

In the last slide, answer the following Qs.

Q6. Look at the factor scores output. List the anonymized name strings of the respondents who have the highest and lowest scores on each factor.

Q7. If you are marketing lifestyle products - say adventure tours or rock-climbing gear - who in this sample would you target? List the top 5 names.

Deadline: For both HWs, the deadline is the midnight before session 5 begins.

Any queries, doubts etc, contact me or use the comments section below.

Sudhir

12 comments:

  1. code giving an error. my computer hangs after i input the csv file.

  2. What is the error message? which line of code? Could you paste the line here. Thanks.

    P.S. IMO, the code was recently updated.

    Sudhir

  3. --- Please select a CRAN mirror for use in this session ---

  4. I think there was a problem with the Windows firewall. Solved it.
    For people facing a similar problem:
    Go to Settings and disable Windows Firewall, then install the package. The package will be downloaded from the internet first (takes around 10 mins); after that it should work. (No need to install it every time.)
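The "Please select a CRAN mirror" prompt in the comment above can usually be avoided by setting the mirror in code before installing anything. A minimal sketch follows; the helper name `install_if_missing` is hypothetical, and `"stats"` is used only because it ships with R, so nothing is actually downloaded here. Substitute whichever package the course code asks for.

```r
# Pick a CRAN mirror up front so R does not stop to ask for one.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical helper: install a package only if it is not already present,
# so the slow download happens once rather than on every run.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is part of every R installation, so this call installs nothing.
ok <- install_if_missing("stats")
print(ok)
```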

    Replies
    1. OK. Thanks for posting it here rather than over email.

      Sudhir

  5. I have a Doubt Sir,
    Using RStudio, I'm directly able to get the optimal number of factors required to explain the data, but while using Shiny, I'm being asked to input the number of factors upfront and it then calculates the metrics. So, how do I decide how many factors to take in Shiny, Sir?

    Replies
    1. Hi Chaitanya,

      Shiny is merely a wrapper for base R code. So whatever Shiny does is only because R allows it. You can always overrule the machine's recommendation and insert your own pre-set number of factors. This happens in the
      fit <- factanal(data, factors = num_factors, ...)
      step, I think. Use whatever number you think is appropriate in place of num_factors.

      But usually, since the machine's reco is based on objective criteria (eigenvalues > 1, for instance), it's not a bad idea to follow what it says.

      Hope that helps.

      Sudhir
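To make the reply above concrete, here is a minimal sketch of the "eigenvalues > 1" rule feeding into `factanal`. It uses simulated toy data, not the class or homework files, and the names `toy`, `ev` and `n_factors` are assumptions for illustration; the actual course code may compute its recommendation differently.

```r
# Toy data: 6 items built from 2 latent factors (assumed structure).
set.seed(7)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  q1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  q2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  q3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  q4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  q5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  q6 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# Eigenvalues of the correlation matrix; these are what a scree plot shows.
ev <- eigen(cor(toy))$values

# Machine's recommendation: keep as many factors as there are eigenvalues > 1.
n_factors <- sum(ev > 1)

# Fit with the recommended number. To overrule it, replace n_factors
# with your own choice, which is what the Shiny app asks you to type in.
fit <- factanal(toy, factors = n_factors)

print(round(ev, 2))
print(n_factors)
```

In Shiny, the same logic applies in two steps: look at the scree plot (or the eigenvalues), then enter the number you settle on.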

  6. Dear Professor,

    Suppose I have a situation where 6 variables (questions) are loading onto one factor (only). 4 of these variables fall under one trait, say "hardworking" and 2 fall under another trait, say "humorous". We know (or believe based on experience) that these two traits are not generally correlated. Should I still consider them under one factor? Of course, data doesn't lie. But the only thing I can confidently say is that these two traits seem to be correlated among the survey sample. But based on our experience/intuition, we feel this may not be true for the population. What do we do in such situations?

    Thank you,
    Nikhil

    Replies
    1. Hi Nikhil,

      1. One option is to see what common construct might underlie two seemingly unrelated sets of traits. It's gonna be hard, but that's where opportunity and rewards lie: to ID interesting and potentially profitable constructs which rival firms haven't thought of yet. After all, 'No venture, no gain'.

      That said, if it's too difficult or ridiculous a thing to do, then overrule, use your subjective judgment and be prepared to defend it.

      2. Hard to say that something that is on average true of the sample is not true of the population. It would imply that your sample is nonrepresentative. On campus, what we do is retro-fit. We select our target population such that it on average matches the sample, rather than vice versa as in the real world.

      Hope that helps.

      Sudhir

  7. Professor, I am getting an error while trying to run the ratings.csv file. The other two .csv files related to survey are running fine

  8. This is the error message:
    Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
    duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names': ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’

  9. Well, it's saying the row names are duplicates. Can you drop by with the issue so I can see what's happening first? Else, just comment out the row-names line and proceed.
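A minimal sketch of how that error arises and two ways around it. The tiny inline CSV and its column names (`id`, `taste`, `price`) are made up for illustration and are not the actual contents of ratings.csv.

```r
# Hypothetical stand-in for ratings.csv: the id column repeats the value 1.
csv_text <- "id,taste,price\n1,4,3\n2,5,2\n1,3,4\n"
ratings  <- read.csv(text = csv_text)

# Using a column with repeated values as row names reproduces the error.
msg <- tryCatch(
  suppressWarnings({
    rownames(ratings) <- ratings$id
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(msg)

# See which ids are repeated before deciding what to do.
dup_ids <- unique(ratings$id[duplicated(ratings$id)])
print(dup_ids)

# Option 1: skip the row-names step and keep R's default 1, 2, 3, ... labels.
# Option 2: make the labels unique; a repeated "1" becomes "1.1".
rownames(ratings) <- make.unique(as.character(ratings$id))
print(rownames(ratings))
```

Option 1 is the equivalent of commenting out the row-names line; Option 2 keeps the labels but disambiguates the repeats.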

