Showing posts with label Homeworks. Show all posts

Friday, November 18, 2011

Homework and Exam related Q&A

Post-Exam Update:
I later noticed, after the exam had started, that there was a typo in the binary logit question: the coefficient of income^2 was shown as -.088 instead of -0.88. Hence, the predicted probabilities of channel watched for all 3 cases (min, max and mean profile of respondents) would now come up as 1. Folks who show the expression and calculations will get full credit for this problem.
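To see why that one digit matters so much, here is a quick binary-logit sketch in Python. The intercept and income coefficient below are made up purely for illustration; only the income^2 coefficients (-0.088 vs -0.88) come from the exam question.

```python
import math

def logit_prob(income, b0, b_inc, b_inc2):
    """Predicted probability from a binary logit: p = 1 / (1 + exp(-xb))."""
    xb = b0 + b_inc * income + b_inc2 * income ** 2
    return 1.0 / (1.0 + math.exp(-xb))

# With the typo coefficient (-0.088) the quadratic penalty is tiny, so the
# predicted probability saturates near 1 for a typical income profile;
# with the intended -0.88 the same profile predicts near 0.
p_typo = logit_prob(income=5, b0=1.0, b_inc=2.0, b_inc2=-0.088)
p_true = logit_prob(income=5, b0=1.0, b_inc=2.0, b_inc2=-0.88)
print(round(p_typo, 3), round(p_true, 3))
```

The point: a ten-fold change in the quadratic coefficient flips the linear predictor's sign at moderate incomes, which is why every profile came out with probability 1 under the typo.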

A related Q regarding the DTH caselet. There are many possible assumptions you could make and from each, a different research design might flow. As long as you've stated clearly your assumptions and the research design that follows is logically consistent with your assumptions, you are OK.

Hope that clarifies.

Sudhir

Update:

Another thing I might as well clarify regarding the quant portion: only interpretation of a model and its associated tables will be asked for.

Even there, only those tables that we have explicitly discussed in class (not merely shown, but discussed) will be important from an exam viewpoint.

Sudhir

Am getting quite a few Qs regarding this. A wrote in just now:
Hi Chandana,

Need a quick clarification. For this home work questions 1,2,3 and 4 are mandatory and 5,6,7 are optional. Is that right ?
Regards,
A
My response:
Hi A,
Yes, only Q1-4 are mandatory. The rest are optional in the session 9 HW.
Sudhir
----------------------------------------------------------------
Got this just now:
Professor,

When I mailed you today morning, I had high aspirations of finishing my studies and then meeting you for a quick review. Unfortunately, I just finished going through Session 4 hand out. My sincere apologies but I guess I will have to cancel this appointment…and give the slot to a better prepared student…!!

Thanks,
K
My response:
That's OK. Drop by anyway. I expected I'd be busy during these office hrs but am (pleasantly) surprised. Seems folks have on average understood the material well and don't need additional office hrs now.

Besides, the exam is not designed to be troublesome. I wouldn't worry overmuch if I were you.

Sudhir
----------------------------------------------------------------


Hi all,

Was asked yesterday about this and might as well share with the whole class.

The Q was: in the factor analysis homework, what should one do if the factor solution with eigenvalues > 1 still shows cumulative variance explained of less than 60%?

Well, if the % is in the mid to late 50s, just go with it.

If not, it would seem like the factor solution is not doing a great job of explaining variance in the input variables. This is presumably because at least some variables are not well correlated with the others and are hence weakening the factor solution.

To ID such variables, look either at the correlation tables or at the communalities table. The variables that show the least correlation with the others should progressively be removed, and the factor analysis re-run at each step, till the 60% criterion is met.

If a variable loads entirely on its own factor, drop that factor and use that variable as-is in downstream analysis.
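Since not everyone has SPSS handy, here is an illustrative sketch of that drop-and-rerun loop in Python/numpy, using principal-components extraction with the eigenvalue > 1 retention rule. The correlation matrix and variable names below are entirely made up: one well-correlated pair plus four weakly inter-correlated variables.

```python
import numpy as np

def variance_explained(R):
    """Share of variance captured by factors with eigenvalue > 1
    (principal-components extraction on a correlation matrix R)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    idx = eigvals > 1.0
    # Loadings of retained factors; communality = row sum of squared loadings.
    loadings = eigvecs[:, idx] * np.sqrt(eigvals[idx])
    communalities = (loadings ** 2).sum(axis=1)
    return eigvals[idx].sum() / R.shape[0], communalities

def prune_until(R, names, target=0.60):
    """Drop the lowest-communality variable until the target share is met."""
    names = list(names)
    while True:
        share, comm = variance_explained(R)
        if share >= target or len(names) <= 2:
            return names, share
        drop = int(np.argmin(comm))
        keep = [i for i in range(len(names)) if i != drop]
        R = R[np.ix_(keep, keep)]
        names = [n for i, n in enumerate(names) if i in keep]

# Made-up example: v1-v2 strongly correlated, v3-v6 only weakly so.
R = np.eye(6)
R[0, 1] = R[1, 0] = 0.8
for i in range(2, 6):
    for j in range(2, 6):
        if i != j:
            R[i, j] = 0.2
names, share = prune_until(R, ["v1", "v2", "v3", "v4", "v5", "v6"])
```

Here the starting solution explains only about 57% of the variance; dropping one weak variable lifts it past the 60% mark, which is exactly the progressive-removal idea above.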

Hope that clarifies. Any more such Qs and I shall share them here.

Sudhir

Wednesday, November 9, 2011

Session 8 Homework Q&A

Update: Received this email over the weekend.
Dear sir

Actually, I think the issue is not with column V7 (V7 has both 1's and 0's), but rather with column V19, which is all "." (dots) when added to SPSS.
On eliminating V19 and adding V7 back in, the factor reduction runs smoothly.
R

Update: Have just sent my solution to the session 7 homework to the AAs. Should be up on LMS soon.

Hi all,

Well, well.. an early bird decided to take on the Session 8 HW this very day....

Here's an email I got from V:
Dear Prof.,


When trying to do the “factor analysis” on the “hw1 ice-cream dataset” I am encountering the following issue –

The data type is “nominal” (0,1) and when I run a factor analysis on SPSS, it throws up the following error message –

Warnings

There are fewer than two cases, at least one of the variables has zero variance, there is only one variable in the analysis, or correlation coefficients could not be computed for all pairs of variables. No further statistics will be computed.
Could you please help with what I might be doing wrong.

Thanks!

Regards,
V
My response:
Aha.


Check up the descriptive stats. See if all the variables are indeed 'variables'. If I recall correctly, one of the questions was such that *everybody* answered the same way.

So the std deviation in that column was zero. Will need to eliminate that one first.

Sudhir
And then his response:
Thanks professor!


It seems to be working after I eliminated V7.

Regards,
V
All's well that ends well I guess.
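For anyone who wants to automate that zero-variance check rather than eyeball the descriptives, a quick Python sketch (the 0/1 data and column names here are hypothetical, mimicking the V7 situation):

```python
import numpy as np

def drop_constant_columns(data, names):
    """Flag columns with zero standard deviation; such columns break the
    correlation matrix that factor analysis needs."""
    data = np.asarray(data, dtype=float)
    keep = data.std(axis=0) > 0
    dropped = [n for n, k in zip(names, keep) if not k]
    return data[:, keep], dropped

# V2 here is constant (everybody answered the same way), like V7 above.
X = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 0],
              [0, 1, 1]])
cleaned, dropped = drop_constant_columns(X, ["V1", "V2", "V3"])
print(dropped)  # ['V2']
```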

Sudhir

Tuesday, November 8, 2011

Session 7 Homework issues

Got these few emails:

Dear Prof/TA,
I can see two different Homework Part II files for Session 7. One is on page 11 of the handout that was given to us.
The other is in the slides in Addendum slides for session 7 called Homework Part II ( Optional )
My question is, which one is actually Homework Part II for session 7, and is it optional? Sorry if I misunderstood the instructions in class.
Thanks
H

My response:

Yes, the slide deck contains the optional part. Feel free to ignore it.
There are two questions in session 7 Hw - one related to standard regressions (mobile usage example) and one related to multinomial logit in worksheet 'hw4 MNL'.
Hope that clarifies.


Then this one:

Professor,
I am unclear on Part II of HW 7. We are asked to predict edulevel using the means and modes of the relevant model. So we have the "us" (numerator) part of the logit model, but not the "them" (denominator) part.
To analyze the "them" part, we have 3 levels in rateplan, 3 levels in gender, and familysize.
- Is taking family size as the mean value for the denominator appropriate?
- If so, this will give a total of 9 combinations (or parts) for the denominator.
- We then look at the probability of education levels 1 and 2, and whichever is more probable is our answer.
Is this approach correct?
Sincerely,
R

My response:

Hi R,
Pls look at the addendum for session 7, which is put up on LMS, on the logit-based prediction model.
There, given a set of X values X={10,9,1} for {sales, clientrating, coupon1}, we predicted the probability of instorepromo being 1, 2 or 3.
Similarly, once you have a set of Xs for {edu, famsize, rateplan, etc.}, use the SAME X profile in both the numerator and the denominator of the logit expression.
Hope that clarifies.
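The same-profile-in-numerator-and-denominator point can be sketched in a few lines of Python. The X profile {10, 9, 1} is the one from the addendum; the coefficient values below are made up purely for illustration.

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit: P(level j) = exp(x . b_j) / sum_k exp(x . b_k).
    The SAME x profile enters every numerator and the denominator.
    betas holds one coefficient vector per outcome level (base = all zeros)."""
    utilities = [sum(xi * bi for xi, bi in zip(x, b)) for b in betas]
    denom = sum(math.exp(u) for u in utilities)
    return [math.exp(u) / denom for u in utilities]

# {sales, clientrating, coupon1} = {10, 9, 1}; leading 1 is the intercept.
x = [1, 10, 9, 1]
betas = [[0, 0, 0, 0],             # instorepromo level 1 (base)
         [-1.0, 0.1, 0.2, 0.5],    # level 2 (made-up coefficients)
         [-2.0, 0.2, 0.1, 1.0]]    # level 3 (made-up coefficients)
probs = mnl_probs(x, betas)
```

Whichever level gets the highest probability is the predicted category; the three probabilities sum to 1 by construction.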
Another:
Hi Chandana, Can you let me know where the worksheets for the homework are? I'm unable to find them on LMS. Session 7 slide 29 says worksheet labeled 'ex2'; session 7 slide 44 says worksheet name 'hw 1 MNL'. I can't find either. Thanks, RC

My response:
'ex2' is the standard-regressions-based homework, the mobile usage one. 'hw4 MNL' is the logit-based homework; it was wrongly written as 'hw1 MNL'. MNL stands for Multinomial Logit in the sheet name. I hope that clarifies.
Shall put up more homework-related Q&A in this thread as they happen. More recent on top.

Sudhir

Sunday, November 6, 2011

Homework session 6 Issues

Update:
OK, IT tells me they've already sent out instructions for the SPSS trial version download. Great. Then this homework turns out to be much easier than I had first imagined. Good. Some of the advice on re-coding and transforming data for the t-test elaborated below should still hold, I guess.

Hi all,

1. Pls let me know if there are any queries etc. you're facing w.r.t. the session 6 homework. It shouldn't take more than an hour or so, by my reckoning, but if you've no clue how to approach the questions, then it can seem quite daunting, I now realize.

2. I'll present my own solution to this homework in class in a few slides.

3. The most common-sensical approach, the way I see it, for the first two questions is to take out the four concerned columns into a fresh worksheet (keeping Respondent ID also, to keep count), build pivots, and run chisq.test() in R on the pivots obtained. In Excel, you'll also need to generate the expected distribution. This is done as (row total * column total)/(overall total) for each cell in the table. As a general rule, ignore blanks and non-standard responses in your cross-tabs.

4. For the t-test question, you'll need to re-code the data into metric (interval) form. So use CTRL+H or the 'Find and replace' function in Excel to transform the text responses into a 1-5 scale (or a -2 to 2 scale). Sort the responses to weed out blanks and other such. Then run the simple TTEST() function in Excel.

5. The above is only 1 way of doing these things. It seemed to me to be a common-sensical approach and so I elaborated on it. You may reach the answers in a quicker, smarter way, perhaps. That is entirely fine too.
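For those who prefer code to clicks, the chi-square and t-test steps above can be sketched in Python using only the standard library. The counts and Likert responses below are made up; in class we'd use Excel formulas and R's chisq.test().

```python
import math
import statistics as st

# Cross-tab (pivot) of two survey questions: made-up counts.
observed = [[20, 30],
            [25, 25]]

# Expected count per cell = (row total * column total) / overall total,
# exactly the formula one would type into Excel.
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
grand = sum(row_tot)
chi2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
           / (row_tot[i] * col_tot[j] / grand)
           for i in range(2) for j in range(2))

# Recode Likert-style text into a 1-5 metric scale (the CTRL+H step),
# then compute a pooled two-sample t statistic, as Excel's TTEST() does.
scale = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
         "agree": 4, "strongly agree": 5}
a = [scale[r] for r in ["agree", "agree", "neutral", "strongly agree"]]
b = [scale[r] for r in ["disagree", "neutral", "disagree", "agree"]]
pooled_var = ((len(a) - 1) * st.variance(a) + (len(b) - 1) * st.variance(b)) \
             / (len(a) + len(b) - 2)
t_stat = (st.mean(a) - st.mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
```

The chi-square statistic here is just the sum of (observed - expected)^2 / expected over the cells; comparing it (or the t statistic) against the relevant critical value is the final step.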

Hope that helps. Pls use the comments thread below for Q&A in case of any queries.

Sudhir

Thursday, November 3, 2011

Qualitative Homework Related

Update: This I got from Vimalakumar S:
Dear Professor,


This is in response to the blog entry regarding Qualitative Homework and use of Software tools to count words.

I have two issues to discuss here: -

1. I find it erroneous to equate all open-ended questions and their answers with qualitative inputs. Just because the survey question did not restrict the responses to a few choices does not make the inputs qualitative. Further proof of this is that most of us who attempted the assignment were inclined to count the recurring words rather than read and understand what the consumer was saying. In my opinion, if you can reduce the data collected to a frequency distribution chart, it is no longer qualitative. Maybe the survey should have limited the answer choices based on a prior understanding of possible responses.


2. A related issue would be the question of "How to analyse qualitative inputs?". In my opinion, counting words is not the answer. The researcher chooses qualitative questioning to understand the consumer better than a simple quantitative question would allow. To analyse the inputs in all their richness, we need a better tool than a frequency distribution chart of recurring words.

Regards,

Vimal
My take: Sure. Qualitative info for qualitative understanding has its own importance, and we're yet to find a better way than expert analysis for that one. However, when open-endeds are used the way they are in surveys, some attempt at categorization (and thereby dimension reduction) is not a bad idea. Thanks for writing in and sharing your thoughts.

Update: Session 6 related.
The quant portion has started. Expectedly, there have been a few hiccups, esp. in the absence of SPSS. Hopefully session 7 will go smoother than this. I shall resend a slide deck as an addendum to the session 6 slides, with material that may have been missed in one section or the other. The slide deck will pointedly include R code, instructions and screen caps.

Hi All,

The session 5 homework, relating to some preliminary analysis of open-ended qualitative data, was, I take it, quite challenging.

I would expect folks to take a straightforward, common-sensical approach: take a random sample of, say, 100-odd responses and analyze them, assuming they represent the rest of the responses for the purposes of the homework. Use of Excel functions like FIND() etc. would be par for this course.

However, as always happens with homework, some have gone further, gone creative, or gone haywire in coming up with new approaches to this problem.

Below I present what one student did from his email:

Hey Professor,


I used the following two tools to answer the Qualitative research homework and try and make sense of the mounds of data provided to us and figure out the top reasons :

1.) Word Frequency : http://writewords.org.uk/word_count.asp

2.) Phrase Frequency : http://writewords.org.uk/phrase_count.asp

Of course, I had to figure out the correct number of words to use in the phrase, but after that, it made it much easier to consume the raw data of the survey. There are many more tools which also do this, but this was the first result from Google :).

I was initially planning to build a word cloud and highlight the most common words, but I gave that up after I realized that words like "I" and "had" were the most frequent.

Regards,
Shyam Seshadri

Thanks, Shyam.
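Shyam's word-count idea, including his observation that words like "I" and "had" dominate unless filtered out, can be sketched in a few lines of Python. The stopword list and sample responses below are made up for illustration; a real analysis would use a fuller stopword list.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; extend it for real data.
STOPWORDS = {"i", "the", "a", "an", "had", "to", "of", "and",
             "it", "was", "is", "with", "for", "me", "too"}

def top_words(responses, n=5):
    """Count word frequencies across open-ended responses,
    dropping stopwords so 'I' and 'had' don't top the list."""
    words = []
    for text in responses:
        words += [w for w in re.findall(r"[a-z']+", text.lower())
                  if w not in STOPWORDS]
    return Counter(words).most_common(n)

responses = ["I had a problem with the battery life",
             "Battery drains too fast and the screen is dim",
             "The battery was the main issue for me"]
print(top_words(responses, 3))
```

On these three made-up responses, "battery" surfaces as the top recurring theme, which is exactly the kind of categorization the homework was after.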
 
I'd very much like to hear if you've used something new or creative. Shall put it up on the blog here. Pls use the comments space below for Q&A.