Monday, December 12, 2011

Letter Grades Sent

Hi Class of 2012,

This is my last communication to you as students of MKTR from this blog.


I just sent in the letter grades to ASA a few minutes back. The grade distribution remains roughly what it was last year - around 10% earned an A, 49% secured A-, about 31.7% obtained a B and the rest a B-.

Was glad to see some familiar names do well and was a little disappointed another set of familiar names do less well than I'd expected. By familiar names I mean people I have come to interact with and know in the course of the course.

So, again, all the best for the journey ahead. Do keep in touch going fwd and should you actually put into practice what you picked up in MKTR, I would love to hear about it. Should you as an alum tomorrow want to talk as a guest speaker to the MKTR class etc, that would be just great too.

Ciao and cheerios,

Sudhir

Saturday, December 10, 2011

Closing all re-eval requests and inquiries.

Received a re-eval request for phase 3 and this was my response. Am posting here because some of it is general in nature and bears wider dissemination.

Hi D,


I reviewed your Team's submission. Before I go into some detail, let me clarify a couple of things.

1. In each of the 10 grading criteria, a "1" means the submission was "at par", i.e. it met expectations. On any given criterion, the majority of groups would have scored a 1, so getting a "1" does not mean you did badly. A "1.5" means the submission was 'above par' on that criterion and a "0.5" means it was 'below par'. A "2" (very rare and subjective) means the submission was exceptional on that criterion. Hence you will see that 1.5 was the maximum score actually awarded on most criteria, because on a relative basis some team or the other would have done well on each criterion.

2. This year, almost all teams did well. The mean was higher and the variance lower than in past years. I suspect it has partly to do with the example PPTs being put out. People were able to learn from past mistakes better and submit better work than average. But that also meant that the par value went up for the "at par" and "above par" ratings.

Your team did well on most parameters. You were at par on many criteria. The 0.5 you got for ROs was because the ROs were unclear. The DPs seemed uni-dimensional (I didn't cut marks here though) and somewhat narrow given the project scope. However the 1.5 in insights obtained is because, despite being narrow, the ROs seemed to have been fully addressed.

Teams that put in some form of animation so that the sequence of steps comes by itself got a 1.5 on creativity. Groups that were able to integrate their analysis into a clean, reasonable set of recommendations did well in 'results obtained' and 'insights' criteria. And so on.

So overall, I'm sorry to say I do not see scope to change the marks as given. It's not because you did badly but because most other teams did well too, and the bar for scoring at 'above par' levels was raised.

I hope that clarifies.

I shall now close re-eval requests for Phase 3; it's time to release the grades soon.

Sudhir


Wednesday, December 7, 2011

Wrapping up.

Hi all,

Did a lucky draw for 5 people to receive 2000/- in sodexo coupons. The following are the names:

1. Debdutt Patro from Bengaluru for Team Sultanpur

2. Manonita Das from Vizag

3. Rahul Modi from Chennai for DumDum group

4. Mrs Usha Chandrashekaran from Chennai for Team Naihati

5. Varun Verma from Punjab for Team Jhunjuni

The grades will soon be out. Re phase III grading, I received this from K:
Dear Prof. Sudhir,


Could you please let us know if we could have a feedback session on our project report for MKTR?

Since this subject is extremely important going forward for our careers, it is important for us to understand how to improve upon our work.
Thanks and regards,
K
My response:
Sure K,
BTW, which group was yours and how much did you score out of 20?


P.S.
I deliberately didn't look at student names in the first slide so as to avoid any possibility of bias.
Overall, I think the quality of project submissions was certainly higher this year than in the past 2. The variance is lower and the mean higher. I believe it has something to do with the example projects helping people avoid re-inventing the wheel and go with what works.

I was also pleasantly surprised to see the surfeit of secondary data used in various creative ways to support the storyline. In the car project, I had provided the class with a secondary dataset on car sales by brand. Here, people went out on their own and got the data. Nice.

Some raised the objection that the same IP address was used multiple times. I'd raised this with IT, who said that on LAN connections the webserver's IP may get used as a common proxy. So it is not necessarily the case that the same person is re-taking the surveys.

Some teams wrote in their 'learnings' that the questionnaire wasn't clear on project goals and could have been designed better. Sure. A concomitant learning is that no questionnaire for a large and diffuse project will be perfect. There'll always be things that could have been done better. Often, clients themselves wouldn't have very clear-cut business problems to give the MKTR teams. All this is part of life, of how things work in the real world. You go with the data you have and see how best what you want can be squeezed out of it.

Thanks to the few teams for some of the creative sher-shayaris I saw in their PPTs. Added zest and liveliness to the whole thing.

That's it folks. Shall put up the grade distribution in my next and last post for this year.

Sudhir

Friday, December 2, 2011

Some Q&As over exam paper viewings

Hi Professor,

I remember at the beginning of the course you mentioned 10 points for attendance. But I see that it’s counted as 7 points. Could you please look into it?
Thanks,
M
My response:

Hi M,

The attendance and CP together constitute 15% (refer course pack and session 1 slides).

Each session attended fully carries a 1% point credit. However, the first session is not counted (it is prior to final registration). And of the remaining 9 sessions, up to 2 were given 'off' - i.e. no attendance penalty for not showing up in up to 2 lectures. Hence, effectively, the attendance component drops to 7%.

I hope that clarifies.
-----------------------------------------------
Dear Prof Sudhir,

My market research experience had been really enriching especially the real life project and data analysis.

I had a separate question on one of the end-term problems. If, in a multiple regression, one variable is found to be totally insignificant based on its t-value and p-value, can we really treat its contribution as larger than that of a variable with a lower beta that is significant, just because its beta is higher?

My understanding was that we can't judge impact on beta alone when a variable is not significant; only if a variable is significant based on its t- and p-values can I rank variables on impact using their beta values.

I may be wrong but I wanted to know the right answer and I had given a re-eval request on this aspect too.
Thanks in advance for clarification.
H

My response:

Hi H,

I did go through quite a few re-eval apps which seem to be based on your query. There does seem to be confusion on this score.

1. If a variable isn't significant, then it will not show up as impactful, whether you judge impact by its standardized beta or by its significance level. If the standardized beta is genuinely large, the variable can't be insignificant; the two don't go together.

2. If a variable has been used in a regression model and is later found insignificant, that still doesn't mean its term can be dropped when computing predicted values. Many folks appear to have made this mistake. This is because the coefficients of all the other variables were estimated given the presence of this variable in the regression. The alternative is to drop the variable, re-run the regression, and then use the new coefficients (betas) from the new model (which does not use the dropped variable) for prediction.
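To make point 2 concrete, here is a minimal sketch in Python/statsmodels (not a course tool; the data and variable names are made up) contrasting the two legitimate ways to predict:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Made-up data purely for illustration: y depends on x1; x2 is pure noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 2.0 + 1.5 * df["x1"] + rng.normal(size=200)

# Full model: x2 will very likely come out insignificant.
X_full = sm.add_constant(df[["x1", "x2"]])
full = sm.OLS(df["y"], X_full).fit()
print(full.summary().tables[1])

# Option (a): predict with the full model's coefficients, x2 term included.
new_obs = pd.DataFrame({"x1": [1.0], "x2": [0.5]})
pred_a = full.predict(sm.add_constant(new_obs, has_constant="add"))

# Option (b): drop x2, re-fit, and use the NEW coefficients for prediction.
reduced = sm.OLS(df["y"], sm.add_constant(df[["x1"]])).fit()
pred_b = reduced.predict(sm.add_constant(new_obs[["x1"]], has_constant="add"))

print(pred_a.iloc[0], pred_b.iloc[0])
# What is *not* OK: keeping the full model's other coefficients while
# silently zeroing out the insignificant variable's term.
```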

I hope that clarified. I'm glad to hear the project was found to be relevant and useful. As always, I'd appreciate candid and constructive feedback on how to improve the course for next year and beyond.

Regards,
-------------------------------------------------------------------------------
Dear Professor Voleti,

As posted on your blog: there was a typo in the binary logit question. The coeff of income^2 was shown as -.088 instead of -0.88. Hence, the predicted probabilities of channel watched for all 3 cases would now come up as 1.

I along with many other students spent a lot of time trying to solve this question but gave up after trying parts 1 and 2. The probabilities came as 1 for part 2 and by just looking at parts 3-4 we concluded that we would get the same answer and hence we left them blank.

I feel it's unfair that we have been awarded 0 marks for parts 3-4 when people have been given full marks for restating the formula stated in parts 1-2.

Can this evaluation scheme be looked into again, taking into account that the data for the question was wrong and we have understood the concept since we have correctly answered parts 1 & 2?
S
My response:

Hi S,

After the typo, the answer to Q2 - (c) (ii)-(iv) comes up as Pr(channel)=1 in all three cases. Because of the typo, we realized people may get different answers etc., so we decided to award marks to all who attempted the Q.

If Pr=1 in all cases, then this should have been written down in all three cases.

If the space is left blank, then what judgment is a grader to make? Even stating in words that the answer comes out as Pr=1 in all three cases would have given us grounds to award partial, if not full, credit there.

But graders can't do anything when a student leaves a Q blank with no indication of why it was not attempted.

So I'm sorry, I cannot at this time accede to the request of changing the grading template to include non-attempts as well simply because I cannot justify doing so. The AAs have been quite liberal with awarding credit in the concerned Q due to the typo.

Hope that clarifies.
-------------------------------------------------------------
Prof Sudhir,

I just checked the answer keys of the end term paper and it appears to me that there are a couple of issues there.

In a log-log regression, a given percentage increase in the independent variable leads to B times that percentage increase in the dependent variable. The question asked whether a 1-ounce increase in size will, on average, lead to a 2% increase in sales. The size increase is absolute, hence the statement is false and the answer key appears to be wrong to me.

Also, if both the adjusted R2 and the R2 are given, should we not consider the adjusted value as the goodness-of-fit indicator, since it takes into account the degrees of freedom and the sample size and thereby reflects the variance explained more accurately? Additionally, the question itself had a misprint (82.%%), which leads to ambiguity.

I have put the paper for revaluation based on the above. Would be great if you can help me clear my doubts/errors in interpretation.

Regards
R
My response:

Hi R,
I agree with your first point. The 'sales' on the LHS are unit sales, not sales volume in ounces. The correct answer to Q1-(c)-(ii) should be FALSE. Chandana and Pankaj, we should meet to discuss this and any other corrections that arise.
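For anyone still unsure why the statement is false, here is a small numerical illustration (the intercept and elasticity values are made up) of how a log-log coefficient works as an elasticity:

```python
import numpy as np

# Log-log model: ln(sales) = a + B*ln(size)  <=>  sales = exp(a) * size**B
a, B = 1.0, 2.0   # made-up intercept and elasticity

def sales(size):
    return np.exp(a) * size ** B

# A 1% increase in size lifts sales by ~B% regardless of the base size:
for size in (4.0, 16.0):
    lift = sales(size * 1.01) / sales(size) - 1
    print(f"size {size}: +1% size -> +{lift:.2%} sales")   # ~2.01% both times

# A 1-ounce (absolute) increase gives no fixed 2% lift; it depends on base size:
for size in (4.0, 16.0):
    lift = sales(size + 1) / sales(size) - 1
    print(f"size {size}: +1 oz  -> +{lift:.2%} sales")     # 56% vs ~13%
```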

Re your second point, well Q1-(c)-(iv) asks not about assessment of model fit but only about % of variance explained by Xs in the Y. The latter is appropriately defined by the multiple R-sq while the former is better described by the adj-R-sq.

Re the typo in this part (82.%%), well the correct answer is 83.1% and hence I would have mistyped it as 83.%% instead of 82.%% if I meant the multiple R-sq. So I cannot accede to changing this part of the answer key.

Hope that clarifies.

Sudhir
-----------------------------------------------------------------

More as they come.

Sudhir




Saturday, November 26, 2011

Eleventh hour Phase III Q&A

Update:
Got this email:
Hi Professor,


Will it be possible to extend the deadline for submission of group project from 6 PM to midnight i.e. 12 AM.

There are couple of reasons behind this request :

1) We followed the instructions for extension of SPSS recommended by IT services. However it involved Virtual Box creation which is a terribly slow way to operate heavy data on SPSS (each excel file is 3-4 MB, and it takes 10 minutes just to transfer data from excel to SPSS run on Virtual Box)

2) A lot of time was spent looking for secondary data which we did not expect.

3) Conflicting deadlines for Pricing, CSOT, ENDM and BVFS between today and tomorrow.

Will be grateful if the extension is allowed.
Regards,
My response:
Hi G and Team,


I have no problem with extending the deadline another 6 hrs.

I don't know how to program turnitin.com.

Chandana has set the deadline for 6 PM on the dropbox.

Let me try to reach her and see if she can extend this to 12 midnight.

Shall inform the class soon regarding this.

Sudhir
Hope that clarifies. Shall let you folks know soon via (yet another) mass e-mail.

Sudhir
------------------------------------------------
Hi all,

Got this series of mails from "t" just now. Have edited to remove specifics and focus on general take-aways to share with the class.

Hi Prof,
One question on cluster analysis-
While choosing variables for cluster analysis, can we pick 3 factor scores, which are essentially 3 buckets of multiple variables, along with two individual variables (i.e. scores on 'I am generally budget conscious' and 'I carefully plan my finances')? Does that distort the output? Can it be interpreted?
Regards,
T
My response:
Hi T,
No, that's perfectly fine. In fact, it is recommended when certain variables don't load very well onto the factor solution in the upstream stage.
As for interpretation, yes, it proceeds just as it would had they all been factors. As far as downstream analysis is concerned, factor scores are simply variable values.
And then T replied:
Thanks for that prof. The only problem is that the variables under question are becoming disproportionately important in the 'predictor importance' scores and are pulling down the importance of other factor scores.
At which I wrote:
Make sure all the input variables to a cluster analysis program are standardized (i.e. subtract the mean and divide by the standard deviation) before clustering, to remove variable-scaling effects on the cluster solution. That should help.
Sudhir
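For teams doing this step outside SPSS, here is a minimal sketch of the standardize-then-cluster idea in Python/scikit-learn (the file and column names are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("phase3_clean.csv")                 # hypothetical file name
basis = df[["factor1", "factor2", "factor3",         # hypothetical column names:
            "budget_conscious", "plan_finances"]]    # 3 factor scores + 2 raw items

# Standardize (subtract the mean, divide by the std dev) so that no variable
# dominates the distance computations merely because of its scale.
X = StandardScaler().fit_transform(basis)

# K-means on the standardized basis; k = 4 is for illustration only.
km = KMeans(n_clusters=4, n_init=10, random_state=42)
df["segment"] = km.fit_predict(X)
print(df["segment"].value_counts())                  # rough segment sizes
```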

P.S.
Will keep you posted as more Q&A happens.

Phase III - Grading Criteria

Hi All,

Might as well outline some thoughts on the grading criteria for the project. These are indicative only and are not exhaustive. However, they give a fairly good idea of what you can expect. Pls ensure your deliverable doesn't lack substance in these broad areas.

1. Quality of the D.P.(s) - How well it aligns with and addresses the business problem vaguely outlined in the project scope document; How well can it be resolved given the data at hand. Etc.

2. Quality of the R.O.s - How well defined and specific the R.O.s are in general; how well the R.O.s cover and address the D.P.s; how well they map onto specific analysis tools; how well they lead to the specific recommendations made to the client in the end. Etc.

3. Quality and rigor of data cleaning - The thinking that went into the data cleaning exercise; the logic behind the way you went about it; the ways adopted to minimize throwing out useful observations, using imputation for instance; the final size of the clean dataset you ended up with for in-depth analysis. Ideally, the data section should contain these details.

4. Clarity, focus and purpose in the Methodology - Flows from the D.P. and the R.O.s. Why you chose this particular series of analysis steps in your methodology and not some alternative. The methodology section is essentially a subset of a full-fledged research design. The emphasis should be on simplicity, brevity and logical flow.

5. Quality of Assumptions made - Assumptions should be reasonable and clearly stated in different steps. Was there opportunity for any validation of assumptions downstream, any reality checks done to see if things are fine?

6. Quality of results obtained - the actual analysis performed and the results obtained. What problems were encountered and how did you circumvent them. How useful are the results? If they're not very useful, how did you transform them post-analysis into something more relevant and useable.

7. Quality of insight obtained, recommendations made - How all that you did so far is finally integrated into a coherent whole to yield data-backed recommendations that are clear, actionable, specific to the problem at hand and likely to significantly impact the decisions downstream. How well the original D.P. is now 'resolved'.

8. Quality of learnings noted - Post-facto, what generic learnings and take-aways from the project emerged. More specifically, "what would you do differently in questionnaire design, in data collection and in data analysis to get a better outcome?".

9. Completeness of submission - Was sufficient info provided to track back what you actually did, if required - preferably in the main slides, else in the appendices? For instance, were Q numbers provided for the inputs to a factor analysis or cluster analysis exercise? Were links to appendix tables present in the main slides? Etc.

10. Creativity, story and flow - Was the submission reader-friendly? Does a 'story' come through in an interconnection between one slide and the next? Were important points highlighted, cluttered slides animated in sequence, callouts and other tools used to emphasize important points in particular slides and so on.

OK. That's quite a lot already, I guess.

Sudhir

Thursday, November 24, 2011

Phase III - D.P. definition travails

Update:
Hi all,

Another Q I got asked recently which, I think, bears wider dissemination.

1. The 40 slide limit is the upper bound. Feel free to have a lower #slides in your deliverable. No problem.

2. The amount of work Phase III takes would, by my estimate, be about 15-20 person-hours in all - say about 3 hours per group member. Anything much more than that and perhaps you are going about it the wrong way. The project is the high point of applied learning in MKTR_125. So yes, 3-4 hours of effort on the project is not an unfair amount of load. Besides, effort correlates well with learning, in my experience.

3. Do allow yourself to have fun with the project - it's not meant to be some sooper-serious burden, oh no. Keeping a light disposition, a witty touch, a sense of optimism and the big picture in mind helps with flexibility, creativity, out-of-the-box thinking and all those nice things, in my experience. What's more, if you enjoyed doing the project, rest assured it *will* show in the output.

Hopefully that allays some concerns about workload expectations etc. relating to Phase III.

Sudhir

------------------------------------------------------------------------------
Folks,

The example PPTs put up don't have a D.P. explicitly written down because I had given the D.P. in the 2009 project. In the 2011 project, however, you have been given the flexibility to come up with a D.P. of your own, consistent with the project scope. This is both a challenge and an opportunity.

I got some Qs on whether the handset section responses can be ignored because client is primarily a service provider. Well, the handset section contains important information about emerging trends in the consumer's mindspace, the mobile application space and the mobile-related brands' perception space. A telecom service provider would dearly like to know how many target segment people have/are switching to smartphones, which apps they use most often so that the carrier can emphasize those apps more and so on. So yes, the handset section should not be ignored, IMHO.

Hope that clarified.

Sudhir

--------------------------------------------------------------
Got an email asking whether such-and-such was a good D.P.-R.O. combo.

Now, I can't share what the D.P. itself was, but my response to that team had generalities that might bear some dissemination within the class.

Hi R and team,


Looks good. However, I don't quite see what the decision problem is. Step 1 seems to state more an R.O. than a D.P.

Sure, sometimes a D.P. maps exactly onto a single R.O. and that may well be the case here. But such a D.P. would perhaps be overly narrow considering the project scope initially outlined.

I'd rather you state a broader D.P. and break it down into 3 R.O.s each corresponding to the S, T and P parts of S-T-P.

Well, that's my opinion, you don't necessarily have to buy into it. What you do is ultimately your call.

Do email with queries as they arise.

Good luck with the project and happy learning.
Sudhir
More miscell Q&A:
Dear Sir,


While running the factor analysis on the psychographic questions, do we re-label the levels of agreement and disagreement as 1-5? Will this help us in any way?

Another way out could be to label the responses as 1 and 0 where 1 is for a level of agreement and 0 for disagreement. In the Ice-Cream Survey HW the responses to these psychographic questions were binary which actually helped in the analysis.

A quick response will be really helpful. Our approach is to first segment the consumers using factor analysis and then find some interesting insights into usage based on these segments.

Cheers,

S
My response:


Hi S,
1-5 (or 1-7 in case of a 7 point scale) is the conventional practice. Safe to go with conventional practice.

You might as well choose to go with something else but may need to justify why.

Alternatively, copy the data columns of interest onto a new sheet, reduce the number of response levels to three (-1 for disagree, 0 for neither and +1 for agree), re-run the factor analysis and see whether the variance explained, factor structure etc. that you now get are better than with 1-5.
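A rough sketch of that recoding step, assuming the columns are being handled in Python/pandas rather than Excel (the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("phase3_clean.csv")                         # hypothetical file
psy_cols = [c for c in df.columns if c.startswith("psy_")]   # hypothetical names

# Collapse the 1-5 agreement scale into disagree / neutral / agree.
recode = {1: -1, 2: -1, 3: 0, 4: 1, 5: 1}
df_recoded = df[psy_cols].replace(recode)

# Run the factor analysis once on the original 1-5 columns and once on this
# recoded copy, then compare variance explained and the factor structures.
```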

The idea is that as and when you hit specific problems, you think up neat, creative ways to negotiate them and move on. Therein lies the learning in the analysis portion of MKTR. :)

Hope that helped. Do write in with more queries should they arise.
----------------------------------------------------------------------------
Dear Sir,

We are facing an issue with regard to clubbing the dataset with Q48 (in terms of Rank) into the final data set that was uploaded earlier. The following are the discrepancies:

• The new Data set for Q48 does not have serial no’s to link it with earlier set

• The number of rows (entries) in the new data set for Q48 is more than in the original one

• There should be ranks from 1 to 3, but we see that some entries go up to rank 9

It would be beneficial for all the groups to have a final data set with Q48 (in terms of ranks) extracted from Qualtrics.

Looking forward to a quick response from your end.

Regards,

M

My response:

Hi M,

The respondent ID that is there (leftmost column in the Q48 dataset) can be used to match against the same rows in the original dataset (via the VLOOKUP function in Excel). The blog post mentioned this specifically.

Once you match the Q48 dataset entries to the original dataset ones, the problem regarding #responses less or more in Q48 dataset also goes away.

Only ranks 1 to 3 are relevant. Some people ranked more than their top 3 items, so you may see ranks up to 9, but those we can ignore if need be.
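For teams doing the matching outside Excel, a pandas sketch of the same VLOOKUP-style merge (the file, key and column names are hypothetical):

```python
import pandas as pd

master = pd.read_excel("phase3_master.xlsx")   # hypothetical file names
q48 = pd.read_excel("q48_ranks.xlsx")

# VLOOKUP equivalent: match Q48 rows to the master dataset on the
# respondent ID (the leftmost column of the Q48 file).
merged = master.merge(q48, on="ResponseID", how="left")       # hypothetical key

# Keep only ranks 1-3; blank out ranks 4-9 as if they had not been given.
rank_cols = [c for c in q48.columns if c.startswith("Q48_")]  # hypothetical names
merged[rank_cols] = merged[rank_cols].where(merged[rank_cols] <= 3)
```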

Hope that clarifies.
Sudhir
----------------------------------------------------------------
Received this email today from group "J":
Dear Sir,

We have the following analysis for our MKTR project:

Decision problem: A foreign handset player wants to enter the Indian market.
Research Objective: Wants to identify the most lucrative segment to enter.

This would make some of the questions about service provision irrelevant. Do you think we are on the right track?
Regards,
My response:
Hi Team J,


The project scope was designed from the viewpoint of a large Indian telecom service provider.
So a purely handset-maker's perspective would be limiting, I feel, and perhaps not entirely consistent with the original project scope document.

My suggestion is you consider modifying the D.P. such that the data on service provider characteristics can also be integrated and used in some manner. For instance, a handset maker looking to ally with a service provider, perhaps.

In general, the D.P. should not be so limiting as to banish a good part of the data we have collected from the analysis. There is plenty of scope to creatively come up with D.P.s that, while focusing on the handset side of the story, also use the telecom carrier data.

Hope that clarifies.
Sudhir

Monday, November 21, 2011

Project Phase III Q&A

Update:
A more recent email with similar Qs:

Hi Professor,

Hope you’re well!

I’m writing to ask some help with the data cleaning of the Project’s Phase III data. Can you please help with the following queries?

1. What does it mean when the data field has both blanks as well as “-99” as the response? Thought it was the same thing, yet Q45, for instance, has both fields.

2. Can you please also advise on how to handle coding of ranking/rating questions? Specifically:
a. Q41 – Rating of top two categories of channels watched
b. Q48 – Ranking of only Top 3 brands out of a list of about 10

3. Also, how do we code blank responses (especially for interval or ratio scaled questions)?
a. Do we put a “0” or leave it blank?
b. How does SPSS handle zeroes vs. blanks?

Please advise.
My response:

Hi A and Team,
1. "-99" means respondent has seen the Q but chosen to ignore it. Blank means the respondent never saw the Q, i.e. the skip logic didn't lead him/her to the Q in the first place.


2. Q41 - the TV channel Q was a select-multiple-options one. So folks selected 2 options (and some selected more than 2); there is no ranking implicit in what was chosen.

Q48 - I'll download this Q afresh so that the rankings are visible. At present, under 'download as labels', we are unable to see the rankings.

3. a. Depends on how many blanks there are. If the entire row is blank, drop that row - the Q was not relevant to the respondent and hence Qualtrics skipped it for him/her. If a few columns here and there are at "-99", then either impute the mean/median of that column or a "0" for "do not know". It's a call you have to take given the context the Q arises in.

b. Doesn't handle them very well but allows you to do some basic ops. SPSS will ask whether you want to exclude cases (i.e. rows) with missing observations or whether you would rather replace the missing cells with the column means. Choose wisely and proceed.

Am sending the Q48 ranking data afresh to the AAs for an LMS upload. Should be up for viewing and download soon. Use the respondent ID to vlookup and match like rows in the master dataset you are currently working on.

Hope that clarified things somewhat.
Sudhir
-------------------------------------------------------------------------------------------------
Got this email from a team:

Dear Chandana,
Phase 3 project requires a 40 slide PPT as our deliverable. I would like to get the following things clarified:
a) Is 40 the minimum or the maximum limit? It seems too big a task.
b) Can we include output tables from tools such as cluster and factor analysis in the body of the PPT as the content requires, or should the tables be part of the appendix alone?
c) Is secondary data analysis mandatory, allowed or optional, since a 40-slide limit cannot be filled through the primary survey alone?
Kindly help in getting these points clarified as we are in the process of finalizing our approach.

My response:

Hi Team G,
1. The 40 slide limit is the upper limit. Feel free to have your final PPT deliverable less than 40 slides long.
2. There is little point in pasting raw SPSS output tables for the factor/cluster analyses inside the 40-slide limit, IMHO. I'd rather groups present the interpretation/info/insight that emerges from such techniques. A hyperlink to the appendices that contain the SPSS tables shouldn't hurt at all, though.
3. Secondary data usage is welcome as long as the sources are documented and cited meticulously.
IMHO, the main challenges lie in deciding upon a suitable D.P. and its constituent R.O.s. What follows is straightforward once these are set. I'd say, don't be overly ambitious in defining your D.P., but don't make it overly shallow in scope either.
I hope that clarified things at least somewhat. Pls feel free to write in with queries as and when.

Regards,

Sudhir



Sunday, November 20, 2011

Data cleaning tips for Project Phase III

Hi all,

As you by now well know, one of the exam questions related to output from your Phase III project (the factor analysis one). While analyzing the data for this particular problem, I noticed quite a few irregularities in the data. This is typical. Hence the need for a reasonable cleaning of the data prior to analysis.

For instance, in the Qs measuring the importance of telecom service provider attributes, there were a few pranksters who rated everything "Never heard of it." Clearly, it's a stretch to think internet-savvy folks have never heard of even one of a telecom carrier's service attributes. Such rows should be cleaned (i.e. either removed from the analysis or, if there are only a few gaps, have missing values imputed for those gaps) before analysis can proceed.

Some advice for speeding up the cleaning process:

1. Do *not* attempt to clean data in all the columns. That is way too much. Only for the important Qs, on which you will do quantitative analysis using advanced tools (factor/cluster analysis, MDS, regressions etc.), should you consider cleaning the data. So choose carefully the columns to clean. Needless to say, this will depend on which decision problems you have identified for resolution.

2. An easy way to start the cleaning process is to check whether there is reasonable variation in each respondent's answers. For instance, after the psychographic Q columns, insert a new column and in it compute, for each row, the standard deviation of that respondent's psychographic responses. Respondents with very high or very low standard deviations should be investigated for possible cleaning. This method would, for example, catch people who mark the same response for every Q, or those who mark only extremes for some reason (see the sketch after this list).

3. If there are gaps or missing values (depicted with -99 in the dataset) in critical columns, then you may consider replacing them with the median or the mean of that column. A fancier, more general name for such work is imputation, which also includes various model-based predictions used as replacement values. Without imputation, one would be forced to drop an entire row for want of a few missing values here and there. Always ensure the imputed values are 'reasonable' before going down this path.

Often, imputation works even better when done segment-wise. Suppose you've segmented the consumer base on some clustering variables. Since those segments are assumed homogeneous in some sense, missing values can be imputed more credibly with segment means/medians.

4. Don't spend too much time on data cleaning either. In the first half-hour or so spent on cleaning, chances are you will catch the majority of troubled rows; after that, diminishing returns set in. So draw a line somewhere, stop, and start the analysis from that point on.
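A minimal sketch of tips 2 and 3 above, for teams prepping the data in Python/pandas before taking it into SPSS (the file name, column names and thresholds are all hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("phase3_raw.csv")                           # hypothetical file
psy_cols = [c for c in df.columns if c.startswith("psy_")]   # hypothetical names

# Treat -99 ("seen but not answered") as missing.
df[psy_cols] = df[psy_cols].replace(-99, np.nan)

# Tip 2: flag suspiciously flat (or all-extreme) respondents via the
# row-wise std dev of their psychographic answers.
df["psy_sd"] = df[psy_cols].std(axis=1)
suspects = df[(df["psy_sd"] == 0) | (df["psy_sd"] > 2)]      # illustrative cutoffs
print(len(suspects), "rows flagged for inspection")

# Tip 3: median-impute the occasional gap in important columns
# rather than dropping the whole row.
for col in psy_cols:
    df[col] = df[col].fillna(df[col].median())

# Segment-wise variant, once a 'segment' column exists:
# df[col] = df[col].fillna(df.groupby("segment")[col].transform("median"))
```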

Update:
I'll share my answers here on the blog to any Qs the groups ask me. Conversely, I request groups to first check the blog to see whether their Qs have already been answered here in some form or the other.

Hope that clarifies.

Sudhir

P.S.
Any feedback on the exam, project or any other course feature etc is welcome. As always.


Friday, November 18, 2011

Homework and Exam related Q&A

Post-Exam Update:
I noticed, after the exam had started, that there was a typo in the binary logit question: the coeff of income^2 was shown as -0.088 instead of -0.88. Hence, the predicted probabilities of the channel being watched come up as 1 for all 3 cases (the min, max and mean respondent profiles). Folks who show the expression and the calculations will get full credit for this problem.
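For reference, a tiny sketch of why the typo pushes all three predicted probabilities to 1 in a binary logit. Only the -0.88 vs -0.088 pair comes from the exam; the other coefficients and the income values below are made up:

```python
import numpy as np

def pr_channel(income, b0=0.5, b_inc=2.5, b_inc2=-0.88):
    """Binary logit: Pr = 1 / (1 + exp(-utility)). b0 and b_inc are made up;
    only the income^2 coefficient mirrors the -0.88 vs -0.088 discussion."""
    u = b0 + b_inc * income + b_inc2 * income ** 2
    return 1.0 / (1.0 + np.exp(-u))

incomes = np.array([2.0, 5.0, 10.0])   # stand-ins for the min / mean / max profiles

print(pr_channel(incomes))                  # intended -0.88: probabilities vary
print(pr_channel(incomes, b_inc2=-0.088))   # typo: the quadratic penalty is tiny,
                                            # so Pr(channel) ~ 1 in all three cases
```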

A related Q regarding the DTH caselet. There are many possible assumptions you could make and from each, a different research design might flow. As long as you've stated clearly your assumptions and the research design that follows is logically consistent with your assumptions, you are OK.

Hope that clarifies.

Sudhir

Update:

Another thing I might as well clarify: - regarding the quant portion - only interpretation of a model and its associated tables will be asked for.

Even there, only those tables that we explicitly have discussed in class, not merely shown but discussed, will be important from an exam viewpoint.

Sudhir

Am getting quite a few Qs regarding this. A wrote in just now:
Hi Chandana,

Need a quick clarification. For this home work questions 1,2,3 and 4 are mandatory and 5,6,7 are optional. Is that right ?
Regards,
A
My response:
Hi A,
Yes, only Q1-4 are mandatory. The rest are optional in the session 9 HW.
Sudhir
----------------------------------------------------------------
Got this just now:
Professor,

When I mailed you today morning, I had high aspirations of finishing my studies and then meeting you for a quick review. Unfortunately, I just finished going through Session 4 hand out. My sincere apologies but I guess I will have to cancel this appointment…and give the slot to a better prepared student…!!

Thanks,
K
My response:
That's OK. Drop by anyway. I expected I'd be busy during these office hrs but am (pleasantly) surprised. Seems folks have on average understood the material well and don't need additional office hrs now.

Besides, anyway, the exam is not designed to be troublesome. I wouldn't worry overmuch if I were you.

Sudhir
----------------------------------------------------------------


Hi all,

Was asked yesterday about this and might as well share with the whole class.

The Q was: in the factor analysis homework, what do we do if the factor solution with eigenvalues > 1 still shows cumulative variance explained of < 60%?

Well, if the % is in the mid to late 50s, just go with it.

If not, it would seem like the factor solution is not doing a great job of explaining variance in the input variables. This is presumably because at least some variables are not well correlated with the others and are hence weakening the factor solution.

To ID such variables, look either at the correlation table or at the communalities table. The variables that show the least correlation with the others should be removed one at a time, re-running the factor analysis at each step, until the 60% criterion is met.

If a variable loads entirely on its own factor, drop that factor and use that variable as-is in downstream analysis.
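A rough numpy sketch of that diagnostic loop, using a principal-components-style extraction from the correlation matrix; the 60% cutoff is from the discussion above, the input file name is hypothetical and everything else is illustrative:

```python
import numpy as np
import pandas as pd

def factor_diagnostics(X: pd.DataFrame):
    """Eigenvalue>1 rule, cumulative variance explained and per-variable
    communalities from a principal-components style extraction."""
    R = np.corrcoef(X.values, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    k = int((eigvals > 1).sum())                      # factors retained
    cum_var = eigvals[:k].sum() / eigvals.sum()       # cumulative % variance
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # variable x factor loadings
    communalities = (loadings ** 2).sum(axis=1)
    return k, cum_var, pd.Series(communalities, index=X.columns)

X = pd.read_csv("psychographics.csv")                 # hypothetical input file
k, cv, comm = factor_diagnostics(X)
# Drop the worst-explained variable one at a time until ~60% variance is reached.
while cv < 0.60 and X.shape[1] > 3:
    X = X.drop(columns=[comm.idxmin()])
    k, cv, comm = factor_diagnostics(X)
print(k, "factors,", f"{cv:.1%} variance explained")
```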

Hope that clarifies. Any more such Qs and I shall share it here.

Sudhir

Thursday, November 17, 2011

Project related Q&A

Update:

I split up the old post which had both Q&A as well as deliverable format. The old post is now exclusively deliverable format and is available here:

http://marketing-yogi.blogspot.com/2011/11/more-phase-iii-project-q.html

A recent post on take-aways from past projects (particularly those in 2010) can be found here:

http://marketing-yogi.blogspot.com/2011/11/take-aways-from-past-projects.html
------------------------------------------------------------------------------

Hi all,

Got asked a few Qs; might as well put up the answers here.

1."The decision problem is vague. Should we focus more on telecom services or handset features?"

My response: The scope document is vague for good reason - it provides enough wriggle room to interpret the decision problem in various ways. Pls come up with your own decision problem that is not inconsistent with the project scope.

Regarding telecom service versus handset, well, the scope doc says the client XYZ is a major player in the telecom services space - so it is primarily a service provider. The handset features thing is likely secondary to its main goal of improving its position among telecom service providers.

Pls come up with a suitable D.P. and appropriate downstream analysis. The main criteria of interest will be how well your D.P. aligns with the scope document and how logical and consistent the downstream analysis is with the stated D.P.

2. Scope says, XYZ wants to know "where the market is going". What does this mean?

Well, XYZ will want to know many things, sure, but only so many are knowable with the data at hand. My intention behind that part was: illuminate the current standing of the different offerings first, and then project a few years into the future. Make assumptions as necessary - there'll be little data to back up projections into the future, understandably so. Shall expand later on what I mean by this point.

More Q&A as they come will be on this page as updates.

Sudhir
-----------------------------------------------------------

Update: More project related guidance

In general, there are a few Questions which the client is likely to find of interest and which I think each project team could consider going through.

(i) What is the turnover (i.e. rate of change or switching) among attractive customer segments in telecom carriers and handsets? [Hint: See how long one has been with a carrier/handset Q among others]. Is there a trend? Anything systematic that might indicate the market is moving towards or appears to be favoring a certain set of attributes more than others?

(ii) What kind of usage patterns, apps etc are the attractive segments (say in terms of purchasing power or WTP etc) moving towards? More voice? More data? More something? From here can flow recommendations to the client on which applications to focus on, which platforms to explore alliances with etc perhaps.

(iii) Perceptual maps of current reality - where attractive customer segments perceive current carriers and/or handsets to be. What are the critical dimensions or axes along which such perceptions have formed? What attractive gaps in positioning may emerge? Which niche, perhaps, can a new or existing service lay claim to on the positioning map? (One rough way to build such a map is sketched below.)
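One rough way to build such a map, sketched with hypothetical file and column names: average the attribute ratings by brand and plot the first two principal components; the component loadings hint at what each axis 'means'.

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.read_csv("phase3_clean.csv")                            # hypothetical file
attr_cols = [c for c in df.columns if c.startswith("attr_")]    # hypothetical names

# Brand x attribute matrix of mean ratings (filter df down to the attractive
# segment first, if that is what the R.O. calls for).
profile = df.groupby("carrier_brand")[attr_cols].mean()         # hypothetical key

pca = PCA(n_components=2)
coords = pd.DataFrame(pca.fit_transform(profile),
                      index=profile.index, columns=["dim1", "dim2"])
print(coords)      # x/y coordinates of the brands on the perceptual map
print(pd.DataFrame(pca.components_, columns=attr_cols,
                   index=["dim1", "dim2"]))   # what each dimension loads on
```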

More Qs will be added as they occur.

Pls note that it is *not* necessary that these Qs be answered or recommendations made along their lines. It's just a guide for teams to think about incorporating into their current plan/roadmap. Incorporating these Qs makes sense only if they align with the D.P.s and R.O.s you have chosen.

Sudhir 

Wednesday, November 16, 2011

Student Contributions - Week 5

Well, the Nielsen story did invite some strong-ish reactions from folks here and there. Good, good.

Here's what Anshul has to say about his experience:
Hi Prof. Voleti,


As discussed in class today, I wanted to mention some shortcomings of Nielsen data that I noticed in my line of work.

I worked in media planning and buying, and used the TAM data quite extensively in order to make decisions regarding the best TV channels to use, as well as in the post analysis of campaigns.
However, quite often, Nielsen data would throw up seemingly garbled figures, due to the following reasons:

1. Insufficient sample size in some markets for the selected TG. This was fine for some weeks, but not others indicating that the sample size could vary from week to week for the same TG and market.

2. Missing data on some ads resulted from inaccurate reporting (missing figures, for instance). In some cases, this was due to inaccurate coding of the creatives.

A possible reason for the errors is that there is still a lot of manual work involved between the time data is collected to when it is shared with agencies.

Therefore, we had to always be careful while using Nielsen TAM data, and tried to cross-check it wherever possible. That was, however, not always possible and we just had to assume that the TAM data was accurate.

This was my observation and thought I’d share. Hope this helps!
Kind regards,
Anshul Joshi
Thanks, Anshul.

Any further student contributions will be put on up here.

Sudhir

Phase III Project - Deliverables and format

Hi Class,


Your deliverable consists of a 40 slide PPT (excluding appendices). Should be emailed to me with a copy to the AAs before deadline: 27-Nov 6 pm.

More specifically, your PPT should contain:


1. Filename (mandatory) - should be of the form group_name.pptx when submitted.

2. Title slide (mandatory) - Project title - something concise but informative that describes the gist of what you've done. In addition, the title slide should also contain your Group Name, team-member names, PGIDs and MKTR section.
Pls note, make no mention of your team member names anywhere else in the PPT, have it only in the title slide.

3. Presentation Outline (Optional) - a Contents page that outlines your presentation structured along sections (e.g. methodology, [...] , recommendations, etc).

4. Decision Problem(s) (Mandatory) - State as clearly as feasible the decision problem(s) (D.P.s) you are analyzing. Number the D.P.s if there is more than one.

If small enough in number, you may also list the R.O.s you have in this slide.


5. A Methodology Section (Mandatory) - Preferably in graph or flowchart form, lay down what analysis procedures you used, and in what order, to answer the particular Research Objectives (R.O.s) that cover the D.P.s.

6. A Data section (Mandatory) - explains the nature and structure of the data used. Be very brief but very informative - give (i) the dimensions of the data matrices used as input to the different procedures, and (ii) the sources of the data: cited sources if secondary data are used, and question numbers from the survey questionnaire if primary data are used.

I strongly suggest using a tabular format here. Packs a lot of info into compact space. Easy to read and compare too.

Some clue as to what filters or conditions were used to clean the data would be very valuable also.


7. Model Expressions (Mandatory)- Write the conceptual and/or mathematical expressions of any dependence models used. Then directly use the results in downstream analysis.

Kindly place in the appendix section a descriptives table of the input data, a brief explanation of the X variables used, and of course, output tables along with interpretation.

8. Appendix Section (Optional): Some of the less important tables can be plugged into a separate appendix section (outside the 40 slide limit) in case you are running out of slide space. Have only the most important results tables in the main portion.

9. Recommendations (Mandatory) - crisp, clear, in simple words directed towards the client. Emphasize the usability and actionability of the recommendations.

Further, I strongly advise groups to make their PPT deliverables reader-friendly. That is:
- use as simple language as feasible.
- Animate the slides if they are cluttered so that when I go through them later, the proper sequence of material will show up.
- Highlight keywords and important phrases.
- Use callouts etc as required to call attention to particularly important points.
- An economy of words (sentence fragments, for example) is always welcome.

I hope that clarifies a lot of issues with deliverable format. Groups that flout these norms may lose a few points here and there.

Sudhir

Monday, November 14, 2011

Exam related Q&A

Update: Pls read the Coop case in the course-pack. Bring the coursepack with you to the exam hall.

Hi All,

Have just now emailed the end-term exam to ASA and asked the AAs to put up the practice exam on LMS. Admittedly, preparing an open-book exam is not fun. Still, I did the best I could.

1. Ideally, questions would be close-ended and easy to sort out. However, some questions are perforce open-ended and involve text answers. To avoid confusion and overly long-winded answers, the exam is limited-space only. Meaning, there'll be some space provided within which your answer should fit. Anything written outside the given space will *not* be counted for evaluation.

2. The end-term has 5 questions, is designed for <2 hours but you will have 2.5 hours to do it in. So, time will not be a problem.

3. I reckon, as long as you're structured in your thinking, economical in your use of words, well-planned in your approach to the answer and reasonable in the assumptions you make and state, you should be OK.

Update: Let me re-emphasize that, given constrained resources (answer space, in this case, and the use of pens rather than pencils), proper planning and organization will go a long way in helping students. I'd strongly advise making a brief outline of the answer first on the rough paper provided before actually using the answer space.

Please feel free to use technical terms as appropriate, highlight keywords etc if that helps graders better understand your emphasis when making a point.

4. The practice exam is about half as long as the end-term but the question types etc. are broadly the same. No solution set will be provided for the practice exam; it is just for practice.

Do bring a calculator to the exam.

Any other Qs etc you have on the exam, I'll be happy to take here and add to this thread.

Sudhir

Sunday, November 13, 2011

Take-aways from past projects

Update: Received a few such emails so might as well clarify:

Hi Prof,

There seems to be some confusion on the final report submission due date. This is to check whether submission is due in session 10.

Kindly confirm.
S

My response:

No. It's due on 27-Nov - the day before term 6.
Hope that clarifies.
Sudhir
----------------------------------------------------------
Hi all,

As promised, some collected thoughts on where folks ran into obstacles in past years. Am collating and putting up relevant excerpts from past blogposts to start with. Shall add and expand this thread as more info comes in.
------------------------------------------------------------------
From last year (financial planning and savings avenues analysis)


[Some] stuff that IMHO merits wider dissemination.
1. Let me rush to clarify that no great detail is expected in the supply side Q.
As in, you're not expected to say - "XYZ should offer an FD [fixed deposit] with a two year minimum lock-in offering 8.75% p.a.".
No.
Saying "XYZ should offer FDs in its product portfolio." is sufficient.


2. Make the assumption that the sample adequately represents the target population - of young, urban, upwardly mobile professionals. 

3. Yes, data cleaning is a long, messy process. But it is worthwhile since once it's done, the rest of the analyses follow through very easily indeed, in seconds. 

4. It helps to start with some idea or set of conjectures about a set of product classes and a set of potential target segments in mind, perhaps. One can then use statistical analyses to either confirm or disprove particular hypotheses about them.

5. There is no 'right or wrong' approach to the problem. There is however a logical, coherent and data-driven approach to making actionable recommendations versus one that is not. I'll be looking for logical errors, coherency issues, unsustainable assumptions and the like in your journey to the recommendations you make in phase III.
------------------------------------------------------------------
From last year on stumbles in the analysis phase:
1. Have some basic roadmap in mind before you start: This is important else you risk getting lost in the data and all the analyses that are now possible. There are literally millions of ways in which a dataset that size can be sliced and diced. Groups that had no broad, big-picture idea of where they want to go with the analysis inevitably run into problems.

Now don't get me wrong, this is not to pre-judge or straitjacket your perspective or anything - the initial plan you have in mind doesn't restrict your options. It can and should be changed and improvised as the analysis proceeds.

Update: OK. Some may ask - can we get a more specific example? Here is what I had in mind when I was thinking broad, basic plan from an example I outlined in the comments section to a post below:
E.g. - First we clean data out for missing values in Qs 7,10,27 etc -> then do factor analysis on psychogr and demogr -> then did cluster analysis on the factors -> then we estimate segment sizes thus obtained -> then we look up supply side options -> arrive at recommendations.

Hope that clarifies.
2. Segmentation is the key: The Project essentially, at its core, boils down to an STP or Segmentation-Targeting-Positioning exercise. And it is the Segmentation part which is crucial to getting the TP parts right. What inputs to have for the segmentation part, what clustering bases to use, how many clusters to get out via k-means, how best to characterize those clusters and how to decide which among them is best/most attractive are, IMHO, the real tricky questions in the project.

3. Kindly ask around for software tool gyan: A good number of folk I have met seemed to have basic confusion regarding factor and cluster analyses and how to run these on the software tool. This after I thought I'd done a good job going step-by-step over the procedure in class and interpreting the results. Kindly ask around for clarifications etc on the JMP implementation of these procedures. The textbook contains good overviews about the conceptual aspects of these methods.

I'm hopeful that at least a few folk in each group have a handle on these critical procedures - factor and cluster. I shall, for completeness sake, again go through them quickly tomorrow in class.

4. The 80-20 rule applies very much in data cleaning: chances are that under 20% of the columns in the dataset will yield over 80% of its usable information content. So don't waste time cleaning data (i.e. removing missing values, nonsense answers etc.) from all the columns - just the important ones. Again, you need to have some basic plan in mind before you can ID the important columns.

Also, not all data cleaning need mean dropping rows. In some instances, missing values can perhaps be safely imputed using column means or medians or the mode (depending on data type). 

Chalo, enough for now. More as updates occur.
Sudhir
------------------------------------------------------------------
More from last year on specific problems encountered in final presentations:


1. Research Objective (R.O.) matters.
Recall from lectures 1, 2 & 3 my repeated exhortations that "A clear cut R.O. that starts with an action verb defined over a crisp actionable object sets the agenda for all that follows". Well that wasn't all blah-blah blah. Its effects are measurable, as I came to see.

Where the entire group was on board with, and agreed upon, a single well-defined R.O., planning, delegation and recombining the different modules into a whole would have been much simplified. Coherence matters a great deal in a project this complex and with coordination issues of the kind you must have faced. It was likely to visibly impact the quality of the outcome, and IMHO, it did.

2. Two broad approaches emerged - Asset First and Customer First.
One, where you define your research objective (R.O.) as "Identify the most attractive asset class." and the other, "Identify the most attractive customer segment." The two R.O.s lead to 2 very different downstream paths.

Most groups preferred the first (asset-first) route. Here, the game was to ID the most attractive asset classes using size, monetary value of the addressable market or some such criterion, and then filter in only those respondents who showed some interest in the selected asset classes. The respondent segments thus indirectly obtained were then characterized and recommendations built on that basis.

I was trying to nudge people towards the second, "Customer segmentation first" route partly because it aligns much more closely with the core Marketing STP (Segmentation-Targeting-Positioning) way. In this approach, the entire respondent base is first segmented along psychographic- behavioral - motivational or demographic bases, then different segments are evaluated for attractiveness based on some criterion - monetary value, count or share etc, and then the most attractive segments are profiled/analyzed for asset class preferences and investments.

Am happy to say that in a majority of the groups, once a group implicitly chose a particular R.O., the approach that followed was logically consistent with the R.O. 

3. Some novel, surprising things.
Just reeling off a few quick ones that do come to my mind.

One, how do you select the "most attractive" segment or asset class from a set of options? Some groups went with a simple count criterion - count the # of respondents corresponding to each cluster and pick the largest one. Some groups went further and used a value criterion - multiply the count by (% savings times average income times % asset-class allocation) to arrive at a rupee figure. This latter approach is more rigorous and objective, IMHO. Only 2 groups went even further in their choice of an attractiveness criterion - the customer lifetime value (CLV) criterion. They multiplied the rupee value per annum per respondent by a (cleaned-up) "years to retirement" variable to obtain the value of a respondent's revenue stream over his/her pre-retirement lifetime. Post-retirement, people become net consumers rather than net savers, so retirement is a clean break point. I thought this last approach was simply brilliant. Wow. And even within the two groups that used this idea, one went further and normalized cluster lifetime earnings by cluster size, giving a crisp comparison benchmark.
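To make the three criteria concrete, here is a small sketch with entirely made-up segment numbers:

```python
import pandas as pd

# Made-up segment summary, purely to illustrate the three attractiveness criteria.
seg = pd.DataFrame({
    "count":             [420, 310, 150],
    "avg_income":        [9e5, 1.4e6, 2.2e6],   # Rs per annum
    "savings_rate":      [0.18, 0.25, 0.30],    # share of income saved
    "asset_allocation":  [0.40, 0.35, 0.50],    # share going to the asset class
    "yrs_to_retirement": [28, 18, 10],
}, index=["seg_A", "seg_B", "seg_C"])

# Count criterion: simply pick the biggest segment.
# Value criterion: rupee value per annum addressable in the asset class.
seg["annual_value"] = (seg["count"] * seg["avg_income"]
                       * seg["savings_rate"] * seg["asset_allocation"])

# CLV-style criterion: value stream until retirement, optionally per respondent.
seg["lifetime_value"] = seg["annual_value"] * seg["yrs_to_retirement"]
seg["clv_per_head"] = seg["lifetime_value"] / seg["count"]

print(seg[["count", "annual_value", "lifetime_value", "clv_per_head"]])
```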

Two, how to select the basis variables for a good clustering solution? Regardless of which approach you took, a good segmenting solution in which clusters are clear, distinct, sizeable and actionable would be required. One clear thing that emerged across multiple groups was that using only the Q27 psychographics and the Demographics wasn't yielding a good clustering solution. The very first few runs (<1 minute each on JMP and I'm told several minutes on MEXL) should have signaled that things were off with this approach. Adding more variables would have been key. Typically, groups adding savings motivation variables, Q7 constant sum etc were able to see a better clustering solution. There is seldom any ideal clustering solution and that's a valuable learning when dealing with real data (unlike the made-up data of classroom examples).

One group that stood out on this second point used all 113 variables in the dataset in a factor analysis -> got some 43-odd factors -> labeled and IDed them -> then selectively chose 40 of the 43 as a segmenting basis and obtained a neat clustering solution. The reason this 'brute force' approach stands out in my mind is that there's no place for subjective judgment and no chance that correlations among disparate variables get overlooked. It's also risky, as such attempts are fraught with multicollinearity and inference issues. Anyway, it seemed to have worked.
[...]
Anyway, like I have repeatedly mentioned - effort often correlates positively with learning. So I'm hoping your effort on this project did translate into enduring learning regarding research design, data work, modeling,  project planning, delegation and coordination among other things.
------------------------------------------------------------------
OK, that's it for now. Shall update the thread with more learnings from past years for reference purposes.


Sudhir

Thursday, November 10, 2011

Project Updates

Update: Received a few such emails so might as well clarify:
Hi Prof,


There seems to be some confusion on the final report submission due date. This is to check whether submission is due in session 10.

Kindly confirm.
S
My response:
No. It's due on 27-Nov - the day before term 6.
Hope that clarifies.
Sudhir

Update:
The Phase III dataset is available for download from LMS. Happy analyzing.

'-99' is the code for a question seen but not answered. The proportion of -99s for different Qs might give some clue as to the cost of not forcing those Qs to be answered.

Some exemplary projects from the MKTR 2009 car survey have also been put up on LMS. Teams Ajmer, Mohali and Kargil did well, whereas Jhumri-Taliya didn't do so great.

Just got done with the general tutorial/Q&A in AC2. My thanks to those who showed up. Got some valuable feedback as well. Some possible to implement right away for session 9 & 10, perhaps.

I'll make detailed how-to slides for SPSS so that getting too deep into the tool does not eat up class time. You can go home and practice the classroom examples at leisure. I'll focus more on analysis results and their interpretation. Unlike many other courses you may have taken, MKTR is a tool-heavy course, so some level of engagement with the SPSS tool is unavoidable.

Hope that clarifies.

Sudhir
-------------------------------------------------------------------
Hi All,

Am closing the survey responses for the MKTR Project 2011 - we have some 3,153 responses in all. After getting rid of, say, 20-25% invalid responses, we should still have well above 2,000 responses. Quite sufficient for most of our purposes, I reckon.

I'll put up a preliminarily cleaned version of the dataset in Excel format on LMS by noon Friday, 11-Nov. Phase III can then begin. Pls also look for some exemplary PPTs from the MKTR 2009 car project analyses on LMS by Friday noon.

Sudhir

SPSS Issues (not license related)

Update: Received this email from Mr Suraj Amonkar of section C on one possible reason why we saw what we did in the 2-step procedure:
Hello Professor,


I have attached the document which explains a bit the sample-size effect for two step clustering.

Since the method uses the first step to form "pre-clusters" and the second step to do "hierarchical clustering", I suspect having too small a number of samples will not give the method enough information to form good "pre-clusters", especially if the number of variables is high relative to the number of samples.

“ SPSS has three different procedures that can be used to cluster data: hierarchical cluster analysis, k-means cluster, and two-step cluster. They are all described in this chapter. If you have a large data file (even 1,000 cases is large for clustering) or a mixture of continuous and categorical variables, you should use the SPSS two-step procedure. If you have a small data set and want to easily examine solutions with increasing numbers of clusters, you may want to use hierarchical clustering. If you know how many clusters you want and you have a moderately sized data set, you can use k-means clustering. “
Also, there are methods that automatically detect the ideal “k” for k-means. This in essence would be similar to the “two-step” approach followed by SPSS (which is based on the bottom-up hierarchical approach). Please find attached a paper describing an implementation of this method. I am not sure if SPSS has some implementation for this; but R or Matlab might.

Thanks,
Suraj.
My response:
Nice, Suraj.


It confirms a lot of what I had in mind regarding two-step. However, the 2-step algo did work perfectly well in 2009 for this very same 20-row dataset. Perhaps it was, as someone was mentioning, because the random number generator seed in the software installed was the same for everybody back then.

Shall put this up on the blog (and have the files uploaded or something) later.

Thanks again.

Sudhir
Added later: Also, a look through the attached papers shows that calculating the optimal number of clusters in a k-means scenario is indeed a difficult problem. Some algorithms have evolved to address it, but I'd rather we not go there at this point in the course.

Sudhir.
Hi All,

SPSS has changed from what it was in 2009 (when I last used it, successfully, in class) to what it is now. In the interim, IBM took over SPSS and seems to have injected its own code and programs into select routines, one of which is the two-step cluster analysis. This change hasn't necessarily been for the better.

1. First off, a few days ago when making slides for this session, I noticed that a contrived dataset - from a textbook example, no less - that I had used without problems in 2009 was giving an erroneous result when 2-step cluster analyzed. The result shown was 'only one cluster found optimal for the data' or some such thing. The 20-row dataset is designed to produce 2 (or at most 3) cleanly separating clusters. So something was off for sure.

2. In section A, I overrode the 'automatically find optimal #clusters' option and manually chose 3. In doing so, I negated the most important advantage 2-step clustering gives us - an objective, information-criterion-based determination of the optimal # of clusters. Sure, when overridden, the 2-step solution still gives some info on the 'quality' of the clustering solution - based, I suspect, on some variant of the ratio of between-cluster to within-cluster variance that we typically use to assess clustering quality.

3. In section B, when I sought to repeat the same exercise, it turned out that some students were getting 2 clusters as optimal whereas I (and some other students) continued to get 1 as optimal. Now what is going on here? Either the algorithm itself is now so unreliable that it fails these basic consistency checks, or there's something quirky about this particular dataset that causes the discrepancy.

I'd like to know whether you get different optimal #clusters when doing the homework with the same input.

4. Which brings me to why, despite the issues that have dogged SPSS - license-related ones, primarily - I've insisted upon and stuck with SPSS. Well, for one, it's far more user-friendly and intuitive to work with than Excel, R or JMP. The second big reason is that SPSS is the industry standard, and there's much greater resume value in saying you're comfortable conducting SPSS-based analyses than in saying 'JMP', which many in industry may never have heard of.

A third reason is that in a number of procedures - cluster analysis and MDS among them - SPSS lets us objectively (that is, based on information criteria or other such metrics) determine the optimal number of clusters, axes etc., which would otherwise have to be done subjectively, risking errors along the way. Also, in many other applications, forecasting included, SPSS provides a whole host of options that are not available in other packages (R excepted, of course).

5. Some students reported that their ENDM-based SPSS licenses expired yesterday. Well, homework submission for the quant part has anyway been made flexible, so I'll not stress too much on timely submission. However, undue delay is not called for either. I'm hoping most such students can work around the issue with the virtual-machine solution that IT has documented and sent you.

Well, I hope that clarifies what's been going on and why we are where we are with the software side of the MKTR story.


Sudhir

P.S.
6. All LTs are booked almost all day on Friday. The only slot I got is 12.30-2.30 PM on Friday. The poll results do not suggest strong demand for an R tutorial, so I'm announcing a general SPSS-cum-R hands-on Q&A session for Friday 11-Nov, 12.30-2.30 PM in the AC2 LT. Attendance is entirely optional. If nobody bothers to show up, that's fine too - I'll simply pack up and head home.

Wednesday, November 9, 2011

Session 8 Homework Q&A

Update: Received this email over the weekend.
Dear sir

Actually, I think the issue is not with column V7 (V7 has both 1's and 0's), but rather with column V19, which is all "." (dots) when added to SPSS.
On eliminating V19 and adding V7 back in, the factor reduction is running smooth.
R

Update: Have just sent in my solution to the session 7 homeworks to the AAs. Should be up on LMS soon.

Hi all,

Well, well... an early bird decided to take on the Session 8 HW today itself....

Here's an email I got from V:
Dear Prof.,


When trying to do the “factor analysis” on the “hw1 ice-cream dataset” I am encountering the following issue –

The data type is “nominal” (0,1) and when I run a factor analysis on SPSS, it throws up the following error message –

Warnings

There are fewer than two cases, at least one of the variables has zero variance, there is only one variable in the analysis, or correlation coefficients could not be computed for all pairs of variables. No further statistics will be computed.
Could you please help with what I might be doing wrong.

Thanks!

Regards,
V
My response:
Aha.


Check up the descriptive stats. See if all the variables are indeed 'variables'. If I recall correctly, one of the questions was such that *everybody* answered the same way.

So the std deviation in that column was zero. Will need to eliminate that one first.

Sudhir
And then his response:
Thanks professor!


It seems to be working after I eliminated V7.

Regards,
V
All's well that ends well I guess.
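For anyone who wants to catch both problems (the zero-variance V7 and the all-missing V19) programmatically before running the factor analysis, here is a quick R sketch, with 'icecream' standing in for the hw1 ice-cream worksheet read into R:

```r
# Flag columns that are constant or entirely missing before factor analysis.
vars <- sapply(icecream, function(x) var(x, na.rm = TRUE))
vars[is.na(vars) | vars == 0]                        # all-missing or zero-variance columns
icecream_ok <- icecream[, !is.na(vars) & vars > 0]   # keep only the usable variables
```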

Sudhir

9-Nov Phase II interim results

Hi all,

We've some 2600+ responses. Am hoping for 2000+ valid responses now. Great going! Here's the latest status check on where each team stands.

Team / Count of responses:
Agra 73
Bhilai 77
Bijapur 60
Cochin 59
Dehradun 67
Dhanbad 37
Dum dum 119
Durgapur 125
Gulmarg 80
Guntur 76
HasmatPet 50
Jaipur 83
Jamshedpur 81
Jaunpur 71
Jhujuni 254
Kakinada 67
Kesaria 107
Naihati 88
Palgat 112
Panaji 55
Peepli 51
Portblair 44
Rampur 117
Sangli 73
Shivneri 55
Sultanpur 82
Team Leh 47
Trawadi 61
Trichy 61
Ujjain 27
(blank) 294
Grand Total 2654

Teams with 8-10 respondents per team member (or, say, 60+ valid responses per team) are at par. So don't worry overmuch about it.
 
Sudhir

Tuesday, November 8, 2011

Session 7 Homework issues

Got these few emails:

Dear Prof/TA,
I could see two different Homework Part II for Session 7. One in page 11 of the handout that was given to us.
The other is in the slides in Addendum slides for session 7 called Homework Part II ( Optional )
My question is, which one is actually Homework Part II for session 7 and is it optional ?  Sorry if I misunderstood instructions in the class.
Thanks
H

My response:

Yes, the slide deck contains the optional part. Feel free to ignore it.
There are two questions in session 7 Hw - one related to standard regressions (mobile usage example) and one related to multinomial logit in worksheet 'hw4 MNL'.
Hope that clarifies.


Then this one:

Professor,
I am unclear on Part-II of hw 7. We are asked on predicting the edulevel using the means and modes of the relevant model. So we have the “us” or the numerator part of the logit model, but not the “them” part.
To analyze the “them” part, we have 3 levels in rateplan, 3 levels in gender and familysize.
- Is taking family size as the mean value for the denominator appropriate?
- If so, this will give a total of 9 combinations (or parts) for the denominator.
- We then look at the probability of education levels 1 and 2, and whichever is more probable is our answer.
Is this approach correct?
Sincerely,
R

My response:

Hi R,
Pls look at the addendum for session 7 (put up on LMS) on the logit-based prediction model.
There, given a set of X values X={10,9,1} for {sales, clientrating, coupon1}, we predicted the probability of instorepromo being 1,2 or 3.
Similarly, once you have a set of Xs for {edu, famsize, rateplan, etc.}, use the SAME X profile in both the numerator and the denominator of the logit expression.
Hope that clarifies.
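To make the mechanics concrete, here is a small R sketch of that prediction step. The coefficient values below are placeholders, not the actual SPSS output from the addendum, and the third instorepromo level is taken as the base (all-zero) category:

```r
# Multinomial logit prediction for one X profile: same X in numerator and denominator.
x <- c(1, 10, 9, 1)                               # intercept, sales, clientrating, coupon1
B <- rbind(promo1 = c(2.1, -0.30, 0.15, 0.80),    # hypothetical coefficients per category
           promo2 = c(1.4, -0.10, 0.05, 0.40),
           promo3 = c(0.0,  0.00, 0.00, 0.00))    # base category

num   <- exp(B %*% x)        # one 'us' term per instorepromo level
probs <- num / sum(num)      # 'them' = the sum over all levels
round(probs, 3)              # predicted probabilities; pick the largest
```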
Another:
Hi Chandana, Can you let me know where the worksheets for the homework are? I’m unable to find them on LMS. Session 7 slide 29 says worksheet labeled ‘ex2’; Session 7 slide 44 says worksheet name ‘hw 1 MNL’. I can’t find either. Thanks, RC

My response:
'ex2' is the standard-regressions-based homework - the mobile usage one. 'hw4 MNL' is the logit-based homework; it was wrongly written as 'hw1 MNL'. MNL stands for Multinomial Logit in the sheet name. I hope that clarifies. 
Shall put up more homework-related Q&A in this thread as they happen. Most recent on top.

Sudhir

SPSS travails regarding a course clash

Update:
ITCS has sent an email with instructions on how to use the virtual-machine workaround that Manojna helpfully pointed out (see below). This is for those 60 students whose licenses may run out this week. Pls follow the instructions and get your SPSS extended. IT has also kindly agreed to help out should you run into any difficulties during this process. I'm glad a way has been found around this issue.

Sudhir


Update: Received the following from Manojna Belle.
Dear professor,


I’m one of the students affected by the course clash. Thought of sharing the work around that I am using to get around the problem. Here’s what I did:

I installed virtual box (a virtual machine software by Sun/Oracle – it’s open source available at https://www.virtualbox.org/wiki/Downloads), installed windows on it and then installed SPSS there. Seems to be working fine. The whole process took me about 45 mins.

Good thing about this is that an “image” can be created from this virtual machine. Whoever needs to replicate this can just install the virtual machine and import this image. So I’m guessing if IT can do what I did and create an image, everyone can setup their system in about 30 mins by simply installing virtual machine and importing the “image”.

Note: The virtual machine I configured didn’t have online access for some reason (not even the intranet). Had to install cisco nac to resolve it. But doing that was a little tricky – the only way to transfer content onto the virtual machine without internet/intranet is to mount a folder from the “host” machine. Others won’t have to do any of it, if IT can provide an “image” that has online access enabled.

Thanks and Regards,

Manojna Belle

Have forwarded the email to IT to see if they can use this to deploy on a large scale. There are 60 people in common between ENDM and MKTR. So yes, there is a lot at stake here.

Sudhir
---------------------------------------------------------------------------
Hi all,

I received this email from Gaurav yesterday:
Hello Prof,


A lot of us are also taking Prof Arun Preira's course on Entrepreneurial Decision Making (ENDM). We installed SPSS trial version for one of the assignments in that course some days back (more than a week). Since this would be a 15 day trial it might not last the duration of the marketing research course. Request you to have a look at what we can do.

Thanks and Regards,

Gaurav.
I forwarded this to IT:
Hi Team IT,


Could you tell me if the 15 day license would work if a previous 15 day license had already been deployed for another course (see email appended below)?

Pls advise.

Regards,
Sudhir
And this is the reply I got:
Dear Professor,


Regret to inform you that it doesn't allow us to re-install the same for another 15 days, because it registers all the values in system files.

Regards,
Satish

Basically, I'm at a loss at this juncture. SPSS is the mainstay for this course and I was under the impression MEXL is the mainstay for the other course. Now we find ourselves in a bind. I'll take it up with ASA and again with IT to see if a fix is possible.

I wonder how many such cases of overlap between the two courses there actually are. I've asked ASA to send me the list of those affected.

Update:
Basically, after a conversation with the IT folks, I see that we have a few options here, none of them particularly good.

Option 1: Re-install the OS. That clears the registry and the new 15 day license will be accepted automatically. However, this typically tends to take an hour odd per machine and is likely to overwhelm both student and IT team time. Still, folks willing to spare an hour for this can opt for it.

Option 2: Partition the hard drive and install Linux or WinXP or something there. Within that, an SPSS version can be installed. Not sure how different this is from option 1 in time and trouble terms.

Option 3: Do without SPSS in class. Borrow a peer's machine for a while or use the LRC lab's comps to complete homeworks on. Not a great option but workable since there is no comp-based SPSS exam component this year.

If anybody has any other ideas, pls let me know. Shall be happy to explore them further. Team IT has agreed to again approach the IBM-SPSS vendor with a fresh request but is not optimistic about a positive response given how much we have already asked of them this year.

Sudhir

Monday, November 7, 2011

7-Nov Phase II interim status-check

Hi All,

Am extremely pleased to say we have 1,700+ responses already. Even assuming a quarter of them are unusable, we'll still have 1,000+ responses to play around with. Good! And we still have some 3 days to go. Great!

Here's the latest status-check for where groups are currently:

Agra 57
Bhilai 58
Bijapur 40
Cochin 35
Dehradun 51
Dhanbad 29
Dum dum 105
Durgapur 108
Gulmarg 30
Guntur 53
HasmatPet 27
Jaipur 44
Jamshedpur 48
Jaunpur 65
Jhujuni 160
Kakinada 18
Kesaria 81
Naihati 30
Palgat 81
Panaji 27
Peepli 27
Portblair 30
Rampur 61
Sangli 65
Shivneri 49
Sultanpur 67
Team Leh 29
Trawadi 13
Trichy 50
Ujjain 20
(blank) 204
Grand Total 1763


I'd say about 8-10 valid responses per team member would be par for this phase. Of course, the more the merrier. Being at or around par will earn most groups 5% of the total 8% weight; the top quarter of groups will get 7% or maybe 8%, based on the actual numbers.

Soon, very soon, we'll be up and running on this one with Phase III.

Sudhir

Sunday, November 6, 2011

About Mind-reading

Hi All,

We'd discussed in session 5 (qualitative research) some of the ethical issues and risks posed by mind-reading-ish technologies. Well, well, just this week the Economist carried a major piece outlining similar thoughts.
Reading the brain: Mind-goggling

Regarding how far tech has progressed, it says:
Bin He and his colleagues at the University of Minnesota report that their volunteers can successfully fly a helicopter (admittedly a virtual one, on a computer screen) through a three-dimensional digital sky, merely by thinking about it. Signals from electrodes taped to the scalp of such pilots provide enough information for a computer to work out exactly what the pilot wants to do.


That is interesting and useful. Mind-reading of this sort will allow the disabled to lead more normal lives, and the able-bodied to extend their range of possibilities still further. But there is another kind of mind-reading, too: determining, by scanning the brain, what someone is actually thinking about.

Well, imagine the possibilities for psychological and qualitative research then, eh?

Shall append more relevant articles to this post as I find them.

Sudhir

Homework session 6 Issues

Update:
OK, IT tells me they've already sent instructions for the SPSS trial version download. Great. Then this homework turns out to be much easier than I had first imagined. Good. Some of the advice below on re-coding and transforming data for the t-test would still hold, I guess.

Hi all,

1. Pls let me know if there are any queries etc. you're facing w.r.t. the session 6 homework. Shouldn't take more than an hour or so, by my reckoning, but if you've no clue how to approach the questions, it can seem quite daunting, I now realize.

2. I'll present my own solution to this homework in class in a few slides.

3. The most common-sensical approach for the first two questions, the way I see it, is to take out the four concerned columns into a fresh worksheet (keeping Respondent ID as well, to keep count), build pivots, and run chisq.test() in R on the pivots obtained. In Excel, you'll also need to generate the expected distribution; it is computed as (row total * column total)/(overall total) for each cell in the table. As a general rule, ignore blanks and non-standard responses in your cross-tabs. (An R sketch covering this point and the next follows after this list.)

4. For the t-test question, you'll need to re-code the data into metric (interval) form. So use CTRL+H or the 'Find and replace' function in Excel to transform the text responses into a 1-5 scale (or a -2 to +2 scale, or similar). Sort the responses to weed out blanks and the like. Then run the simple TTEST() function in Excel.

5. The above is only 1 way of doing these things. It seemed to me to be a common-sensical approach and so I elaborated on it. You may reach the answers in a quicker, smarter way, perhaps. That is entirely fine too.
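For those doing points 3 and 4 end-to-end in R, here is a minimal sketch. The column names ('gender', 'brandpref', 'rating_txt') are placeholders for whichever survey columns your questions actually use, and the t-test line assumes the grouping variable has exactly two groups:

```r
# Chi-square on a cross-tab, and a t-test after re-coding a text scale to numbers.
dd <- dd[dd$gender != "" & dd$brandpref != "", ]   # drop blanks / non-standard rows

tab <- table(dd$gender, dd$brandpref)              # the pivot / cross-tab
chisq.test(tab)$expected                           # expected counts: row total * col total / overall total
chisq.test(tab)                                    # the chi-square test itself

scale_map <- c("Strongly disagree" = 1, "Disagree" = 2, "Neutral" = 3,
               "Agree" = 4, "Strongly agree" = 5)
dd$rating <- scale_map[as.character(dd$rating_txt)]  # re-code text responses to a 1-5 scale
t.test(rating ~ gender, data = dd)                   # assumes 'gender' has exactly two groups
```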

Hope that helps. Pls use the comments thread below for Q&A in case of any queries.

Sudhir