Saturday, November 7, 2009

More mailbag...

Hope I got to the janta in time, before too much effort was wasted on cleaning every column and datum in the response sheet.

Hi Prof,

Just a heads-up on the data cleaning: it will definitely be more than a couple of hours' worth of work. I've done this before at work, and it's already taken me over 45 minutes to clean 3 columns because the free-text boxes have differences in capitalization, spelling, typos and combined answers, and I'm using my discretion based on the type of analysis I would want to run.

Thought it would be useful for you to know in terms of time required before we can start to run any analysis on this data.

Best –
R

My response:

Class,

Pls refer to the email below.

Cleaning and recoding can take a long time, especially when the responses are keyed in (non-standardized) and not selected off a menu. I reckon there are only a few (3? 4?) such questions of direct interest and impact for the analysis based on the research objectives.

Here’s what I would do if I were you:

Given time constraints etc., I would restrict myself to the top few (say 6) brands and the top few (say 6) models only. I would simply bin all the others into some ‘other’ category. Rationale? Well, the incremental value contributed by the marginal brands and car models to the overall analysis is, well, marginal. Time is better spent chasing variables and insights that have a ‘first-order effect’ on the recommendations rather than a 2nd- or 3rd-order effect.
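A rough sketch of the binning idea above, using only the Python standard library. The helper names (`normalize`, `bin_top_n`) and the sample brand responses are hypothetical and for illustration only; they are not taken from the actual response sheet.

```python
from collections import Counter

def normalize(value):
    """Collapse extra whitespace and standardize capitalization,
    so that 'maruti ' and 'Maruti' count as the same answer."""
    return " ".join(value.split()).title()

def bin_top_n(values, n=6, other_label="Other"):
    """Keep the n most frequent categories; relabel all the rest as other_label."""
    cleaned = [normalize(v) for v in values]
    counts = Counter(cleaned)
    top = {name for name, _ in counts.most_common(n)}
    return [v if v in top else other_label for v in cleaned]

# Hypothetical free-text brand responses (illustration only)
brands = ["Maruti", "maruti ", "HONDA", "Honda", "Toyota", "tata", "Tata", "Fiat"]
binned = bin_top_n(brands, n=3)
print(Counter(binned))
```

Note that this only handles capitalization and stray spaces. Misspellings and combined answers would still need a manual pass, but only for the handful of categories that make it into the top few.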

Similarly, if someone wants to see spatial patterns in data, use STATE rather than TOWN/CITY. The incremental value offered by the latter, IMHO, is marginal.

Basic analyses (descriptives, cross-tabs, simple hypothesis tests) don’t require too much cleaning and recoding, IMHO. And these yield significant insights (1st-order effects!) on what subsequent analysis to consider and what to rule out.
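For instance, a simple cross-tab of counts needs nothing beyond two reasonably clean categorical columns. Below is a minimal sketch in plain Python; the `crosstab` helper, the field names (`state`, `brand`) and the sample rows are all assumptions made for illustration, not the project's actual data.

```python
from collections import Counter

def crosstab(rows, row_key, col_key):
    """Count how often each (row_key, col_key) pair occurs and
    return the counts as a nested dict: table[row][col] -> count."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    return {r: {c: counts.get((r, c), 0) for c in col_labels} for r in row_labels}

# Hypothetical survey rows (illustration only)
responses = [
    {"state": "AP", "brand": "Maruti"},
    {"state": "AP", "brand": "Honda"},
    {"state": "MH", "brand": "Maruti"},
    {"state": "AP", "brand": "Maruti"},
]

table = crosstab(responses, "state", "brand")
for state, row in table.items():
    print(state, row)
```

Using the coarser variable (state rather than town/city) keeps the table small enough to read at a glance.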

Pls restrict the scope of your analysis and the variables of interest ruthlessly – else nothing will get done!

And always, always, map everything you do back to the 3 research questions originally laid out. If some analysis does not directly tie in with those 3 questions, don’t bother doing it. It does feel like a climbdown that the end result of analysis might not be as jazzy, sophisticated, advanced, cool or ‘fundu’ as one initially envisions, perhaps. That is valuable learning too.

The 3 research questions for the project are explained here:

http://marketing-yogi.blogspot.com/2009/11/project-scope-and-objectives-revisited.html

A final summary datasheet has been put up here:

http://marketing-yogi.blogspot.com/2009/11/final-dataset-summary-sheet.html

In general, pls check the blog more often over this last week in the run-up to the proj deadline and the exam.

Tks and good luck.

Hope that helps.

Sudhir
