Saturday, November 7, 2009

Final analysis dataset - Download link available


The download link for the analysis dataset is here.

It's a 7212 x 182 dataset, though many cells are empty.

I suspect about an hour of work will be needed before the 'raw dataset' is usable even for descriptive analysis, let alone any advanced methods. I originally planned to clean and collate the dataset myself, but the task is so essential (and mundane) in survey analysis that I figure people should spend that one hour doing it themselves. Valuable learning.

My guess is that pivot tables, filters, VLOOKUPs, Find and Replace (CTRL+H), and IF-THEN statements for creating dummy variables in Excel will all have to be used heavily to clean and collate the data, recode text responses to numeric values, and handle other such mundane tasks.
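For anyone who would rather script the cleanup than click through Excel, the recoding and dummy-variable steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the prescribed method for this dataset: the column names (`satisfaction`, `region`), the five-point response labels, and the helper functions `recode`, `make_dummies` and `clean` are all hypothetical, since the post does not list the dataset's actual variables.

```python
# Minimal sketch: the column names and response labels below are HYPOTHETICAL;
# the real dataset's variables are not described in the post.

# Hypothetical mapping from text responses to numeric codes
# (the scripted equivalent of repeated Find and Replace in Excel).
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def recode(value, codes):
    """Map a text response to its numeric code; blank or unmapped cells become None."""
    if value is None:
        return None
    key = value.strip().lower()
    return codes.get(key)  # "" or an unknown label returns None


def make_dummies(rows, column, categories):
    """Add one 0/1 column per category, like one Excel IF() formula per category."""
    for row in rows:
        for cat in categories:
            row[f"{column}_{cat}"] = 1 if row.get(column) == cat else 0
    return rows


def clean(rows):
    """Recode the hypothetical 'satisfaction' column and dummy-code 'region'."""
    for row in rows:
        row["satisfaction_code"] = recode(row.get("satisfaction"), LIKERT_CODES)
    # Distinct non-blank regions, in a stable (sorted) order.
    regions = sorted({row["region"] for row in rows if row.get("region")})
    return make_dummies(rows, "region", regions)


if __name__ == "__main__":
    survey = [
        {"satisfaction": "Agree", "region": "North"},
        {"satisfaction": "  strongly disagree ", "region": "South"},
        {"satisfaction": "", "region": ""},
    ]
    for row in clean(survey):
        print(row)
```

Each distinct region gets its own 0/1 column, and a blank region gets 0 in every dummy column, so handle missing values separately if that distinction matters. For a regression you would usually drop one of the dummy columns to avoid perfect collinearity.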

Any queries or clarifications are welcome.

Ciao and good luck.



  1. Dear Prof,
    Wasn't '1 hour to clean data' a bit unrealistic? I say this because I spent over 5 hours on it :-( , primarily using filters, Find and Replace, and If-Then statements.
    But yes, it was quite an experience, and I don't mind spending the night on the dataset!!!

  2. Nandan,

    Something tells me you won't use most of the data you cleaned in the core analysis that answers the research questions, IMHO.

    That is what 'backwards MKTR' was all about too, in some sense.

    Besides, if I had come right out and said you'd have to spend hours upon hours upon hours just cleaning the data, how many would even have attempted to interrogate the dataset at all? Just wondering...

