Saturday, November 14, 2009

MKTR secondary dataset emailed

Class,

This is just to ensure the email didn't get missed. The attachment contains secondary data on monthly car sales. Yippie!

Here's my merry email message:
Class,

Pls find attached the secondary data on monthly car sales units by brand-model for the months Dec-08 to Jul-09 and Dec-07 to Jul-08.

Admittedly, it wasn’t easy getting the dataset; it was buried somewhere in the data annexures of the Crisil industry research pages. Tks to the group that brought the inaccessibility of secondary data to my attn this morning.

The secondary data are a goldmine of info (but you already knew that). Market shares, YoY trends, growth rates, segment-wise and brand-wise breakups – so much info at your fingertips. Kindly use this as supporting evidence for statements you do end up making in the project.
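For anyone rusty on the arithmetic: a brand-model's market share is its units divided by total units in the same period, and YoY growth compares a month with the same month a year earlier. A minimal Python sketch follows; the function names and unit figures are made up purely for illustration and are not taken from the Crisil data.

```python
def market_shares(units_by_model):
    """Share of total units for each brand-model in one period."""
    total = sum(units_by_model.values())
    return {model: units / total for model, units in units_by_model.items()}


def yoy_growth(units_this_year, units_last_year):
    """Year-on-year growth: (this year - last year) / last year."""
    return (units_this_year - units_last_year) / units_last_year


# Hypothetical units for one month; not real figures.
units = {"Model A": 600, "Model B": 300, "Model C": 100}
for model, share in market_shares(units).items():
    print(f"{model}: {share:.0%} share")

# Hypothetical: 1,100 units this January vs 1,000 units last January.
print(f"YoY growth: {yoy_growth(1100, 1000):.1%}")
```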

Again, given time constraints – don’t go overboard. My advice: select the top 15 or 20 brand-models and go with those.
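One way to shortlist, assuming "top" means highest total units over Dec-08 to Jul-09 (other criteria, such as covering each segment, are equally defensible): sum each brand-model's monthly units and keep the largest N. The `(model, units)` row layout and the `top_models` name are hypothetical.

```python
from collections import defaultdict


def top_models(monthly_rows, n=20):
    """Return the n brand-models with the highest total units.

    monthly_rows: iterable of (model, units) pairs, one per model-month.
    """
    totals = defaultdict(int)
    for model, units in monthly_rows:
        totals[model] += units
    # Sort model names by their summed units, largest first, then keep n.
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Hypothetical rows for illustration only.
rows = [("Model A", 500), ("Model B", 900), ("Model A", 700), ("Model C", 50)]
print(top_models(rows, n=2))  # ['Model A', 'Model B']
```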

Price info (ex-showroom) is available at carazoo.com. For price info in 07-08, one could conceivably deflate current prices by the average CPI, but I am not sure that’s a good idea. I would suggest you take just the Dec-08 to Jul-09 sales and proceed.
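If you do try the deflation route anyway, the arithmetic is a rescaling by the ratio of the two CPI levels: price in 07-08 terms = current price × CPI(07-08) / CPI(current). A minimal sketch with made-up index values (look up the actual CPI series yourself). Note that it applies one economy-wide inflation rate to every brand-model, which is a strong assumption.

```python
def deflate_price(current_price, cpi_current, cpi_base):
    """Express a current-period price in base-period terms.

    Scales by CPI(base) / CPI(current), i.e. strips out the measured
    inflation between the two periods.
    """
    return current_price * cpi_base / cpi_current


# Hypothetical illustration: these index values are made up, not actual CPI figures.
price_now = 500_000  # ex-showroom price today, in rupees (hypothetical)
cpi_0809 = 110.0     # average CPI over Dec-08 to Jul-09 (hypothetical)
cpi_0708 = 100.0     # average CPI over Dec-07 to Jul-08 (hypothetical)
print(round(deflate_price(price_now, cpi_0809, cpi_0708)))  # 454545
```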

So your demand function will be: sales units (Y) = f(Price, size, brand, and maybe other Xs). Take your call on what the Xs should be, if any. Would be nice to have at least 60% of the variance in Y explained (i.e., an R-squared of 0.6 or more). Consider using previous year’s sales as an X, maybe?
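To make the demand function concrete, here is a minimal linear version, Y = b0 + b1*Price + b2*Size + b3*BrandDummy, fitted by ordinary least squares, with R-squared computed so you can check it against the 60% mark. It is written in plain Python (normal equations solved by Gauss-Jordan elimination) only so it is self-contained; in practice you would use whatever stats package you normally work with. The variable coding (price in lakh rupees, size as a 1-4 class, a single 0/1 brand dummy) and the data rows are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical brand-model rows: (price in lakh Rs, size class 1-4, brand dummy, monthly units).
DATA = [
    (3.0, 1, 1, 9500), (3.5, 1, 0, 7000), (4.5, 2, 1, 6200), (5.0, 2, 0, 4800),
    (6.5, 3, 1, 3900), (7.0, 3, 0, 2500), (9.0, 3, 1, 1800), (10.0, 4, 0, 900),
]
X = [[1.0, price, size, brand] for price, size, brand, _ in DATA]  # leading 1.0 = intercept
y = [units for *_, units in DATA]

beta = ols(X, y)
r2 = r_squared(X, y, beta)
print("coefficients (intercept, price, size, brand):", [round(b, 1) for b in beta])
print(f"R-squared = {r2:.3f}; clears 60% target: {r2 >= 0.6}")
```

With more than two brands, the usual coding is one 0/1 dummy per brand, leaving one brand out as the reference category.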

Integrating primary data here will be darned tough because the integration has to happen at the brand-model level, and the primary data aren’t organized at that level. You needn’t go there unless you have to.

As for f(.), take your call. Try some basic specifications – simple, quadratic (in size? Price?), interactions, log-log and so on. Compare the model fits and pick the best one. Try stepwise if you want to.
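A sketch of the comparison step on a small made-up dataset: fit a simple, a quadratic-in-price and a price-by-size interaction specification, then rank them by adjusted R-squared, 1 - (1 - R²)(n - 1)/(n - k), which penalizes extra terms so a bigger model doesn't win by default. The log-log form is left out on purpose: its R-squared is measured on log(Y), so it isn't directly comparable with the others unless you back-transform the fitted values. Helper names and data are hypothetical, and plain Python is used only to keep the example self-contained.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    aug = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def r_squared(X, y, beta):
    mean_y = sum(y) / len(y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """Adjusted R-squared; k counts every coefficient, including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Hypothetical rows: (price in lakh Rs, size class 1-4, monthly units).
DATA = [(3.0, 1, 9500), (3.5, 1, 7000), (4.5, 2, 6200), (5.0, 2, 4800),
        (6.5, 3, 3900), (7.0, 3, 2500), (9.0, 3, 1800), (10.0, 4, 900)]

# Each specification maps (price, size) to one row of the design matrix.
specs = {
    "simple": lambda p, s: [1.0, p, s],
    "quadratic in price": lambda p, s: [1.0, p, p * p, s],
    "price x size interaction": lambda p, s: [1.0, p, s, p * s],
}

y = [units for _, _, units in DATA]
results = {}
for name, make_row in specs.items():
    X = [make_row(p, s) for p, s, _ in DATA]
    r2 = r_squared(X, y, ols(X, y))
    results[name] = (r2, adjusted_r2(r2, len(y), len(X[0])))

for name, (r2, adj) in results.items():
    print(f"{name:26s} R2={r2:.3f}  adj R2={adj:.3f}")
best = max(results, key=lambda name: results[name][1])
print("best by adjusted R-squared:", best)
```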

Well then, I know it’s a tad late in the day, but have fun with the data, folks!

Sudhir

Update:
OK, go easy on including lagged sales (i.e., last year's sales figure for the same month) as an X. The reason is that it will suck up most of the explanatory power and render the other Xs insignificant. And it's not really an independent variable, is it?
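Because the two periods in the file line up month for month (Dec-07 to Jul-08 against Dec-08 to Jul-09), building a lagged-sales X is a lookup on brand-model plus the same month one year earlier. A quick way to see the "sucks up the explanatory power" problem before fitting anything: with lagged sales as the only X, R-squared is simply the squared correlation between current and lagged sales. The record layout, names and numbers below are hypothetical.

```python
import math


def add_lagged_sales(current, previous):
    """Attach last year's same-month units to each current observation.

    current:  list of (model, month, year, units)
    previous: dict mapping (model, month, year) -> units
    """
    rows = []
    for model, month, year, units in current:
        lag = previous.get((model, month, year - 1))  # None if not sold a year earlier
        rows.append((model, month, year, units, lag))
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical figures for illustration only.
previous = {("Model A", "Dec", 2007): 4000, ("Model A", "Jan", 2008): 4200,
            ("Model B", "Dec", 2007): 1500, ("Model B", "Jan", 2008): 1600}
current = [("Model A", "Dec", 2008, 4400), ("Model A", "Jan", 2009, 4500),
           ("Model B", "Dec", 2008, 1700), ("Model B", "Jan", 2009, 1650),
           ("Model C", "Dec", 2008, 900)]  # Model C: no prior-year figure

rows = add_lagged_sales(current, previous)
usable = [row for row in rows if row[4] is not None]  # drop models without a lag
corr = pearson([row[4] for row in usable], [row[3] for row in usable])
print(f"Lagged sales alone would explain {corr ** 2:.0%} of the variance in sales")
```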
