Friday, November 20, 2009

More mailbag.... Friday

I got this email:

Hello Prof.,
I came across one of your blog posts where you mentioned that the number of X columns for a regression should be as few as possible.

While creating the X variables, I have taken 11 brands, as mentioned in the dataset construction example. This will already end up as 11 dummy variables.
So, essentially, would it make sense to reduce the number of brands to a few?

Also, as per the secondary data set, a lot of brands/models don't have a corresponding sales/price figure. Would it make sense to remove those from the regression data?


Thanks
D

My reply:

Yes, D.
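Take only the top 4-5 brands as dummy variables and club all the others into an 'others' category. Be careful not to include 'others' as a dummy as well: with an intercept in the model, one dummy per category adds up to the intercept column, and the regression will fail due to perfect multicollinearity. Leaving 'others' out makes it the baseline against which the brand coefficients are measured.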
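To make the dummy coding concrete, here is a minimal sketch in plain Python. The function name `brand_dummies`, the toy brand list, and the choice to rank brands by how often they appear (rather than by sales) are all assumptions for illustration, not part of the course dataset.

```python
from collections import Counter

def brand_dummies(brands, top_k=4):
    """Return (kept, rows): one 0/1 column per kept brand, none for 'others'."""
    counts = Counter(brands)
    # Assumption: "top" means most frequent; ranking by sales works the same way.
    kept = [brand for brand, _ in counts.most_common(top_k)]
    # Every brand outside `kept` gets an all-zero row, i.e. it falls into
    # the omitted 'others' baseline and needs no dummy of its own.
    rows = [[1 if b == k else 0 for k in kept] for b in brands]
    return kept, rows

# Made-up observations, one brand label per row of the regression data.
brands = ["A", "B", "A", "C", "D", "E", "B", "A", "F", "C"]
kept, X = brand_dummies(brands, top_k=3)

for brand, row in zip(brands, X):
    print(brand, row)
```

Rows for the lumped brands come out as all zeros, so the intercept absorbs the 'others' level and the dummy columns never sum to the intercept column.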

Some brand-models are not available in some years, so their sales/price figures are missing. Simply drop such observations from the dataset.
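Dropping the incomplete observations can be sketched as follows. The field names (`brand`, `year`, `sales`, `price`), the helper `drop_incomplete`, and the sample rows are hypothetical; the actual dataset may name and store missing values differently.

```python
def drop_incomplete(rows, required=("sales", "price")):
    """Keep only the observations that have a value for every required field."""
    return [
        row for row in rows
        if all(row.get(field) is not None for field in required)
    ]

# Made-up observations: a missing figure is either None or absent altogether.
data = [
    {"brand": "A", "year": 2007, "sales": 120.0, "price": 9.5},
    {"brand": "B", "year": 2007, "sales": None, "price": 11.0},
    {"brand": "C", "year": 2008, "sales": 80.0},
    {"brand": "A", "year": 2008, "sales": 95.0, "price": 10.0},
]

clean = drop_incomplete(data)
print(len(data), "observations before,", len(clean), "after")
```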

Sudhir
