Saturday, November 16, 2013

Session 8 HW queries

Hi all,

Some session 8 HW related queries that IMHO merit wider dissemination...

Satish Wrote:

Dear Professor,
I am having trouble interpreting the output of the factor regression and was wondering whether you could help me understand it better...

I understand that we use the factor regression for categorical variables. But in the Session 8 HW, the quant, qualitative etc are not categorical variables but we are forcing them to be categorical – correct? I didn’t understand why we were doing this.. (eg: summary(lm(overall ~ factor(quant1) +factor(quali1) +factor(R1) +factor(HWs1) +factor(blog1))))

Also, how does the interpretation of the results from the factor regression differ from that of regular regression? For example, what does each beta coefficient mean in a factor regression? I understand that ‘high’ is the reference in each of the factors but what exactly does it mean when we say that (for example) increasing the factor(quant) low would decrease the overall rating? (as shown by the negative sign)

Could you please elaborate? Thanks
Best Regards
Satish

My response:

Hi Satish (and Swati, who had a similar Q),

1. True: we use dummy variables for categorical (nonmetric) variables, and the raw data for the C02014 feedback was metric. (In R, wrapping a variable in factor() inside lm() makes R build 0/1 dummy variables for its categories.)

2. The point of the HW was to get you to run a dummy-variable regression anyway. I *discretized* the metric X variables into categorical X variables using a High/Med/Low scale. Normally we wouldn't do this, since metric variables are much more informative than nonmetric factors. But for this HW, we did.

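3. The interpretation of a factor regression is straightforward. Take the High/Med/Low case. By default R takes one of the three categories as the reference (the first level, which is High here), fixes its effect at zero, and measures the effect of the other two levels (Med and Low) against this zero baseline. If Med or Low goes with a higher overall rating than High, its coefficient is positive; if lower, it is negative; and if about the same as High, it is insignificant. So a negative coefficient on Low does not describe "increasing" anything: it says that respondents in the Low group give a lower overall rating, on average, than those in the High group, with the other factors held fixed.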

4. Changing the reference makes no difference to the substance of the regression; it only moves the baseline up or down. For example, to make Low the reference, subtract the Low coefficient from the coefficients of High (which is zero), Med and Low, and you have your new set of coefficients. Low becomes the new zero baseline, and the intercept absorbs the old Low coefficient.

To test this, just tweak the code slightly: replace 'factor(quali1)' with 'relevel(factor(quali1), ref = "Low")' and run the analysis again. Note what happens to the coefficients, to the overall fit in R-squared terms, etc.
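
Here is a minimal sketch of that experiment on simulated data, since the HW dataset is not reproduced in this post. The variable names mirror the HW, but the values, the sample size and the "true" effects are all made up for illustration. It uses relevel(), base R's function for changing a factor's reference level.

```r
# Simulated stand-in data: names mirror the HW, values are invented.
set.seed(42)
n <- 300
lv <- c("High", "Med", "Low")
quali1 <- sample(lv, n, replace = TRUE)
quant1 <- sample(lv, n, replace = TRUE)

# Assumed effects, used only to generate an outcome to regress on.
eff <- c(High = 0, Med = -0.5, Low = -1.2)
overall <- 7 + unname(eff[quali1]) + 0.5 * unname(eff[quant1]) + rnorm(n, sd = 0.8)

# Default coding: factor() sorts levels alphabetically (High, Low, Med),
# so High is the reference and gets no coefficient of its own.
fit_high <- lm(overall ~ factor(quali1) + factor(quant1))

# Same model, but with Low as the reference level for quali1.
fit_low <- lm(overall ~ relevel(factor(quali1), ref = "Low") + factor(quant1))

# The 0/1 dummy columns that lm() built behind the scenes.
colnames(model.matrix(fit_high))

b_high <- unname(coef(fit_high))  # intercept, quali Low, quali Med, quant Low, quant Med
b_low  <- unname(coef(fit_low))   # intercept, quali High, quali Med, quant Low, quant Med
shift  <- b_high[2]               # coefficient of quali1 = Low under the High baseline

all.equal(b_low[1], b_high[1] + shift)  # intercept absorbs the old Low coefficient
all.equal(b_low[2], -shift)             # High vs. Low is the mirror image of Low vs. High
all.equal(b_low[3], b_high[3] - shift)  # Med is re-expressed against Low
all.equal(b_low[4:5], b_high[4:5])      # quant1 coefficients are untouched
all.equal(summary(fit_low)$r.squared, summary(fit_high)$r.squared)  # same fit
```

Each all.equal() call checks one part of the claim: only the baseline moves, while the fit and the other coefficients stay put.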

Hope that helps.

Anupama writes:

I have one more query –
I am unable to interpret negative coefficients in the variant of regression when you introduced categories of independent variables in the assignment.

If the overall qualitative rating is on a scale of 1-9….should I understand categories as - Low -> 1-3; Med -> 4-6; High -> 7-9 ?
With the above understanding, should I interpret negative coefficient for qual1-Low as ….
‘decrease in low qualitative rating increase overall rating’ => ‘Increase in quality rating increases overall rating’ ?

Overall implications => Professor should give high importance to qualitative material and low level of quantitative material and rest of the factors(like HW, blog) are not significant enough to affect overall rating?

Please let me know if my above understanding is correct.

Also, it would be great if we could have an addendum to Session 8 that provides the solution to this assignment.
It would help us in preparation for end-term exam.

My response:

Hi Anupama,

This is correct:

‘decrease in low qualitative rating increase overall rating’ => ‘Increase in quality rating increases overall rating’ ?

I think the High/Med/Low categories were chosen by this rule: High if the score is more than one standard deviation above the mean, Low if it is more than one standard deviation below the mean, and Medium otherwise.
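
A rough sketch of that one-standard-deviation rule in R, assuming it is indeed the rule that was used. The helper name to_hml and the sample scores are hypothetical, not taken from the HW code.

```r
# Hypothetical helper: bins a metric score into High / Med / Low
# using cut-offs at one standard deviation above and below the mean.
to_hml <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  out <- ifelse(x > m + s, "High",
         ifelse(x < m - s, "Low", "Med"))
  # Listing High first makes it the reference level in a later lm() call.
  factor(out, levels = c("High", "Med", "Low"))
}

scores <- c(2, 5, 6, 6, 7, 7, 8, 9)  # made-up ratings on a 1-9 scale
quali1 <- to_hml(scores)
table(quali1)
```

Note that under this rule the cut-offs depend on the spread of the actual scores, so they need not line up with fixed bands such as 1-3, 4-6 and 7-9.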

Have received a few more such queries; will write a blog post and share my responses.

P.S.
Will put up the session 8 HW solution (actually, will choose a few exemplary submissions) on the LMS.

2 comments:

  1. Professor,

    What would a significant beta zero (Y intercept) mean? Even if we have a zero rating for all factors, overall rating for MKTR will be non zero.

    1. Yes... The intercept says that even if all the Xs are zero, the mean Y (model-free, i.e. not dependent on the available Xs) has a certain positive level.
