Saturday, November 17, 2012

How to interpret mlogit Output

Hi all,

I have received several emails asking for more detail on interpreting mlogit output. I shall use the mlogit example we did in the session 8 classwork. To recap, the following 2 images give the context of the problem:

The image above shows that the dependent (or Y) variable is discrete and can take one of 3 values: EDLP, Deep-infrequent and Frequent-shallow. For convenience, let us denote them as 1, 2, 3. These labels are nominal: they carry no ordering or magnitude. The second image below shows what we have set out to do.
So, what we have set out to do includes standard regression stuff such as estimating the direction (positive or negative) and getting some idea of the relative magnitude of each variable's impact on the probability that the store belongs to y=1, 2 or 3. It also includes weighty stuff like calculating predicted probabilities for new store profiles.
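
For reference, a model of this kind is usually written in the standard multinomial logit form sketched below. The symbols are notation assumed here, not taken from the classwork: x is the vector of a store profile's predictor values (with a leading 1 for the intercept) and \beta_j is the coefficient vector for level j, with level 1 taken as the baseline whose coefficients are fixed at zero.

```latex
\Pr(y = j \mid x) = \frac{\exp(x^\top \beta_j)}{\sum_{k=1}^{3} \exp(x^\top \beta_k)},
\qquad j = 1, 2, 3, \qquad \beta_1 = 0
% Equivalent statement in terms of log-odds against the baseline level:
\log \frac{\Pr(y = j \mid x)}{\Pr(y = 1 \mid x)} = x^\top \beta_j
```

The second line is why each coefficient is read relative to the baseline: it measures the change in the log-odds of level j against level 1 for a one-unit change in the corresponding variable.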

After running the analysis, this was the results table:

Things to note here:
  • There are 3 levels of Y (y=1,2,3) and what we are modeling is the probability that a given profile's y takes each of these values, i.e. Pr(y=1), Pr(y=2) and Pr(y=3).
  • The leftmost numbers in the row names of the Coefficients table indicate which of y=1,2,3 that coefficient belongs to. Thus '2:sales' refers to y=2 and '3:sales' refers to y=3. That '2:sales' has the estimate -2.696 means that, to calculate the probability that y=2 (denoted 'Pr(y=2)'), we use -2.696 as the coefficient for sales.
  • Likewise, because '3:sales' has the value -5.358, we would use -5.358 as the coefficient for sales to calculate Pr(y=3). The same holds for coupon1 and clientrating as well.
  • One level is the 'reference' level, the baseline relative to which the coefficients for the other 2 levels of y are measured. In our example, the reference level is y=1. Thus, all the coefficients for Pr(y=1) are set to zero.
  • This means that 1:(intercept), 1:coupon1, 1:sales and 1:clientrating all have estimate '0'.
  • Now, since y=2 has a negative coefficient for sales, the higher the sales, the lower the probability that y=2 relative to y=1. In other words, as sales goes up, the odds Pr(y=2)/Pr(y=1) come down.
  • Similarly, since y=3 has an even more negative coefficient than y=2, the higher the sales, the lower the probability that y=3 relative to both y=2 and y=1. Thus, as sales rise, the probabilities shift toward the ordering Pr(y=1) > Pr(y=2) > Pr(y=3), other things being equal.
  • Putting a similar interpretation on the variable 'coupon1', I'd say that if coupon1=TRUE (i.e. the coupon variable has value '1' instead of '2'), the probabilities shift toward Pr(y=1) < Pr(y=2) < Pr(y=3). And so on.
  • The intercept doesn't have any interpretation per se. It enters the calculation as is, and its variable always has value 1. The estimate for 1:(intercept) is 0, as for every other coefficient of the reference level.
  • Clientrating is not significant. This means clientrating does not seem to vary systematically with Y. So, while we use the Estimates shown to compute fitted values, we do not infer anything from clientrating in this case.
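
To make the notes above concrete, here is a minimal Python sketch of how per-level coefficients turn into probabilities. It is an illustration under stated assumptions, not the classwork code: the function name mnl_probabilities is made up for this example, only the two sales coefficients (-2.696 and -5.358) come from the results table, the intercepts are hypothetical placeholders set to 0, coupon1 and clientrating are left out because their estimates are not quoted in this post, and the sales values tried are arbitrary.

```python
import math


def mnl_probabilities(coefs, profile):
    """Return Pr(y = level) for each level of a multinomial logit model.

    coefs:   dict mapping level -> {variable name: coefficient};
             the reference level has every coefficient equal to 0.
    profile: dict mapping variable name -> value for one store profile
             (the intercept's variable is always 1).
    """
    # Linear score for each level: sum of coefficient * variable value.
    scores = {
        level: sum(beta * profile[name] for name, beta in betas.items())
        for level, betas in coefs.items()
    }
    # Subtract the largest score before exponentiating (numerical stability).
    top = max(scores.values())
    weights = {level: math.exp(s - top) for level, s in scores.items()}
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


# Sales coefficients are from the post; intercepts are HYPOTHETICAL
# placeholders (0.0), and coupon1 / clientrating are omitted.
coefs = {
    1: {"(intercept)": 0.0, "sales": 0.0},     # reference level: all zeros
    2: {"(intercept)": 0.0, "sales": -2.696},
    3: {"(intercept)": 0.0, "sales": -5.358},
}

# Arbitrary illustrative sales values; the units are not given in the post.
for sales in (0.0, 0.5, 1.0):
    profile = {"(intercept)": 1.0, "sales": sales}
    probs = mnl_probabilities(coefs, profile)
    print(sales, {level: round(p, 3) for level, p in probs.items()})
```

With these placeholder intercepts the three probabilities are equal at sales = 0, and Pr(y=1) grows at the expense of the other two as sales increases. With the actual intercepts from the results table the starting point would differ, but the direction of the shift in odds against y=1 would be the same.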
Well, I hope that helps. Folks, I'm available for meetings from 2pm today onwards and all of tomorrow. Just call my extn #7106 and drop by.

Sudhir

2 comments:

  1. Professor,

    The output table does not include Coupon2 and its coefficients and significance level. Since coupons are of two types, shouldn't there be dummy variables in the regression equation for both types?
    Please advise

    Regards
    Neetika

    1. Hi Neetika,

      coupon2 is the reference level and is set to 0. Generally, if there are n levels in a categorical variable (e.g., here, coupon has 2 levels - 1 and 2), only (n-1) dummy variables can be used. The nth is always used as reference and set to 0. In mlogit, we set the first one to zero and use as reference.

      Hope that clarifies.

      Sudhir

