Monday, November 1, 2010

Errata from Lecture 7

Hi Class,

Lecture 7 deals with causal research (experimentation) and the analysis of variance (ANOVA). The two go together because they are connected - the former collects data that the latter is very well suited to analyzing for inference, recommendations and action.

The hands-on work in class today, running ANOVA models on different software (primarily JMP, Excel and R), ran into two errors, which is why I am writing this erratum online.

1. JMP by default seems to read all data in as numeric. In ANOVA, our independent variables, the Xs, are nominal/categorical. So JMP reading them wrongly made the results misleading as well. Even the ANOVA results JMP displays in that case are nothing but the joint-significance F-test computed on a linear model that treats the Xs as metric.

The way out is to double-click on each column header and redefine the variable type as categorical/nominal. Then the analysis runs fine and the results seem plausible.

2. Excel is not well suited to ANOVA-related work, primarily because the data input it requires seems to need a lot of prior editing. The data must be sorted and placed in arrays before Excel will read and analyze them. JMP is much better placed in this regard. R has its own issues: its ANOVA table does not report the joint-significance F-statistic and p-value.
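To see why point 1 matters, here is a minimal sketch in Python (not one of the tools we used in class) on made-up numbers. The data, the level codes 1/2/3 and both function names are my own assumptions for illustration, not JMP's internals. It computes the F-statistic twice on the same data: once treating the codes as categories (one-way ANOVA) and once treating them as a metric variable (straight-line regression).

```python
from statistics import mean

def anova_f_categorical(codes, y):
    """One-way ANOVA F: each distinct code is treated as its own group."""
    groups = {}
    for code, value in zip(codes, y):
        groups.setdefault(code, []).append(value)
    grand_mean = mean(y)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(y) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def regression_f_numeric(codes, y):
    """F-test for a straight-line fit: the codes are treated as a metric X."""
    x_bar, y_bar = mean(codes), mean(y)
    sxx = sum((x - x_bar) ** 2 for x in codes)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(codes, y))
    ss_regression = sxy ** 2 / sxx              # variation explained by the line
    ss_total = sum((v - y_bar) ** 2 for v in y)
    ss_error = ss_total - ss_regression
    return ss_regression / (ss_error / (len(y) - 2))

# Hypothetical data: three treatment levels coded 1, 2, 3.
# The middle level has a clearly higher mean than the other two.
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [9.0, 10.0, 11.0, 19.0, 20.0, 21.0, 9.0, 10.0, 11.0]

f_categorical = anova_f_categorical(codes, y)
f_numeric = regression_f_numeric(codes, y)
print("F with codes as categories:", f_categorical)
print("F with codes as numbers:   ", f_numeric)
```

On this data the categorical F is large while the numeric F is essentially zero, because a straight line through the codes 1, 2, 3 cannot pick up a bump at the middle level. That is the kind of misleading result a wrong variable type can produce.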

So, bottom line - we stick with JMP as the best of the lot - after inputting data in the appropriate manner.
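On point 2, the prior editing Excel asks for is essentially a reshape: rows of (level, value) have to be sorted into one array of values per level. A rough sketch of that step in Python follows. The function name `long_to_columns` and the sample records are hypothetical, and the exact layout Excel expects may differ from what I assume here.

```python
from collections import defaultdict
from itertools import zip_longest

def long_to_columns(records):
    """Group (level, value) rows into one column of values per level."""
    columns = defaultdict(list)
    for level, value in records:
        columns[level].append(value)
    levels = sorted(columns)
    # Pad shorter columns with "" so every row has one cell per level.
    rows = list(zip_longest(*(columns[lvl] for lvl in levels), fillvalue=""))
    return levels, rows

# Hypothetical unsorted data, one observation per row.
records = [("B", 20.0), ("A", 9.0), ("A", 10.0), ("B", 19.0), ("C", 11.0)]

levels, rows = long_to_columns(records)
print(levels)
for row in rows:
    print(row)
```

JMP works from the one-observation-per-row layout directly, which is a large part of why it is less painful here.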

OK, more generally, there are two kinds of statistical tests that we use to reach conclusions one way or the other - tests of association, which typically involve interdependent data, and tests of dependence, which involve dependent data. The stats primer started with basic model building, waded into over-identified systems and from there moved into models of dependence. The chi-square tests we saw last class were tests of association and today, we see a powerful battery of tests for dependence.

This has applications on the ground - not just in the project but also in the business problems, situations and issues you are likely to face at work. Take another look at your ELPs to see if the methods picked up could have come in handy. The reason I'm hard-selling data work, quant methods and what we have been doing in the past two-odd lectures is that I get the vibe I haven't convincingly conveyed their importance, practical utility and breadth of applicability in real-life problems to a significant section of the class.

Well, class, the going gets somewhat tougher from here on. We're entering a rather dry part of the course involving analysis and some data work. My experience tells me that data work imparts real skills that are saleable in the marketplace - whether explicitly or implicitly. It lends realism, credibility and perspective to the problem formulation and research design skills we learned in the early part of the course. So even if the lecture appears rather dry and dull, hang on. It will all connect together, not least through the project, to tangible knowledge learned and skills earned in the course of the course.

My apologies to sections A and B for the JMP issues that came up. My thanks to the folks who spoke up and pointed out the errors.

Sudhir

3 comments:

  1. Dear Sir,
    I found some good videos showing how to run ANOVA in Excel. Sharing them with you in the hope that you find them useful.

    http://www.youtube.com/watch?v=F66weCUsRc0&feature=related

    Regards,
    Saurabh

  2. Thanks Saurabh,

    I did go over a few ANOVA related vids myself.

    I would still say that ANOVA on Excel is more pain than gain. JMP is a better bet. IMHO. Feel free to disagree and go with what works for you, of course.

    Sudhir

  3. Well, update:

    Have figured out JMP much better now. Hopefully, lec 7 in sections C and D will proceed more smoothly than it has so far.

    Sudhir

