LISTSERV at the University of Georgia
Date:         Tue, 4 May 2010 12:43:40 -0400
Reply-To:     peterflomconsulting@mindspring.com
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject:      Re: interaction & main effect question
Comments: To: Robert Feyerharm <robertf@HEALTH.OK.GOV>
In-Reply-To:  <201005041620.o44GDd7b030259@malibu.cc.uga.edu>
Content-Type: text/plain; charset="UTF-8"

Robert Feyerharm wrote <<< I have a question concerning the inclusion of main effects & interaction terms in a model. The consensus seems to be that main effects should be included in a model if the interaction term is statistically significant. >>>

Rather than this, I'd say main effects should be included if the interaction is included, whether or not it is significant. Some people point out rare exceptions (there was a paper by David Rindskopf on this, perhaps a decade ago), but this is the usual rule.

<<< My understanding (from reading Hosmer & Lemeshow's text on logistic regression) is that first a main effects model should be tested, and then interaction effects can be tested from any main effects that were found to be significant. As opposed to adding main effect variables post-hoc after an interaction term is found to be significant.

Is this a valid approach to model construction that will consistently identify all possible interactions? >>>

This is my understanding of Hosmer and Lemeshow as well.

But it is not a process I can endorse. It relies much too heavily on statistical significance, which in my view should play virtually no role in model building. It is entirely possible to have important interactions when there are no main effects. It is also possible for an important effect to be non-significant, and for a small effect to be important. For instance, if the literature reports that a certain effect is large and you find that it is small, then including it may be of great interest to the progress of science.
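To make the point about interactions without main effects concrete, here is a minimal sketch rather than anything from Hosmer and Lemeshow. It assumes a hypothetical balanced 2x2 design with -1/+1 effect coding and an assumed coefficient beta3 = 2.0, with the response generated purely from the product x1*x2. In a balanced, effect-coded design like this one, the averaged contrasts below coincide with the least-squares coefficients.

```python
from itertools import product
from statistics import mean

# Hypothetical 2x2 design with effect coding (-1/+1).
# beta3 is an assumed value chosen only for illustration.
beta3 = 2.0
cells = [(x1, x2, beta3 * x1 * x2) for x1, x2 in product((-1, 1), repeat=2)]

def contrast(weight):
    """Average of weight(x1, x2) * y over the four cells."""
    return mean(weight(x1, x2) * y for x1, x2, y in cells)

main_x1 = contrast(lambda x1, x2: x1)           # main effect of x1
main_x2 = contrast(lambda x1, x2: x2)           # main effect of x2
interaction = contrast(lambda x1, x2: x1 * x2)  # x1-by-x2 interaction

print(main_x1, main_x2, interaction)
```

Both main-effect contrasts come out to zero while the interaction contrast recovers beta3. Note that the zero main effects depend on the centered -1/+1 coding used here; recoding the same cells as 0/1 would change what the lower-order coefficients mean.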

<<< It makes sense from a practical viewpoint. Testing an initial model with main effects *and* all possible interactions thrown in would seem to risk overspecification. For example, with only 10 main effect variables in a proposed model, there are (10 choose 2) = 10!/(2! 8!) = 45 possible two-way interaction terms which could be added to the initial model in addition to the main effects. That's way too many terms IMO.

Nevertheless, I can certainly visualize, from a geometric standpoint, a regression model which includes an interaction term but *no* main effect terms (that is, y = beta3*x1*x2). See the third graph from the top on the following page from UCLA's Academic Technology Services: >>>

David Cox said, "There are no routine statistical questions, only questionable statistical routines."

Certainly looking at ALL the two way interactions (to say nothing of three-way and higher interactions) leads to a very complex model, and one in which there are almost certainly too many terms.
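As a rough illustration of how quickly the term count grows, the following sketch counts the possible interaction terms for 10 main effects. The function name `interaction_term_counts` is hypothetical, introduced only for this example; it relies on the standard-library `math.comb`.

```python
from math import comb

def interaction_term_counts(n_main, max_order):
    """Number of possible k-way interaction terms for k = 2..max_order."""
    return {k: comb(n_main, k) for k in range(2, max_order + 1)}

counts = interaction_term_counts(10, 3)
total_terms = 10 + sum(counts.values())  # main effects plus all 2- and 3-way terms

print(counts)       # {2: 45, 3: 120}
print(total_terms)  # 175
```

With 10 main effects, the 45 two-way terms are joined by 120 three-way terms, so a model with everything up to third order would carry 175 terms before the intercept.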

But we must let research guide statistics, and not the other way around. What are the questions of interest? What interactions make sense? Which might be important if they were found?

It is true that some statistical analysis is exploratory. But I would maintain that NO analysis is completely exploratory. Why were these data and not others collected? Except for the very worst type of data mining, we collect and look at data for SOME reason. We do not throw the Statistical Abstract of the United States into a blender and press FRAPPE. (I am not quite sure what FRAPPE means, but I've seen it on blenders.)

Robert Abelson titled his book "Statistics as Principled Argument". That is what statistics should be: part of a principled argument about what the data mean.

Peter

