Date: Thu, 3 Jun 2010 13:35:03 -0700
From: "Pirritano, Matthew"
Subject: Re: interaction in a linear regression model

If X1 and X2 are categorical, then you need to recode them in order to enter them into a linear regression, using either dummy coding or effect coding. Otherwise you're treating the categories in X1 and X2 as if they were equally spaced points on a continuous scale, which probably doesn't make sense for categorical variables. Then, to look at the interaction, you'd look at the product of each dummy/effect-coded variable for X1 with each dummy/effect-coded variable for X2. With 4 and 5 levels that is 3 x 4 = 12 product terms.
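
To make the coding concrete, here is a minimal sketch in plain Python (not SAS) of how one row of the design matrix could be built. The level labels and the helper names dummy_code, effect_code and design_row are made up for illustration; treating the first level as the reference is also just an assumption, and any level could serve.

```python
# Illustrative sketch only: level labels and helper names are hypothetical.

def dummy_code(value, levels):
    """Dummy (reference) coding: the first level is the reference, coded all zeros."""
    return [1 if value == lev else 0 for lev in levels[1:]]

def effect_code(value, levels):
    """Effect coding: same columns as dummy coding, but the reference level is coded -1."""
    if value == levels[0]:
        return [-1] * (len(levels) - 1)
    return [1 if value == lev else 0 for lev in levels[1:]]

def design_row(x1, x2, x1_levels, x2_levels, coder=dummy_code):
    """Intercept, main-effect columns, then every X1-column times every X2-column."""
    a = coder(x1, x1_levels)
    b = coder(x2, x2_levels)
    interaction = [ai * bj for ai in a for bj in b]
    return [1] + a + b + interaction

X1_LEVELS = ["a", "b", "c", "d"]        # 4 levels -> 3 coded columns
X2_LEVELS = ["p", "q", "r", "s", "t"]   # 5 levels -> 4 coded columns

row = design_row("c", "r", X1_LEVELS, X2_LEVELS)
n_interaction = (len(X1_LEVELS) - 1) * (len(X2_LEVELS) - 1)
print(len(row), n_interaction)
```

The interaction here takes 12 columns rather than one, which is why the output has so many lines. As far as I know, putting X1 and X2 on the CLASS statement in proc glm builds comparable indicator columns for you, although its parameterization is not identical to the one sketched here.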

My favorite reference for interaction effects in regression is Jaccard & Turrisi (2003). It's a little green Sage University Paper. Very thorough.

Good luck.

matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648

-----Original Message-----
From: Myung Ki
Sent: Thursday, June 03, 2010 12:31 PM
Subject: interaction in a linear regression model

Hello, everybody.

I have some questions about interactions. Here is the model:

Y (Y1-Y4) = b0 + b1X1 + b2X2 + b3X1*X2 + e

In one model, both X1 (4 levels) and X2 (5 levels) are categorical and Y is continuous. Proc glm gives me many lines of estimates, one for each combination of levels. For illustration purposes, I thought it might be better to report one estimate rather than estimates for all combinations of levels, so I entered X1 and X2 as continuous variables. I am not sure whether this is the right approach.

In another model, Y and X1 are continuous and X2 is categorical (5 levels). When I fit this model without telling SAS that X2 is categorical, the p-values for every Y (Y1-Y4) were significant (p-values based on Type III SS). However, when I model X2 as categorical, all but one of the Ys are non-significant. When I looked at the data and plotted them, the latter looks more sensible. But, to be consistent with the previous model in presentation, I would prefer to have one (overall) estimate.

So the questions are:
1) whether it is correct to enter a categorical variable as continuous in order to create the interaction term and, if the results differ, which approach is correct;
2) when the interaction term involves categorical variable(s), whether the p-value from Type III SS can be used for an overall assessment of the interaction term;
3) if (2) is the case, what would be a better way to display so many estimates, and whether there is any alternative.

Any suggestions and pointers to relevant references will be appreciated.

Thanks in advance.

Myung ki, PhD
University College London

