Date: Tue, 10 Aug 2004 15:36:32 -0700
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Regression with multiple categorical variables
In-Reply-To: <20040810215300.79663.qmail@web61208.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii
--- anne olean <annekolean@YAHOO.COM> wrote:
> >Regression coefficients are not holy things. They are just
> estimates.
> >Why are you so concerned about the regression coefficients?
> >What is the QUESTION that you are trying to answer? Focus on that,
> and
> >pay little attention to the regression coefficients, which are
> arbitrary.
>
> The question I am trying to answer is how the groups differ from each
> other over time (note: i also have a continuous time variable in the
> model). If I pick one group as the reference group then all the
> coefficients are wrt to that group. so, say A is the reference group,
> then the other coefficients are B vs A, C vs A, D vs A, etc. What if
> you want to know the coefficient for B vs. C ? would you have to set
> C (or B) as the reference group, and then run the model again? or can
> that be obtained from the model in which A is the reference group?
>
First of all, I would recommend that you report least squares means
rather than regression coefficients. The least squares means are
the same regardless of which group is selected as the reference
group. Regression parameters are offsets from the reference group
mean, with the reference group having an offset of zero from its own
mean. Least squares means just add the reference group mean (under
the fitted model) to every group's offset.
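A small numeric sketch of that point follows. It assumes a plain
one-way layout in which the fitted coefficients are simply differences
of cell means; the four means are made-up numbers, and the helper names
(reference_coding, ls_means, contrast) are hypothetical. With
covariates in the model the least squares means are adjusted means, but
the same invariance to the choice of reference group holds.

```python
# Hypothetical cell means for four groups (illustrative numbers only).
cell_means = {"A": 10.0, "B": 12.5, "C": 9.0, "D": 14.0}

def reference_coding(means, ref):
    """Intercept and offsets when `ref` is the reference group."""
    intercept = means[ref]
    offsets = {g: m - intercept for g, m in means.items()}
    return intercept, offsets

def ls_means(intercept, offsets):
    """Least squares means: reference mean plus each group's offset."""
    return {g: intercept + off for g, off in offsets.items()}

def contrast(offsets, g1, g2):
    """Estimated g1 - g2 difference, computed from the offsets alone."""
    return offsets[g1] - offsets[g2]

int_a, off_a = reference_coding(cell_means, "A")  # A as reference
int_c, off_c = reference_coding(cell_means, "C")  # C as reference

# The coefficients change with the reference group...
print(off_a)
print(off_c)
# ...but the least squares means do not.
print(ls_means(int_a, off_a) == ls_means(int_c, off_c))
# B vs. C is available from either fit without rerunning the model.
print(contrast(off_a, "B", "C"), contrast(off_c, "B", "C"))  # 3.5 3.5
```

The sketch only covers point estimates. A standard error for B vs. C
also needs the covariance of the two coefficients, which is what a
pairwise comparison of least squares means reports.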
Now, you state that the problem you are trying to address is how the
various group means differ over time. In order for the structure of
the group means to differ over time, there has to be a group by time
interaction. Do you have such a term in your model? You only state
that you have A|B|C (=A B C A*B A*C B*C A*B*C) in your model (with
each of A, B, and C being binary variables so that you effectively
have 8 groups). Are all of the high order effects really required?
Remember that in order to examine differences in mean structure over
time, you will need to fit a model with TIME|A|B|C. That means
that you must consider a model with four three-way interactions
(A*B*C, TIME*A*B, TIME*A*C, TIME*B*C) and one four-way interaction.
I would certainly be loath to interpret such a model. You had better
have extremely convincing evidence that all of those high order
interactions are necessary.
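To see how quickly the term count grows, here is a minimal sketch
that enumerates the effects implied by the bar notation. The function
bar_expand is a hypothetical helper, not a SAS facility, and it only
reproduces the set of terms, not the order in which SAS lists them.

```python
from itertools import combinations

def bar_expand(factors):
    """All main effects and interactions implied by F1|F2|...|Fk."""
    return ["*".join(combo)
            for k in range(1, len(factors) + 1)
            for combo in combinations(factors, k)]

terms = bar_expand(["TIME", "A", "B", "C"])

# Group the terms by order (number of factors in each term).
by_order = {}
for term in terms:
    by_order.setdefault(term.count("*") + 1, []).append(term)

for order, group in sorted(by_order.items()):
    print(order, len(group), group)
```

For four factors this gives 15 terms in all: 4 main effects, 6 two-way,
4 three-way, and 1 four-way interaction.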
> Secondly, when I plot the predicted values from the model against the
> predictors to show change over time, the predicted values are based
> on the model with a specific reference groups. So, the plot will
> differ depending on which reference group is used. I could
> potentially have 8 such graphs (each with estimates based on a
> different reference group). How do I decide which of the plots to use
> for interpretation?
If you want to show changes over time, then predicted values for each
group should be plotted against time, with different symbols and/or
line types and/or colors used to represent the different groups.
Again, my comments above about using least squares means should
address your concern about reference group.
>
>
> >You may also choose a full-means model, in which you label each
> group,
> >and omit the interaction.
>
> I'm not quite sure I understand...could you elaborate?
>
Construct a variable (NewVar) which takes on 8 levels as follows:

   NewVar   A   B   C
      1     0   0   0
      2     0   0   1
      3     0   1   0
      4     0   1   1
      5     1   0   0
      6     1   0   1
      7     1   1   0
      8     1   1   1

Then perform your ANOVA employing the categorical variable NewVar.
The variable NewVar contains all of the information in the three
main effects, the three 2-way interactions, and the 3-way interaction.
If you remove the intercept term from the model and you specify
the variable NewVar before the categorical variable Time, then the
parameter estimates for NewVar at your first time value will
actually be the least squares mean estimates at that first time.
Parameter estimates at other times will be offsets from the first
time least squares means. If you do this, then you cannot analyze
low order effects (A, B, C, A*B, A*C, B*C) very easily. You would
have to construct tests employing the CONTRAST statement to examine
low order effects. It is just as easy (and should be more
enlightening) to use the original variables in your model.
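
The NewVar coding in the table above is just the three binary
indicators read as a binary number, plus one. A minimal sketch,
assuming A, B, and C are coded 0/1 (new_var is a hypothetical helper
name):

```python
from itertools import product

def new_var(a, b, c):
    """Map binary indicators (A, B, C) to a single level from 1 to 8."""
    return 4 * a + 2 * b + c + 1

# Reproduce the table: one row per combination of A, B, and C.
print("NewVar A B C")
for a, b, c in product((0, 1), repeat=3):
    print(new_var(a, b, c), a, b, c)
```

The same arithmetic could be written in a DATA step before fitting the
model.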
Dale
=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------