```
Date:         Tue, 10 Aug 2004 15:36:32 -0700
Reply-To:     Dale McLerran
Sender:       "SAS(r) Discussion"
From:         Dale McLerran
Subject:      Re: Regression with multiple categorical variables
Comments:     To: anne olean
In-Reply-To:  <20040810215300.79663.qmail@web61208.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii

--- anne olean wrote:

> >Regression coefficients are not holy things. They are just
> >estimates.
> >Why are you so concerned about the regression coefficients?
> >What is the QUESTION that you are trying to answer? Focus on that,
> >and pay little attention to the regression coefficients, which are
> >arbitrary.
>
> The question I am trying to answer is how the groups differ from each
> other over time (note: i also have a continuous time variable in the
> model). If I pick one group as the reference group then all the
> coefficients are wrt to that group. so, say A is the reference group,
> then the other coefficients are B vs A, C vs A, D vs A, etc. What if
> you want to know the coefficient for B vs. C ? would you have to set
> C (or B) as the reference group, and then run the model again? or can
> that be obtained from the model in which A is the reference group?
>

First of all, I would recommend that you report least squares means
rather than regression coefficients. The least squares means are the
same regardless of which group is selected as the reference group.
Regression parameters are offsets from the reference group mean with
the reference group having an offset of zero from its own mean.
Least squares means just add the reference group mean (under the
fitted model) to every group.

Now, you state that the problem you are trying to address is how the
various group means differ over time. In order for the structure of
the group means to differ over time, there has to be a group by time
interaction. Do you have such a term in your model?
You only state that you have A|B|C (=A B C A*B A*C B*C A*B*C) in your
model (with each of A, B, and C being binary variables so that you
effectively have 8 groups). Are all of the high order effects really
required? Remember that in order to examine differences in mean
structure over time, you will need to fit a model with TIME|A|B|C.
That means that you must consider a model with four three-way
interactions and one four-way interaction. I would certainly be
loath to interpret such a model. You had better have extremely
convincing evidence that all of those high order interactions are
necessary.

> Secondly, when I plot the predicted values from the model against the
> predictors to show change over time, the predicted values are based
> on the model with a specific reference groups. So, the plot will
> differ depending on which reference group is used. I could
> potentially have 8 such graphs (each with estimates based on a
> different reference group). How do I decide which of the plots to use
> for interpretation?

If you want to show changes over time, then predicted values for
each group should be plotted against time, with different symbols
and/or line types and/or colors used to represent the different
groups. Again, my comments above about using least squares means
should address your concern about reference group.

> >
> >You may also choose a full-means model, in which you label each
> >group, and omit the interaction.
>
> I'm not quite sure I understand...could you elaborate?
>

Construct a variable (NewVar) which takes on 8 levels as follows:

   NewVar   A   B   C
      1     0   0   0
      2     0   0   1
      3     0   1   0
      4     0   1   1
      5     1   0   0
      6     1   0   1
      7     1   1   0
      8     1   1   1

Then perform your ANOVA employing the categorical variable NewVar.
The variable NewVar contains all of the information in the three
main effects, the three 2-way interactions, and the 3-way interaction.

If you remove the intercept term from the model and you specify the
variable NewVar before the categorical variable Time, then the
parameter estimates for NewVar at your first time value will actually
be the least squares mean estimates at that first time. Parameter
estimates at other times will be offsets from the first time least
squares means.

If you do this, then you cannot analyze low order effects (A, B, C,
A*B, A*C, B*C) very easily. You would have to construct tests
employing the CONTRAST statement to examine low order effects. It is
just as easy (and should be more enlightening) to use the original
variables in your model.

Dale

=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph:  (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
```
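
As a rough illustration of two points in the reply (the NewVar coding, and the claim that group means and between-group differences do not depend on the reference group), here is a minimal sketch in Python with NumPy rather than SAS. Everything in it is an assumption made for illustration: the data are made up, the helper names new_var and fit_reference_coding are hypothetical, and the model is a plain one-way layout on the 8 cells with no time term. In this simple balanced case the least squares means are just the fitted group means.

```python
import itertools
import numpy as np


def new_var(a, b, c):
    """Map binary A, B, C to the 8-level cell label used for NewVar."""
    return 1 + 4 * a + 2 * b + c


def fit_reference_coding(groups, y, reference):
    """Fit y on an intercept plus one dummy per non-reference group.

    Returns (offsets, means): the offset of each group from the
    reference group, and the fitted mean of each group.
    """
    levels = sorted(set(groups.tolist()))
    others = [g for g in levels if g != reference]
    X = np.column_stack(
        [np.ones(len(y))] + [(groups == g).astype(float) for g in others]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    offsets = {reference: 0.0}
    offsets.update(zip(others, beta[1:].tolist()))
    # Fitted group mean = reference group mean (intercept) + offset.
    means = {g: float(beta[0]) + offsets[g] for g in levels}
    return offsets, means


# All 8 combinations of (A, B, C), in the same order as the NewVar table.
cells = list(itertools.product([0, 1], repeat=3))
labels = [new_var(*cell) for cell in cells]

# Made-up responses, two observations per cell (illustration only).
groups = np.array([lab for lab in labels for _ in range(2)])
y = np.array([3.0, 5.0, 4.0, 6.0, 7.0, 9.0, 2.0, 4.0,
              8.0, 10.0, 1.0, 3.0, 6.0, 8.0, 5.0, 7.0])

# Same model, two different reference groups.
offsets_ref1, means_ref1 = fit_reference_coding(groups, y, reference=1)
offsets_ref8, means_ref8 = fit_reference_coding(groups, y, reference=8)

# Group 2 versus group 3, from each fit: a difference of two offsets.
diff_ref1 = offsets_ref1[2] - offsets_ref1[3]
diff_ref8 = offsets_ref8[2] - offsets_ref8[3]

print("NewVar labels:", labels)
print("means, reference 1:", means_ref1)
print("means, reference 8:", means_ref8)
print("group 2 - group 3:", diff_ref1, diff_ref8)
```

The offsets change when the reference group changes, but the fitted group means and the group 2 versus group 3 difference agree between the two fits. Under these assumptions, a comparison such as B vs. C can be read off the model with A as reference as the difference of the two coefficients, without refitting; getting a standard error for that difference would still need the coefficient covariance matrix, which this sketch does not compute.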
