Date: Thu, 18 Sep 2003 09:14:16 +0200
Reply-To: "Groeneveld, Jim" <jim.groeneveld@VITATRON.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Groeneveld, Jim" <jim.groeneveld@VITATRON.COM>
Subject: DRAFT regression with dummy variables DRAFT
Content-Type: text/plain; charset="iso-8859-1"
A colleague of mine has a design with 8 subjects (P), on each of which two dependent variables (Q and S) have been measured at 6 different levels of an (actually continuous) variable (R). The question is whether and how the variables Q and S are related, and whether one (Q) is sufficient to predict the other (S) across the values of R.
Regression analysis of S on Q has been performed per subject (P), using all values of R. Within some subjects the regression showed a significant slope, in others it did not, and the slopes varied between subjects. The correlation coefficients between Q and S within subjects were all around .9.
Meanwhile a method was sought, and found, to pool the data of all subjects and still perform a regression analysis while controlling for the subject factor (P). For the 8 subjects, 7 dummy variables (D1..D7) were introduced: all are set to 0 for subject 1; the first one (D1) is set to 1 and the rest to 0 for subject 2; and so on, until the last one (D7) is set to 1 and the rest to 0 for subject 8. At most one of the dummies is 1 at a time.
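To make the coding scheme above concrete, here is a minimal sketch in Python (not SAS or Minitab, and not the code actually used). The function name dummies is my own, chosen only for illustration; the 8 subjects and 7 dummies are as described above.

```python
# Illustrative sketch only: reference (dummy) coding for 8 subjects,
# with subject 1 as the reference level (all dummies 0).
N_SUBJECTS = 8  # number of subjects, as stated in the post


def dummies(subject):
    """Return the list [D1, ..., D7] for a subject numbered 1..8."""
    if not 1 <= subject <= N_SUBJECTS:
        raise ValueError("subject must be in 1..8")
    # D_x is 1 only for subject x + 1; subject 1 gets all zeros.
    return [1 if subject == x + 1 else 0 for x in range(1, N_SUBJECTS)]


if __name__ == "__main__":
    for p in range(1, N_SUBJECTS + 1):
        print(p, dummies(p))
```

Each subject gets a row of 7 values, of which at most one is 1.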
The regression equation was:
S = c + t.Q + k1.D1 + k2.D2 + k3.D3 + k4.D4 + k5.D5 + k6.D6 + k7.D7
c : the constant
t : the slope (the tangent)
kx: the estimates for the differences between subjects (P), i.e. the offset of subject x+1 relative to subject 1 (x = 1..7)
The overall slope (t) turned out to be significant (many more observations, pooled subjects). Furthermore, the individual regression equations (lines) can be determined by substituting 0 for all D's but one (and 0 for all D's for the first subject), so that the corresponding kx value adds to the constant c. The individual lines determined this way (which of course differ from the previously determined individual regression lines) run in parallel, because they have equal slopes (t).
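The substitution just described can be sketched as follows, again in Python and only as an illustration. The values of c, t and k1..k7 below are made up (they are NOT fitted results from this data set), and the function names subject_line and predict are hypothetical.

```python
# Hypothetical coefficient values, for illustration only (not fitted results).
c = 2.0                                         # constant (line of subject 1)
t = 0.5                                         # common slope
k = [0.5, -0.25, 1.0, 0.0, -0.5, 0.25, 0.75]    # k1..k7, for subjects 2..8


def subject_line(subject):
    """Return (intercept, slope) of the parallel line for subject 1..8."""
    if not 1 <= subject <= 8:
        raise ValueError("subject must be in 1..8")
    # Subject 1 has all dummies 0, so its intercept is c itself;
    # subject x + 1 has only D_x = 1, so its intercept is c + k_x.
    intercept = c if subject == 1 else c + k[subject - 2]
    return intercept, t


def predict(subject, q):
    """Predicted S for a given subject and a given value of Q."""
    intercept, slope = subject_line(subject)
    return intercept + slope * q


if __name__ == "__main__":
    for p in range(1, 9):
        print(p, subject_line(p))
```

All eight lines share the slope t and differ only in their intercepts, which is why they run in parallel.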
The analysis just described has been performed using Minitab by explicitly creating the dummies and doing the regression with them. My question is: is this possible with SAS without explicitly creating dummies like that? Is there a procedure that creates the dummies only internally and produces a regression equation similar to the one above? If so, which PROC and which additional statements? Is there an example?
Now an additional question comes up: is there any significant difference between the individual (parallel) regression lines? I remember that such a test exists, but I have forgotten the details. Is there a (standard) test in SAS (between constants (where X=0) or rather intercepts (where Y=0)), possibly as part of an overall regression procedure? I could also think of an ANOVA (involving S or Q, with P and R), in which a main effect of Subject (P) would appear (next to a main effect of R).
If such a test does not show any Subject effect, all data might be pooled without correcting for subjects, and the overall regression equation would then simply become (with other values of c and t, of course):
S = c + t.Q
from which the common and representative regression coefficients would emerge.
1. SAS PROC for REG with (implicit) dummy variables?
2. test for significant difference between parallel regression lines
3. SAS PROC for the above
Finally, would you suggest other alternatives?
Thanks in advance,
Regards - Jim.
Y. (Jim) Groeneveld MSc
6825 MJ Arnhem
+31/0 26 376 7365; fax 7305