Date: Wed, 4 Jan 2006 21:52:53 -0500
Reply-To: Jay Weedon <jweedon@EARTHLINK.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jay Weedon <jweedon@EARTHLINK.NET>
Organization: http://newsguy.com
Subject: Re: Using Sub in Random statement of Proc Mixed
Content-Type: text/plain; charset=us-ascii
On Wed, 4 Jan 2006 11:48:46 -0800, davidlcassell@MSN.COM (David L
Cassell) wrote:
>excel_hari@YAHOO.COM wrote back:
>>Jay,
>>
>>First of all, thanks a TON for replying to my doubts. It certainly took me
>>some time to digest what you explained above, and finally I could see
>>"subject" at the end of the tunnel.
>>
>> >1 makes perfect sense, because computationally an intercept is the
>> >regression coefficient attached to a predictor that always has value
>> > 1, e.g., in the model
>>
>> >E(Y) = b1*1 + b2*X
>>
>>The above is certainly a revelation to me. Once I was able to "accept"
>>this, all the doubts in my previous posts seemed to melt away (or at least
>>they seemed to!). I have something to ask you. This idea of "1" being the
>>predictor variable associated with the intercept is a new concept to me.
>>I have been through some of the regression tutorials on the web, but I
>>can't understand why none of them mentions this way of looking at things.
>>I have also been through David Levine's Business Statistics and some other
>>book(s), but nowhere have I come across this. I would be grateful if you
>>could point me to a web resource where I could read a little more about
>>this representation.
>
>Any website that discusses regression in terms of its matrix representation
>will cover this, although it may not be obvious that this is what it is
>saying. But you said you didn't want to get into the matrix aspect (even
>though I think regression and general linear models are much easier to
>understand when viewed as matrix manipulation).
>
>If you write out the design matrix for your simple linear regression above,
>it has 2 columns. The first column is all '1's, and the second column is
>your X's. So you have n equations (i=1,...,n), each of which looks like:
>
> Yi = a*1 + b*Xi + ei
>
>The expected value of Yi is the non-random part, a*1 + b*Xi, since ei has
>mean zero. So you have Jay's formula above.
For sure, the 1 pops right out when you frame the problem in terms of
specifying the "design" matrix. You (Hari) would be well advised to
learn this approach; for example, the SAS documentation for the
SUBJECT= (/SUB) option of the RANDOM statement explains quite concisely
how it works, but it's phrased in terms of matrices, so if you can't
follow that line of argument you're at a disadvantage.
Most texts on linear regression, e.g., Draper & Smith, will show how
matrix representation works with linear models. The Wikipedia article
http://en.wikipedia.org/wiki/Linear_regression mentions that the first
column of the design matrix contains n 1's, but if you don't even know
what matrices are, you'll need to do the prerequisite study, perhaps an
undergraduate course in linear algebra.
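To see the column of 1s at work, here is a minimal Python sketch (not SAS code, and not how SAS computes estimates internally), using made-up data. It builds the two-column design matrix for E(Y) = b1*1 + b2*X and solves the normal equations (X'X)b = X'y; the estimate attached to the all-1s column comes out as the intercept. The function names are hypothetical, chosen for illustration only.

```python
# Minimal sketch: the intercept is just the coefficient on a column of 1s.
# The data below are made up for illustration only.

def design_matrix(xs):
    """Build the n x 2 design matrix with rows [1, x_i] for E(Y) = b1*1 + b2*X."""
    return [[1.0, x] for x in xs]

def ols_two_columns(X, ys):
    """Solve the normal equations (X'X) b = X'y for a 2-column X."""
    # Entries of the 2x2 matrix X'X
    s00 = sum(row[0] * row[0] for row in X)  # = n, because column 0 is all 1s
    s01 = sum(row[0] * row[1] for row in X)  # = sum of x
    s11 = sum(row[1] * row[1] for row in X)  # = sum of x^2
    # Entries of the vector X'y
    t0 = sum(row[0] * y for row, y in zip(X, ys))  # = sum of y
    t1 = sum(row[1] * y for row, y in zip(X, ys))  # = sum of x*y
    # Invert the 2x2 system by Cramer's rule
    det = s00 * s11 - s01 * s01
    b1 = (s11 * t0 - s01 * t1) / det  # coefficient on the column of 1s (intercept)
    b2 = (s00 * t1 - s01 * t0) / det  # coefficient on X (slope)
    return b1, b2

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
X = design_matrix(xs)
b1, b2 = ols_two_columns(X, ys)
print(b1, b2)
```

Nothing in the arithmetic treats the intercept specially: it is estimated exactly like the slope, as the coefficient of a "predictor" whose value is always 1.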
>>OK, a question about the SAS implementation/design of the SUBJECT= (and
>>related) option in PROC MIXED.
>>a) (probably a stupid question) In the RANDOM statement we write
>>predictors, not the coefficients associated with the predictors. So in
>>the case of the intercept, why do we write the coefficient (INTERCEPT)
>>itself rather than writing 1? Doesn't that amount to an inconsistent
>>representation?
>
>Not a stupid question. But what's the difference? INTERCEPT is always
>clearer. And if you want something more complex, say a random intercept
>in a mixed model, you wouldn't have a single column of '1's in a single
>matrix. So the keyword seems more useful and less likely to be confused
>with something else in the code.
I agree with Hari that, at least in the case of the fixed intercept,
this does represent a logical inconsistency. David's point is also
well made, and in any case, people who work with statistical software
are used to this terminology.
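To make David's point concrete, here is a minimal Python sketch (not SAS code; the subject IDs are made up, and the function name is hypothetical) of the random-effects design matrix Z that, as I read the matrix description in the PROC MIXED documentation, corresponds to RANDOM INTERCEPT / SUBJECT=ID. Instead of one column of 1s, each subject gets its own column, equal to 1 on that subject's rows and 0 elsewhere; with the rows sorted by subject, Z is block diagonal and each block is a column of 1s for that subject.

```python
# Minimal sketch (an illustration, not SAS's implementation) of the Z matrix
# for a random intercept with SUBJECT=ID. Subject IDs are hypothetical data.

def random_intercept_z(subject_ids):
    """Return (subjects, Z): one column per subject, 1 where the row belongs to it."""
    subjects = sorted(set(subject_ids))
    Z = [[1 if sid == s else 0 for s in subjects] for sid in subject_ids]
    return subjects, Z

ids = ["A", "A", "A", "B", "B", "C"]  # rows already sorted by subject
subjects, Z = random_intercept_z(ids)
for sid, row in zip(ids, Z):
    print(sid, row)

# Each row sums to 1: the single fixed-effect "column of 1s" has been
# split into one column of 1s per subject.
print([sum(row) for row in Z])
```

So writing 1 in the RANDOM statement would not describe this matrix directly, which is one reason the INTERCEPT keyword plus SUBJECT= is the more useful notation.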
Jay