Date: Wed, 9 Jan 2008 09:33:28 -0800
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: AIC mystery in MIXED
In-Reply-To: <200801091554.m09Bl6BV027598@malibu.cc.uga.edu>
Content-Type: text/plain; charset=iso-8859-1
--- Ryan Utz <rutz@AL.UMCES.EDU> wrote:
> Hi all,
>
> I'm having issues using/interpreting AIC scores in proc MIXED. I'm
> trying
> to compare simple linear relationships with power function
> relationships
> (both models have been shown to be consistently valid in related
> datasets).
> When I go to interpret AIC (or AICc, etc) scores, however, power
> relationships always emerge as the better model, even when it clearly
> isn't
> the case. As an example, I provided my actual data for an extremely
> simple
> model at the bottom of this email (I'm testing much more complex
> models, but
> the example below illustrates the problem). To test the power
> relationship,
> I've log-transformed both X and Y. Running the code below shows that
> MIXED
> suggests the power relationship is better (it has a lower AIC score),
> but if
> you run a simple linear regression, clearly the non-transformed data
> (thus a
> linear relationship) is superior. This is true even when both models
> have
> the exact same number of parameters.
>
> Is there something I'm doing wrong here, either in execution or
> interpretation? I'd like to use AIC scores to help choose a model,
> but
> because of this issue I'm very hesitant.
>
> Thanks ahead of time for any advice,
>
> Ryan Utz
> University of Maryland Center for Environmental Science
>
>
> data test;
> input density length; cards;
> 0.099266504 82.8125
> 0.048193642 85.05405405
> 0.114893617 84.34210526
> 0.257685811 70.515625
> 0.044660194 86.92857143
> 0.244736842 76.37647059
> 0.020619946 89.5
> 0.058555133 93.6
> 0.125817923 84.08888889
>
> data test2; set test;
> lndensity = log(density);
> lnlength= log (length); run;
>
> title Linear Relationship;
> proc mixed data=test2;
> model length=density; run;
>
> title Power Relationship;
> proc mixed data=test2;
> model lnlength=lndensity; run;
>
> /*Simple regression for comparison*/
>
> Title Linear relationship-simple regression;
> proc glm data=test2;
> model length=density; run;
>
> Title Linear relationship-Power function;
> proc glm data=test2;
> model lnlength=lndensity; run;
>
Ryan,
A couple of comments. AIC can be used to compare models only when
the same response is employed. In these data, you are using first
length and then lnlength as response variables. You just cannot
use a likelihood-based statistic to compare across different response
variables.
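To see why the two AIC values are not on a common scale, here is a short sketch of the standard change-of-variables argument for the ordinary (ML) likelihood. This is not something MIXED reports; the symbols y_i (length), z_i (lnlength), f_Y, f_Z, L_Y and L_Z are notation introduced here for illustration.

```latex
% z_i = \log y_i; f_Z is the density implied by the model fitted to lnlength,
% f_Y is the density that same model implies for length itself.
f_Y(y_i) = f_Z(\log y_i)\,\frac{1}{y_i}
\quad\Longrightarrow\quad
\log L_Y = \log L_Z - \sum_{i=1}^{n} \log y_i
```

The log likelihood computed for lnlength therefore differs from a likelihood stated on the length scale by the sum of log(length). In the nine observations posted, that sum is roughly 40, so -2 log L for the lnlength model is lower by roughly 80 for reasons that have nothing to do with fit. That would make the power model look better whatever the data show.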
Now, a couple of other notes about the use of AIC to compare models.
First, the use of any likelihood-based statistic is restricted to
comparisons where exactly the same response VALUES are employed in
all models. This means that not only must the response variable
be the same for all models but also that there are no missing values
for any predictor variables which affect the number of observations
employed to fit different models. This has no bearing on the data
that you presented, but I think it is important to make clear what
the requirements are for comparing models employing an AIC criterion.
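As a minimal sketch of that requirement, one could restrict the data to observations that are complete on every variable used by any candidate model before fitting. The data set name "complete" is hypothetical, and the variable list assumes the candidate predictors are density and lndensity. The posted data have no missing values, so this step would drop nothing here.

```sas
/* Keep only rows with no missing values in the response or in any   */
/* predictor used by either candidate model, so that every model is  */
/* fit to exactly the same response values.                          */
data complete;
  set test2;
  if nmiss(lnlength, density, lndensity) = 0;
run;

/* Check how many observations remain before fitting any models. */
proc means data=complete n nmiss;
  var lnlength density lndensity;
run;
```

Each candidate model would then be fit with data=complete rather than data=test2.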
Second, when AIC is employed to assess which model fixed effects are
most reasonable, then maximum likelihood estimation must be employed
rather than restricted maximum likelihood estimation. Thus, if you
had lnlength as your response and you wanted to assess whether
density or lndensity was the better predictor, then you would want
to fit the two models as shown below:
/* untransformed predictor */
proc mixed data=test2 METHOD=ML;
model lnlength=density;
run;
/* log-transformed predictor */
proc mixed data=test2 METHOD=ML;
model lnlength=lndensity;
run;
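If it helps to see the fit statistics side by side, the sketch below captures the "Fit Statistics" table from each run with ODS OUTPUT and stacks the two. The data set names fit_density, fit_lndensity and fit_compare, and the label variable "model", are hypothetical names chosen for illustration.

```sas
/* Same response (lnlength), ML estimation, different predictors. */
ods output FitStatistics=fit_density;
proc mixed data=test2 method=ml;
  model lnlength = density;
run;

ods output FitStatistics=fit_lndensity;
proc mixed data=test2 method=ml;
  model lnlength = lndensity;
run;

/* Stack the two fit-statistics tables and label each row by predictor. */
data fit_compare;
  length model $ 12;
  set fit_density(in=a) fit_lndensity;
  if a then model = 'density';
  else model = 'lndensity';
run;

proc print data=fit_compare noobs;
run;
```

Because both fits share the same response values and both use METHOD=ML, the AIC rows in fit_compare can be compared directly, with smaller being better.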
With respect to your comment that the model employing "non-transformed
data (thus a linear relationship) is superior", I certainly cannot
tell that from the data that you showed. For either transformation,
the same two observations present as model outliers. Such a claim
might be supported in a larger data set. And it may be that the
non-transformed data is slightly better here, but not so much better
as to be readily apparent.
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------