Date: Tue, 23 Mar 2010 08:54:13 -0700
Reply-To: Shawn Haskell <shawn.haskell@STATE.VT.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Shawn Haskell <shawn.haskell@STATE.VT.US>
Organization: http://groups.google.com
Subject: Re: Model selection in Proc Mixed
Content-Type: text/plain; charset=ISO-8859-1
On Mar 22, 1:45 pm, wyldsoul <wylds...@gmail.com> wrote:
> Hello,
> I have a dataset and a set of a priori models and I am going to use
> model selection and AIC to rank the models. My models have fixed and
> random effects. I have two random class variables, year and unit, and
> a suite of continuous variables. Below is a simplified sample
> dataset. One thing I have to consider is that some, but not all
> experimental units were sampled each year.
> From research with SAS so far, I have found that the default
> estimator used in proc mixed is REML, and that REML only considers the
> random effects. Since the formula that calculates each AIC value
> includes a bias correction term based on the number of parameters, it
> seems that the REML method would be inappropriate for models including
> fixed effects. In order to consider the fixed effects, I need to
> specify the ML method. I have found that the ML method counts each
> unique observation in a class variable as a separate parameter. For
> example each year is counted as a separate parameter in the model.
> This would seem to inflate the bias correction term for AIC, as it
> uses the number of parameters for the calculation. I would welcome
> any suggestions on the best way to proceed with this analysis. I am
> wondering whether or not SAS is the best environment to perform model
> selection, and I plan on calculating AIC values manually as a check.
> Any recommendations or insight on how best to proceed with this
> analysis are welcome.
>
> Thanks
>
> y year unit x3 x4 x5 x6
> 43 2005 A 23 37 19 7
> 34 2005 B 14 48 28 31
> 50 2005 C 19 24 48 48
> 4 2005 D 47 9 46 20
> 28 2005 E 37 36 6 12
> 7 2005 F 9 27 22 19
> 40 2005 G 31 9 15 32
> 45 2006 A 17 4 29 6
> 24 2006 C 29 23 7 38
> 37 2006 D 9 26 34 32
> 18 2006 F 11 45 50 18
> 18 2006 G 27 10 16 42
> 17 2007 B 6 34 7 29
> 49 2007 C 14 2 17 26
> 27 2007 D 12 13 31 46
> 18 2007 E 4 22 46 44
> 28 2007 F 50 45 5 16
> 5 2007 G 47 23 16 16
> 22 2007 H 29 5 29 36
> 40 2007 I 9 45 15 32
I'm no expert on Proc MIXED, but it seems like you are taking the
right approach. Yes, each level of a class variable, minus one,
should be counted as a parameter (K) when computing AIC (-2LL + 2K)
from your ML (or LL) output, and then AICc, which adds a further
small-sample correction to guard against overfitting models to data.
I think you should calculate your own AICc values in Excel or
whatever other program you use - don't just trust SAS to give you
what you think you are getting.
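
For anyone doing this check by hand, here is a minimal sketch in Python
(standing in for "whatever other program you use"). The -2LL values,
parameter counts, and model names are made up for illustration and are
not Proc MIXED output; n = 20 is simply the number of rows in the sample
dataset quoted above. It assumes the usual small-sample form
AICc = AIC + 2K(K+1)/(n - K - 1).

```python
def aic(neg2_loglik, k):
    """AIC = -2LL + 2K, where neg2_loglik is the -2 log-likelihood."""
    return neg2_loglik + 2 * k


def aicc(neg2_loglik, k, n):
    """AICc = AIC + 2K(K+1)/(n - K - 1); requires n > K + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > K + 1")
    return aic(neg2_loglik, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidate models: name -> (-2LL from an ML fit, K).
candidates = {
    "model_a": (150.0, 4),
    "model_b": (146.5, 6),
}
n = 20  # assumed sample size: rows in the quoted sample dataset

scores = {name: aicc(m2ll, k, n) for name, (m2ll, k) in candidates.items()}
best = min(scores.values())
# Delta AICc relative to the best-ranked model (0 for the best model).
deltas = {name: score - best for name, score in scores.items()}

for name in sorted(scores, key=scores.get):
    print(f"{name}: AICc={scores[name]:.3f}  delta={deltas[name]:.3f}")
```

The -2LL you feed in should come from the ML fit, and K and n should be
whatever you decide is defensible; the code only does the arithmetic.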
I recall that the bigger issue I had with Proc MIXED (or PHREG) was
with the sample size used for calculating AICc. At least with
family-group data in PHREG, I recall that I was not satisfied with
what SAS used as a sample size - I thought it was too liberal. Given
those semiparametric, partial-likelihood models, I used a
conservative estimate of sample size: the number of mortality
events. Maybe my memory fails me for Proc MIXED - can someone explain
how sample size is calculated in Proc MIXED? Is it adequately
parsimonious? I think I used the number of individual animals as an
estimate of sample size for calculating AICc from the ML (or LL)
given by Proc MIXED. Thanks. Shawn
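
To show how much the choice of n can matter, the sketch below counts two
candidate sample sizes from the sample dataset quoted above (number of
rows vs. number of distinct units) and compares the AICc correction
term 2K(K+1)/(n - K - 1) under each. K = 4 is an arbitrary value picked
for illustration. Which n is appropriate is exactly the open question
here, and the code does not settle it.

```python
# Sample dataset from the quoted post: y year unit x3 x4 x5 x6
SAMPLE = """
43 2005 A 23 37 19 7
34 2005 B 14 48 28 31
50 2005 C 19 24 48 48
4 2005 D 47 9 46 20
28 2005 E 37 36 6 12
7 2005 F 9 27 22 19
40 2005 G 31 9 15 32
45 2006 A 17 4 29 6
24 2006 C 29 23 7 38
37 2006 D 9 26 34 32
18 2006 F 11 45 50 18
18 2006 G 27 10 16 42
17 2007 B 6 34 7 29
49 2007 C 14 2 17 26
27 2007 D 12 13 31 46
18 2007 E 4 22 46 44
28 2007 F 50 45 5 16
5 2007 G 47 23 16 16
22 2007 H 29 5 29 36
40 2007 I 9 45 15 32
"""

records = [line.split() for line in SAMPLE.strip().splitlines()]

n_obs = len(records)                    # every row counts as an observation
n_units = len({r[2] for r in records})  # distinct experimental units


def aicc_correction(k, n):
    """Extra AICc penalty 2K(K+1)/(n - K - 1); requires n > K + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > K + 1")
    return (2 * k * (k + 1)) / (n - k - 1)


k = 4  # hypothetical parameter count, for illustration only
for label, n in (("rows", n_obs), ("units", n_units)):
    print(f"n = {n} ({label}): correction = {aicc_correction(k, n):.3f}")
```

With so few units, the more conservative n gives a much larger penalty,
so model rankings can change depending on which n is used.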