Date: Fri, 30 Mar 2007 10:26:54 -0700
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: AW: Re: ZINB/ ZIP
Content-Type: text/plain; charset=iso-8859-1
--- Sebastian Hein <Sebastian.Hein@FORST.BWL.DE> wrote:
> Dear Dale,
> you commented my question on ZIB's: "You have not provided sufficient
> information about your experimental design to evaluate that question.
> With NLMIXED, you could fit either a fixed-effect or random-effect
> ZIP, ZINB, or ZIB model."
> Here are some details:
> The dataset is from a study colleagues did approx. 12 years ago:
> In total 1408 branches nested in 75 trees nested in 5 plots. Branches
> have been pruned artificially (cut with saws) to improve timber
> quality. If pruning is done poorly (this occurs mostly with
> increasing diameter of the branch "BD") there is a coloration (branch
> coloration status: 0/1) of the wood, which is a really bad thing for
> the further use of the timber. There is one observation (record) per
> branch. Coloration is a rare event (6.6% of the 1408 cases).
> I ran a logistic model with branch diameter as the only IV.
> Unfortunately: A careful check of my simple binomial model shows a
> poor sensitivity (48.3 %) and a rather high 1-specificity (28.5 %) when
> using the cutpoint at the max. Youden Index.
> Would a ZIB (binomial) offer a solution to decrease the false
> positive error rate? Maybe I am on a completely wrong track.
> PS: I think I am quite through ZIB/ZINB/ZIP topic on SAS-L
Since you are using branch diameter as a predictor variable (your
only predictor variable), then your response must be the
branch-specific discoloration response. That is, you cannot use
as your response the number of discolored branches on each tree
because your predictor variable is measured on the individual branch.
Thus, your response is Bernoulli, so you cannot fit a ZIB model
as discussed in my previous posts.
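One way to see why zero inflation buys nothing for a single 0/1
trial is sketched below. The symbols are introduced here only for
illustration: \pi is the probability of a structural zero and p is
the Bernoulli success probability.

```latex
\begin{align*}
P(Y = 1) &= (1 - \pi)\,p \\
P(Y = 0) &= \pi + (1 - \pi)(1 - p) = 1 - (1 - \pi)\,p
\end{align*}
% So Y is again Bernoulli, with success probability p^* = (1 - \pi) p.
```

With one trial per branch, the data carry information about the
product (1 - \pi)p only, so the zero-inflation part is not separately
identified unless further structure is imposed.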
Now, you have branches nested within trees and trees nested within
plots. One would expect there to be within tree correlation for
the response as well as within plot correlations. A random effect
model is clearly called for here, with random tree and random
plot effects.
If you have read the manuscript by Hall, then you should be aware
that zero-inflated models and random effect models both provide
some means of dealing with an overdispersed response. Since a
zero-inflated binomial model is not an option (because your response
is Bernoulli) and you have a natural correlation structure to
your data, you should be fitting a random effects Bernoulli response
model. The NLMIXED procedure is not well suited to fitting nested
random effects models. If you had a design where you had 15 plots
and 5 trees/plot rather than 5 plots and 15 trees/plot, then you
might use NLMIXED to fit the appropriate random effects model.
But with your design, I believe that the appropriate random effects
model could not be fit employing the procedure NLMIXED.
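For reference, the nested random effects Bernoulli model described
above could be written as follows. The notation is assumed here for
illustration: i indexes plots, j trees within a plot, k branches
within a tree, and BD is branch diameter.

```latex
\begin{align*}
Y_{ijk} \mid u_i, v_{ij} &\sim \mathrm{Bernoulli}(p_{ijk}) \\
\operatorname{logit}(p_{ijk}) &= \beta_0 + \beta_1\,\mathrm{BD}_{ijk} + u_i + v_{ij} \\
u_i &\sim N(0, \sigma^2_{\mathrm{plot}}), \qquad
v_{ij} \sim N(0, \sigma^2_{\mathrm{tree}})
\end{align*}
```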
That said, you certainly can fit the specified model employing the
GLIMMIX procedure. You will have to download the GLIMMIX procedure
from the SAS website. The problem with the GLIMMIX procedure is
that it underestimates the random effect variances. If
your model performance is poor because of very large tree-specific
or plot-specific random effects, then the random effects model
estimated by the GLIMMIX procedure will certainly be better than
your fixed effect model. However, it will not be optimal.
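A rough sketch of such a GLIMMIX fit follows. It is not code from
the original analysis: the data set name BRANCHES and the variable
names COLOR (0/1 discoloration), BD, TREE and PLOT are assumptions,
so substitute your own.

```sas
/* Sketch only: data set and variable names are assumed. */
proc glimmix data=branches;
  class plot tree;
  /* Model the probability that COLOR = 1 as a function of BD */
  model color(event='1') = bd / dist=binary link=logit solution;
  /* Random intercept for each plot */
  random intercept / subject=plot;
  /* Random intercept for each tree nested within plot */
  random intercept / subject=tree(plot);
run;
```

The two RANDOM statements give the plot and tree(plot) variance
components, which are the estimates the caveat above applies to.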
In this way, the GLIMMIX procedure offers a mixed hope. In order
to really do much better on your ROC curve, the random effects
must be quite large. However, the GLIMMIX procedure will
underestimate the magnitude of the random effects, and will not
account for as much of the tree to tree or plot to plot variance
as it should. So, you will not obtain as much benefit from a
random effects model fitted employing GLIMMIX as the data really
would offer. You will be better off than when you fit just the
fixed effects model - but you could do better yet.
If you are willing to sacrifice a little on your inference space,
you could fit a model in NLMIXED where you include plot as a fixed
effect rather than a random effect. Then your only random effect
would be at the tree level. Since you have only a few plots in
your study, this is not a bad model to consider. It is not a "wrong"
model in any way. But you are limited to making inferences only
about differences in discoloration across the 5 plots. You cannot
generalize to a plot universe.
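A minimal sketch of that NLMIXED model is given below, under several
assumptions: the data set BRANCHES and the variables COLOR, BD, TREE
and PLOT are hypothetical names, PLOT is assumed to be coded 1 to 5
with plot 1 as the reference level, and the starting values in the
PARMS statement are arbitrary guesses rather than recommendations.

```sas
/* NLMIXED starts a new subject whenever the SUBJECT= variable    */
/* changes value, so the data must be grouped by tree.            */
proc sort data=branches;
  by plot tree;
run;

proc nlmixed data=branches;
  /* Arbitrary starting values (assumed, not tuned) */
  parms b0=-3 b_bd=0.1 p2=0 p3=0 p4=0 p5=0 s2u=1;
  bounds s2u >= 0;
  /* Plot enters as fixed effects; u is the random tree intercept */
  eta = b0 + b_bd*bd
        + p2*(plot=2) + p3*(plot=3) + p4*(plot=4) + p5*(plot=5)
        + u;
  p = 1/(1 + exp(-eta));
  model color ~ binary(p);
  random u ~ normal(0, s2u) subject=tree;
run;
```

If tree numbers repeat from plot to plot, it would be safer to build
a tree identifier that is unique across the whole study before
running this.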
Of course, the other thing to consider is whether there are some
other predictors which you have left out of the model. Perhaps
temperature and moisture indexes in the period following pruning
have something to do with discoloration. Or maybe some soil
chemical composition (measured at the plot level?) is an important
predictor. All of the above are measured at the plot level, and
inclusion of plot as a fixed effect could actually account for
left-out plot-specific characteristics. Are there any other
tree-specific or branch-specific characteristics which would affect
discoloration? Rather than looking for exotic models that might
account for poor performance that you have seen with your branch
diameter fixed-effect model, I would strongly advise you to think
about any left-out variables which might improve model performance.
Fred Hutchinson Cancer Research Center
Ph: (206) 667-2926
Fax: (206) 667-5977