Date: Thu, 1 Aug 2002 14:32:08 -0700
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Zero-Inflated Poisson models
Content-Type: text/plain; charset=us-ascii
This one is for the statisticians. I was asked off-list for any
direction on fitting a zero-inflated poisson (ZIP) model. I had
addressed the topic in a post some time back, but in reviewing that
post I felt there were a number of points which could be improved
on. Specifically, I did not allow the mean model or the zero
inflate probability to be functions of other variables in the data.
Also, I used an identity link for the poisson mean rather than a
log link. That could be computationally dangerous as well as
inefficient. So, just for the record, I wanted to demonstrate
fitting a ZIP model employing NLMIXED, allowing the mean and
zero inflate probability to be functions of some other variables.
I also specify a log link. Below I generate some data and then
show code for fitting the ZIP model.
/* ZIP model data generation */
data test;
/* parameters for zero inflation probability */
bp_0 = -.5;
bp_1 = 0.3;
/* parameters for poisson mean */
bll_0 = .4;
bll_1 = .2;
seed = 1234579;
do i=1 to 1000;
/* generate variables affecting probability and mean eta's */
call ranbin(seed,1,.5,gender);
call rannor(seed,x);
/* probability of zero inflation a function of gender only */
eta_prob = bp_0 + bp_1*gender;
p_0 = exp(eta_prob) / (1 + exp(eta_prob));
call ranbin(seed,1,p_0,zero_inflate);
/* poisson mean a function of x only */
eta_lambda = bll_0 + bll_1*x;
lambda = exp(eta_lambda);
/* Generate response */
if zero_inflate then y=0;
else call ranpoi(seed,lambda,y);
output;
end;
keep y gender x;
run;
/* Now fit the ZIP model employing NLMIXED */
proc nlmixed data=test;
parms bp_0=0 bp_1=0 bll_0=0 bll_1=0;
eta_prob = bp_0 + bp_1*gender + null_2*x;
p_0 = exp(eta_prob)/(1 + exp(eta_prob));
eta_lambda = bll_0 + bll_1*x + null_1*gender;
lambda = exp(eta_lambda);
/* likelihood structure for ZIP. Note that we cannot generate */
/* the log likelihood for the zero response values directly. We */
/* must generate the likelihood and then take the logarithm due */
/* to the additive nature of the zero likelihood. */
if y=0 then prob = p_0 + (1-p_0)*exp(-lambda);
if y=0 then loglike = log(prob);
else loglike = log(1-p_0) + y*log(lambda) -lambda - lgamma(y+1);
/* fit the model */
model y ~ general(loglike);
/* test whether the (1-alpha)*100 CI for the estimated parameters */
/* covers the true parameter values. */
contrast "bp_0=-.5" bp_0+.5;
contrast "bp_1=.3" bp_1-.3;
contrast "bll_0=.4" bll_0-.4;
contrast "bll_1=.2" bll_1-.2;
run;
Note that the model which is fit is overparameterized in that there
is in truth no effect of the variable x on the zero inflate
probability and there is no effect of gender on the poisson mean.
However, I model the eta's for both the probability and the mean
as functions of both gender and x. The Parameter Estimates table
tests these two effects against their true (0) values. There is no
need to specify contrasts for these parameters. If you run the above
simulation, you will observe that a 95% confidence interval covers
the true parameter values for all of the estimates. Convergence
occurs quite readily.
Hope that this is of interest to some.
Dale
=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com