Date: Mon, 27 Jun 2005 17:32:38 -0500
Reply-To: Jeffrey Stratford <stratja@AUBURN.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jeffrey Stratford <stratja@AUBURN.EDU>
Subject: Re: testing a holdout sample
Content-Type: text/plain; charset=US-ASCII
David,
Hi. Thanks for the response. I actually used PROC GENMOD (dist=Poisson
lrci dscale) to get the betas and the CI's. I'm not familiar enough
with NLMIXED to see what was being done although I should check it out.
It seems like it should be necessary for me to input the parameters. Is
there a way to do this in genmod?
Thanks,
Jeff
****************************************
Jeffrey A. Stratford, Ph.D.
Postdoctoral Associate
331 Funchess Hall
Department of Biological Sciences
Auburn University
Auburn, AL 36849
334-329-9198
FAX 334-844-9234
http://www.auburn.edu/~stratja
****************************************
>>> "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV> 06/27/05 4:52 PM
>>>
Jeffrey Stratford <stratja@AUBURN.EDU> wrote:
> I have several Poisson models that I would like to test with a holdout
> sample. I have the betas and their 95% CI (calculated with the LRCI
> option). What I would like to do is create a predicted Y with 95% CI
> for each sample to see if my observed samples falls within that
> interval. Is there a way to get SAS to do this for each sample? I
can
> get the means easily (exp(B0+B1x+B2x...)), I'm just not sure how to
> create the CIs for each sample.
I'm going to assume you did this work in NLMIXED. That's a
reasonable choice here. And it has a PREDICT statement that will
simplify your life.
All you do is append your holdout sample onto the end of your
working data set. Since your dependent variable is missing
in your holdout sample, these records won't get used in the
fitting process. And you don't have to grapple with the
complexities that can be introduced, such as empirical Bayes
estimates of random effects. Standard errors get computed for
you, using the delta method.
Here's a SAS data set stolen from the NLMIXED documentation:
data rats;
input trt$ m x;
if (trt='c') then do; x1 = 1; x2 = 0; end;
else do; x1 = 0; x2 = 1; end;
litter = _n_;
datalines;
c 13 13
c 12 12
c 9 9
c 9 9
c 8 8
c 8 8
c 13 12
c 12 11
c 10 9
c 10 9
c 9 8
c 13 11
c 5 4
c 7 5
c 10 7
c 10 7
t 12 12
t 11 11
t 10 10
t 9 9
t 11 10
t 10 9
t 10 9
t 9 8
t 9 8
t 5 4
t 9 7
t 7 4
t 10 5
t 6 3
t 10 3
t 7 0
run;
Now we'll make up a holdout sample, with missing values for our
dependent variable X, and then append that to the RATS data set
above:
data temp1;
trt = 'c';
x1 = 1;
x2 = 0;
x = .;
do litter = 33 to 36; m = int(litter/4); output; end;
run;
data rats2;
set rats temp1;
run;
So the holdout sample will be part of the data set we feed to
NLMIXED, and it will only be used to build CIs:
proc nlmixed data=rats2;
parms t1=1 t2=1 s1=.05 s2=1;
eta = x1*t1 + x2*t2 + alpha;
p = probnorm(eta);
model x ~ binomial(m,p);
random alpha ~ normal(0,x1*s1*s1+x2*s2*s2) subject=litter;
estimate 'gamma2' t2/sqrt(1+s2*s2);
predict p out=outp alpha=0.01;
run;
proc print noobs data=outp(where=(x=.)); run;
Now look at the PROC PRINT output, and you'll see that you
have LOWER and UPPER confidence bounds in the output data set,
already computed using the ALPHA= value we specified in the
PREDICT statement above.
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician