```Date: Thu, 20 Jun 2002 09:42:41 +0100 Reply-To: david.mcnulty@QUESTINTL.COM Sender: "SAS(r) Discussion" From: david.mcnulty@QUESTINTL.COM Subject: Re: Untenable results from PROC GENMOD Comments: To: Dale McLerran Content-type: text/plain; charset=us-ascii Good Morning Dale, Thankyou for a very comprehensive reply. I can also use the template bootstrapping data step for another problem that has been sitting on the shelf. Dave. Dale McLerran SAS-L@LISTSERV.UGA.EDU cc: 19/06/2002 18:44 Subject: Re: Untenable results from PROC GENMOD David, That does change the picture substantially! However, I do believe that there is still be a problem with including subject as a covariate. My guess is that there subjects who rate all 7 products employed in the analysis as a success/failure under both conditions. A logit cannot be computed for those subjects. I dummied up some data which demonstrates the situation that you report. data binary; do phase=1 to 2; beta1 = .75*(phase=1); do product=1 to 7; beta2 = .25*rannor(1234579); do subject=1 to 29; gamma = .5*rannor(1234579); eta = gamma + beta1 + beta2; p = exp(eta)/(1+exp(eta)); y = ranbin(1234579,1,p); output; end; /* Subject 30 will have all failures */ subject=30; gamma = -10; eta = gamma + beta1 + beta2; p = exp(eta)/(1+exp(eta)); y = ranbin(1234579,1,p); output; subject=31; gamma = .5*rannor(1234579); eta = gamma + beta1 + beta2; p = exp(eta)/(1+exp(eta)); y = ranbin(1234579,1,p); output; end; end; run; proc freq data=binary(where=(subject>=30)); tables y*subject; run; proc genmod data=binary; class subject phase product; model y = subject phase|product / dist=bin; lsmeans phase|product; run; Because subject 30 has all failures, inclusion of the subject effect in the model causes the subject 30 logit to be inestimable. When the point estimate for subject 30 went above 25, the change in model likelihood was sufficiently small for GENMOD to state that convergence was achieved. At this point, the variance for subject 30 was very large, which adversely affects the variance of the least squares means. In fact, the point estimate and variance for subject 30 should both be set to infinity, but that really cannot be done. Note that if you employed PROC LOGISTIC to fit the model, you would have gotten a statement that quasi-complete separation of the data points occurred and that model fit was questionable. So, the question is what can you do at this point? Some might suggest a GEE analysis. GEE's take into account the nonindependenc of the responses among subjects who have multiple observations. This is what you are attempting to do when you include subject as a covariate. But GEE's do not require that you obtain a subject effect estimate. The syntax below fits a GEE to the data generated above. proc genmod data=binary; class subject phase product; model y = phase|product / dist=bin; repeated subject=subject; lsmeans phase|product; run; GEE requires large samples for inference to be valid. Sample size refers to the number of independent observations which is the number of subjects employed in the analysis. For a very simple model, N=31 would be sufficient to support a GEE. For a model in which you estimate 14 parameters, N=31 is not sufficient. Another approach which one might consider is a random effects logit model. Random effects logits can be fit employing the procedure NLMIXED or employing the GLIMMIX macro which ships with SAS. However, the random effects logit conditions on subject, just as you did in the analysis which you presented. Conditioning on subject is what got you into trouble in the first place. So we will have to forget the random effects (any conditional) model and confine our attention to what are referred to as marginal models. There is one marginal model which I could heartily recommend for your present situation. The model does not rely on large sample theory for its validity. Rather, it relies on randomization theory. What I would urge you to consider is the use of a bootstrap regression approach. In the bootstrap analysis, we randomly pick with replacement Ni=31 subjects from the set of N=31 subjects. Some subjects are selected several times and some subjects are not selected for the bootstrap sample. Fit the logistic regression to the bootstrap sample and output the statistics of interest. In your case, you would output the LSMEANS and convert them to probability estimates. Draw a new sample of 31 subjects and fit the model again. Do this K=1000 times. Now, you can compute mean and variance for each probability value arising from your model. You can also compute a 95% confidence interval directly from the 1000*alpha/2 and 1001-1000*alpha/2 (25th and 976th) ordered statistics. The bootstrap does require approximately equal sample sizes for each subject. In your case, this holds perfectly. The bootstrap does not require large sample theory for its validity, so it is preferable over the GEE. You might want to look at our paper "A Comparison of Statistical Methods for Clustered Data Analysis with Gaussian Error". Feng, Z., McLerran, D, and Grizzle, J. Statistics in Medicine, 15, 1793-1806 (1996). While that paper discusses statistical models for gaussian data, the results hold for binary response as well. We performed the simulation to examine the various statistical models in the binary case, but we were letting a visiting faculty member take the lead on writing up results for the binary case. The paper never got written. You could fit a bootstrap model employing code something like the following (note that I have not tested the datastep below): data bootstrap; set mydata end=lastrec; by subject; array response {31,16} _temporary_; position = 8*(phase-1) + product; /* For phase coded 1,2 */ /* Use 8*phase for phase=0,1 */ response{subject,position} = correct; if lastrec then do; do bootstrap=1 to 1000; do i=1 to 31; subject = ceil(31*ranuni(seed)); /* Specify SEED */ do j=1 to 16; correct = response{subject,j}; phase = 1 + (j>8); product = j - 8*(phase-1); output; end; end; end; end; run; ods listing close; ods output lsmeans=lsmeans; proc genmod data=bootstrap; by bootstrap; model correct = phase|product / dist=bin; lsmeans phase|product; run; At this point, I am too lazy to figure out just what the LSMEANS dataset will look like. But you will want to convert the LSMEANS to probability estimates in a datastep and then run PROC UNIVARIATE against the probability estimates, requesting the 2.5th and 97.5th percentiles of the distributions along with mean and variance. Dale --- david.mcnulty@QUESTINTL.COM wrote: > Hi Dale, > > My apologies I have inadvertently mislead you. > > Further Info > > All 31 subjects test 8 products under condition 1 (scoring 1=success > 0=fail), has a break and re-tests all 8 products under condition 2. > Conditions 1 and 2 cannot be randomised. Total observations = 31*8*2 > = 496 > (434 ignoring Prod_8). > > Dave. > ===== --------------------------------------- Dale McLerran Fred Hutchinson Cancer Research Center mailto: dmclerra@fhcrc.org Ph: (206) 667-2926 Fax: (206) 667-5977 --------------------------------------- __________________________________________________ Do You Yahoo!? Yahoo! - Official partner of 2002 FIFA World Cup http://fifaworldcup.yahoo.com IMPORTANT NOTICE: This email is confidential, may be legally privileged, and is for the intended recipient only. Access, disclosure, copying, distribution, or reliance on any of it by anyone else is prohibited and may be a criminal offence. Please delete if obtained in error and email confirmation to the sender. ```

Back to: Top of message | Previous page | Main SAS-L page