Date: Thu, 28 Dec 2006 18:04:54 -0800
Reply-To: Cyclotron <dramitgupta@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Cyclotron <dramitgupta@GMAIL.COM>
Organization: http://groups.google.com
Subject: Comparing two ROC curves after bootstrap
I am trying to compare the bootstrap-corrected (200 replicates)
c-statistics of two logistic regression models (one without and one
with a marker).
******************************************************************************
The code I am using is:
/*CREATING THE BOOTSTRAP REPLICATES*/
/*OUTHITS writes one row per draw, so units sampled more than once
  are repeated (without it, each selected unit appears only once,
  with its count stored in NumberHits)*/
proc surveyselect data=sasuser.hgfexp out=boot seed=150
     method=urs samprate=1 outhits
     rep=200;
   id outcome age sex race marker;
run;

/*BASE MODEL: Logistic regression on the bootstrap replicates*/
ods listing close;
proc logistic data=boot;
   by replicate;
   model outcome = age sex race;
   ods output association=out1;
run;
ods listing;

/*Keeping only the c-statistic (and the replicate number for merging)*/
data out1(keep=replicate c);
   set out1;
   if label2="c";
   c=nvalue2;
run;

/*MODEL with marker: Logistic regression on the bootstrap replicates*/
ods listing close;
proc logistic data=boot;
   by replicate;
   model outcome = marker age sex race;
   ods output association=out2;
run;
ods listing;

/*Keeping the c-statistic from the second model, named cmarker*/
data out2(keep=replicate cmarker);
   set out2;
   if label2="c";
   cmarker=nvalue2;
run;

/*Merging by replicate and calculating the difference between
  the c-statistics*/
data comp;
   merge out2 out1;
   by replicate;
   diff = cmarker - c;
run;

/*T test for the hypothesis that the difference between the
  c-statistics is zero*/
proc ttest data=comp;
   var diff;
run;
******************************************************************************
When I run the above code, the difference between the c-statistics is
about 0.1, yet it is statistically significant by the t-test.
This does not feel right. Also, the t-test is run on a dataset with
200 values for the difference between the two statistics, so it
will always be significant because of the sample size. When I run the
same analysis with fewer replicates (50), the t-test is not
significant because the standard error is higher.
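The sample-size effect described above can be reproduced without the real data. The sketch below is a hypothetical illustration: the dataset `sim` and its 200 normally distributed "differences" (mean 0.01, SD 0.03, both arbitrary values) stand in for the bootstrap differences and are not results from the models above. It compares the t statistic computed from the first 50 draws with the one computed from all 200.

```sas
/* Hypothetical stand-in for the 200 bootstrap differences.
   Mean 0.01 and SD 0.03 are arbitrary illustration values. */
data sim;
   call streaminit(150);
   do replicate = 1 to 200;
      diff = rand('normal', 0.01, 0.03);
      output;
   end;
run;

/* Summary statistics and one-sample t using the first 50 draws */
proc means data=sim(where=(replicate <= 50)) noprint;
   var diff;
   output out=t50 n=n mean=mean std=sd t=t;
run;

/* The same statistics using all 200 draws */
proc means data=sim noprint;
   var diff;
   output out=t200 n=n mean=mean std=sd t=t;
run;

/* sd describes the spread of the differences and stays roughly
   stable; se_of_mean (the t-test denominator) shrinks like 1/sqrt(n) */
data compare;
   set t50 t200;
   se_of_mean = sd / sqrt(n);
   keep n mean sd se_of_mean t;
run;

proc print data=compare noobs;
run;
```

In the printed table, `sd` is of similar size in both rows while `se_of_mean` is roughly halved for 200 draws, which is why the t statistic tends to grow as more replicates are added even though the spread of the differences does not change.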
What is the correct way to compare the two c-statistics here?
Any help will be greatly appreciated.
Thanks
Amit
PS: I looked at SUGI 27 Paper 248-27, "Use of the ROC curve and the
bootstrap in comparing weighted logistic regression models." At the
end of that paper, this is how they calculated the difference between
the curves, i.e., by running a t-test. Am I misinterpreting the paper?
Should the degrees of freedom (df) for the t-test be different?
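The quantities involved can be written out explicitly. The notation below ($d_b$, $\bar{d}$, $s_d$, $B$) is introduced here for illustration and is not taken from the SUGI paper; it describes the t-test in the code above applied to the $B$ bootstrap differences.

```latex
\begin{align*}
d_b &= c^{(b)}_{\text{marker}} - c^{(b)}, \qquad b = 1,\dots,B \\
\bar{d} &= \frac{1}{B}\sum_{b=1}^{B} d_b, \qquad
s_d = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(d_b - \bar{d}\right)^2} \\
t &= \frac{\bar{d}}{s_d/\sqrt{B}} = \sqrt{B}\,\frac{\bar{d}}{s_d},
\qquad \text{df} = B - 1
\end{align*}
```

Because all $d_b$ are resampled from the same original dataset, $s_d$ estimates the sampling standard deviation of the difference itself and does not shrink as $B$ grows, so $t$ scales like $\sqrt{B}$ regardless of the df used. The usual bootstrap summaries instead treat $s_d$ directly as the standard error of the difference (e.g. $z = \bar{d}/s_d$) or take the 2.5th and 97.5th percentiles of the $d_b$ as a 95% interval; whether either matches what the paper intended is an assumption to check against the paper.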