LISTSERV at the University of Georgia
Date:         Tue, 22 Sep 2009 10:43:29 -0500
Reply-To:     "Data _null_;" <iebupdte@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Data _null_;" <iebupdte@GMAIL.COM>
Subject:      Re: Bootstrap for shrinkage and optimism
Comments: To: oloolo <dynamicpanel@yahoo.com>
In-Reply-To:  <200909221536.n8MApesE004764@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1

No, read it again.

> this means that I need to have to create a dataset with my original
> data repeated X times, each time with a new value of REPLICATE

On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
> add one more option: OUTHITS
> otherwise multiple replicated records will be collapsed into one
> besides, for Bootstrap analysis, OP needs to sample WITH REPLACEMENT, not
> WITHOUT REPLACEMENT
>
> **********************;
> ods select none;
> proc surveyselect data=sashelp.class out=class100
>      rate=1 method=urs rep=100 outhits;
> run;
> ods select all;
> **********************;
>
>
> On Tue, 22 Sep 2009 10:23:39 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
>
> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
> >> this means that I need to have to create a dataset with my original
> >> data repeated X times, each time with a new value of REPLICATE
> >
> >METHOD=URS does NOT produce the data that I think the OP is
> >requesting. If I understand correctly he wants to replicate the
> >original data set REP=n times.
> >
> >Similar to this but with less work.
> >
> >data class10;
> >   set
> >      sashelp.class(in=in1 )
> >      sashelp.class(in=in2 )
> >      sashelp.class(in=in3 )
> >      sashelp.class(in=in4 )
> >      sashelp.class(in=in5 )
> >      sashelp.class(in=in6 )
> >      sashelp.class(in=in7 )
> >      sashelp.class(in=in8 )
> >      sashelp.class(in=in9 )
> >      sashelp.class(in=in10) open=defer;
> >   replicate = index(cats(of in:),'1');
> >   run;
> >
> >
> >Using URS does not produce that same result.
> >
> >2048  proc surveyselect method=urs rate=1 rep=10 data=sashelp.class
> >out=class10;
> >2049  run;
> >
> >NOTE: The data set WORK.CLASS10 has 124 observations and 7 variables.
> >
> >
> >On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
> >> in addition to what DATA _NULL_ said, be sure to use:
> >>    method=urs
> >> to get a random sample WITH REPLACEMENT
> >> you can set other values for "rate=", say rate=0.7
> >>
> >> proc surveyselect data=yourdata out=sample
> >>      rate=1 method=urs rep=100;
> >> run;
> >>
> >> On Tue, 22 Sep 2009 10:01:24 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
> >>
> >> >Consider a SURVEYSELECT with RATE=1. This is in one of Cassell's papers
> >> >but you may have missed it.
> >> >
> >> >2042  proc surveyselect rate=1 rep=10 data=sashelp.class out=class10;
> >> >2043  run;
> >> >
> >> >NOTE: Under the specified sampling rate, all units will be included in
> >> >the sample.
> >> >NOTE: The data set WORK.CLASS10 has 190 observations and 6 variables.
> >> >
> >> >
> >> >
> >> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
> >> >> Good morning All,
> >> >>
> >> >> I am developing a predictive model (outcome binary) following the
> >> >> methodology outlined in "Clinical prediction models" by Steyerberg, or
> >> >> that in StatMed vol. 15 pp. 361-387 (Multivariable prognostic models:
> >> >> Issues in developing models, evaluating assumptions and adequacy, and
> >> >> measuring and reducing errors). I am using bootstrap to obtain
> >> >> measures of shrinkage and optimism to correct my regression
> >> >> coefficients and goodness of fit (GOF) measures (respectively) for
> >> >> overfitting. The steps include:
> >> >>
> >> >> 1. Obtain X bootstrap samples with replacement, of the same size as
> >> >> the original data
> >> >> 2. Use each sample to model the outcome using, in our case, a fixed
> >> >> set of covariates. Get GOF measures of interest
> >> >> 3. Score the original data with the model obtained in 2. Obtain GOF
> >> >> measures of interest on the model applied to the original data
> >> >> ... some additional steps irrelevant to my question
> >> >>
> >> >> I've used David Cassell's advice to program, in very few lines, steps
> >> >> 1 and 2, by building a dataset with my X bootstrap samples with
> >> >> replacement, and then running PROC LOGISTIC with the "BY REPLICATE"
> >> >> statement.
> >> >>
> >> >> To score the original data using each of my X models, I used the
> >> >> OUTEST= option in my PROC LOGISTIC run of step 2, and I then run a
> >> >> second PROC LOGISTIC, this time with the INEST= option. But for this
> >> >> to work the way I want, I need to use a "BY REPLICATE" statement and
> >> >> this means that I need to have to create a dataset with my original
> >> >> data repeated X times, each time with a new value of REPLICATE. This
> >> >> allows me to avoid the do loop. The negative aspect (though it might
> >> >> be mitigated by the efficiency of using the BY statement) is that I
> >> >> need to create this dataset and depending on the value of X, it can
> >> >> get quite large. Can you think of other ways this could be done as
> >> >> efficiently as steps 1 and 2 (perhaps from your own experiences)?
> >> >>
> >> >> Thank you.
> >> >>
> >> >> Daniel
> >> >>
> >> >
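
The pieces discussed in this thread could be combined roughly as follows. This is only a sketch, not code posted by any of the participants. It assumes SASHELP.CLASS as stand-in data, SEX as a made-up binary outcome, HEIGHT and WEIGHT as a made-up fixed covariate set, and X = 100 replicates. The data set names BOOT, ORIG100, EST and SCORED and the variable PHAT are hypothetical.

```sas
/* Step 1: 100 bootstrap samples drawn WITH replacement, each the size
   of the original data. OUTHITS writes one row per selection instead
   of collapsing repeated picks into a single row.                    */
proc surveyselect data=sashelp.class out=boot
     method=urs rate=1 rep=100 outhits seed=20090922 noprint;
run;

/* The original data stacked 100 times, one copy per value of
   REPLICATE. With the default method and RATE=1 every unit is kept
   in every replicate.                                                */
proc surveyselect data=sashelp.class out=orig100
     rate=1 rep=100 seed=20090922 noprint;
run;

/* Step 2: fit the same (illustrative) model in every bootstrap
   sample and keep the estimates, one row per replicate.              */
proc logistic data=boot outest=est noprint;
   by replicate;
   model sex(event='F') = height weight;
run;

/* Step 3: apply each bootstrap model to its copy of the original
   data. MAXITER=0 is meant to stop the fit at the INEST= values, so
   PHAT is the prediction from the bootstrap model.                   */
proc logistic data=orig100 inest=est noprint;
   by replicate;
   model sex(event='F') = height weight / maxiter=0;
   output out=scored p=phat;
run;
```

With a data set this small, some bootstrap samples may show separation, and the MAXITER=0 run may write convergence messages to the log, so check the log before trusting the scored values. The stacked copy grows linearly with the number of replicates, which is the cost Daniel mentions in exchange for avoiding a macro loop.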

