Date: Tue, 22 Sep 2009 10:23:39 -0500
Reply-To: "Data _null_;" <iebupdte@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Data _null_;" <iebupdte@GMAIL.COM>
Subject: Re: Bootstrap for shrinkage and optimism
In-Reply-To: <200909221507.n8MAlMjE004632@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
> this means that I need to have to create a dataset with my original
> data repeated X times, each time with a new value of REPLICATE
METHOD=URS does NOT produce the data that I think the OP is
requesting. If I understand correctly, he wants to replicate the
original data set REP=n times.
Something similar to this, but with less work:
data class10;
   set
      sashelp.class(in=in1 )
      sashelp.class(in=in2 )
      sashelp.class(in=in3 )
      sashelp.class(in=in4 )
      sashelp.class(in=in5 )
      sashelp.class(in=in6 )
      sashelp.class(in=in7 )
      sashelp.class(in=in8 )
      sashelp.class(in=in9 )
      sashelp.class(in=in10) open=defer;
   /* position of the IN= flag that is set = which copy this row came from */
   replicate = index(cats(of in:),'1');
run;
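The same stacking can also be sketched with a loop instead of listing the data set ten times. This is only an illustration, not something from Cassell's papers: the loop bound of 10 and the helper variables p and n are arbitrary choices, and POINT=/NOBS= variables are not written to the output data set.

```sas
data class10;
   /* 10 copies is an arbitrary illustrative count */
   do replicate = 1 to 10;
      /* p and n are helper names chosen for this sketch */
      do p = 1 to n;
         set sashelp.class point=p nobs=n;
         output;
      end;
   end;
   stop;   /* required with POINT=, otherwise the step never ends */
run;
```

This should give every row of SASHELP.CLASS once per value of REPLICATE, in the same order as the SET list above.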
Using URS does not produce the same result.
proc surveyselect method=urs rate=1 rep=10 data=sashelp.class
   out=class10;
run;
NOTE: The data set WORK.CLASS10 has 124 observations and 7 variables.
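The 124 observations are what METHOD=URS gives by default: one row per distinct selected unit in each replicate, with the number of times it was drawn stored in NumberHits. If one row per draw is wanted for the bootstrap samples themselves (step 1 of the original question), the OUTHITS option can be added. A minimal sketch, where the output name BOOT and the seed value are illustrative assumptions:

```sas
/* bootstrap samples with one row per hit; */
/* out=boot and seed= are arbitrary choices for this sketch */
proc surveyselect data=sashelp.class out=boot
   method=urs samprate=1 rep=10 outhits seed=20090922;
run;
```

Each replicate should then hold as many rows as the original data, but the rows differ from replicate to replicate, so this is still not the straight replication needed for scoring.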
On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
> in addition to what DATA _NULL_ said, be sure to use:
> method=urs
> to get a random sample WITH REPLACEMENT
> you can set other values for "rate=", say rate=0.7
>
> proc surveyselect data=yourdata out=sample
> rate=1 method=urs rep=100;
> run;
>
> On Tue, 22 Sep 2009 10:01:24 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
>
> >Consider a SURVEYSELECT with RATE=1. This is in one of Cassell's papers
> >but you may have missed it.
> >
> >proc surveyselect rate=1 rep=10 data=sashelp.class out=class10;
> >run;
> >
> >NOTE: Under the specified sampling rate, all units will be included in
> >the sample.
> >NOTE: The data set WORK.CLASS10 has 190 observations and 6 variables.
> >
> >
> >
> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
> >> Good morning All,
> >>
> >> I am developing a predictive model (outcome binary) following the
> >> methodology outlined in "Clinical prediction models" by Steyerberg, or
> >> that in StatMed vol. 15 pp. 361-387 (Multivariable prognostic models:
> >> Issues in developing models, evaluating assumptions and adequacy, and
> >> measuring and reducing errors). I am using bootstrap to obtain
> >> measures of shrinkage and optimism to correct my regression
> >> coefficients and goodness of fit (GOF) measures (respectively) for
> >> overfitting. The steps include:
> >>
> >> 1. Obtain X bootstrap samples with replacement, of the same size as
> >> the original data
> >> 2. Use each sample to model the outcome using, in our case, a fixed
> >> set of covariates. Get GOF measures of interest
> >> 3. Score the original data with the model obtained in 2. Obtain GOF
> >> measures of interest on the model applied to the original data
> >> ... some additional steps irrelevant to my question
> >>
> >> I've used David Cassell's advice to program, in very few lines, steps
> >> 1 and 2, by building a dataset with my X bootstrap samples with
> >> replacement, and then running PROC LOGISTIC with the "BY REPLICATE"
> >> statement.
> >>
> >> To score the original data using each of my X models, I used the
> >> OUTEST= option in my PROC LOGISTIC run of step 2, and I then run a
> >> second PROC LOGISTIC, this time with the INEST= option. But for this
> >> to work the way I want, I need to use a "BY REPLICATE" statement and
> >> this means that I need to have to create a dataset with my original
> >> data repeated X times, each time with a new value of REPLICATE. This
> >> allows me to avoid the do loop. The negative aspect (though it might
> >> be mitigated by the efficiency of using the BY statement) is that I
> >> need to create this dataset and depending on the value of X, it can
> >> get quite large. Can you think of other ways this could be done as
> >> efficiently as steps 1 and 2 (perhaps from your own experiences)?
> >>
> >> Thank you.
> >>
> >> Daniel
> >>
>