Date: Tue, 22 Sep 2009 10:23:39 -0500
Reply-To: "Data _null_;" <iebupdte@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Data _null_;" <iebupdte@GMAIL.COM>
Subject: Re: Bootstrap for shrinkage and optimism
Content-Type: text/plain; charset=ISO-8859-1
On 9/22/09, Daniel <firstname.lastname@example.org> wrote:
> this means that I need to create a dataset with my original
> data repeated X times, each time with a new value of REPLICATE
METHOD=URS does NOT produce the data that I think the OP is
requesting. If I understand correctly he wants to replicate the
original data set REP=n times.
Similar to this but with less work.
replicate = index(cats(of in:),'1');
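For anyone following along, here is a minimal sketch of the long way that the REPLICATE assignment above belongs to. The SET statement, the output name CLASS3, and the choice of three copies are assumptions for illustration; only the REPLICATE line comes from this message.

```sas
/* Sketch: stack SASHELP.CLASS three times and number the copies.   */
/* The SET statement and the name CLASS3 are assumed, not original. */
data class3;
   set sashelp.class(in=in1)
       sashelp.class(in=in2)
       sashelp.class(in=in3);
   /* Exactly one INn flag is 1 on each row, so the position of '1' */
   /* in the concatenated flags is the number of the copy.          */
   replicate = index(cats(of in:), '1');
run;
```

Every extra replicate means another data set on the SET statement, which is the work that RATE=1 with REP= avoids.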
Using URS does not produce that same result.
proc surveyselect method=urs rate=1 rep=10 data=sashelp.class
NOTE: The data set WORK.CLASS10 has 124 observations and 7 variables.
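One way to see the difference is to count the rows per replicate under each call. This is only a sketch: the data set name BOOT10 and the SEED= value are made up here so that the two outputs can sit side by side.

```sas
/* Default method (SRS) with RATE=1: all 19 rows kept in each replicate. */
proc surveyselect data=sashelp.class out=class10
                  rate=1 rep=10 noprint;
run;

/* METHOD=URS with RATE=1: 19 draws with replacement per replicate.      */
/* BOOT10 and the SEED= value are assumptions for this sketch.           */
proc surveyselect data=sashelp.class out=boot10
                  method=urs rate=1 rep=10 seed=20090922 noprint;
run;

/* Rows per replicate: 19 everywhere for CLASS10, typically fewer for    */
/* BOOT10 because repeated draws are collapsed into NumberHits.          */
proc freq data=class10;
   tables replicate / nocum nopercent;
run;

proc freq data=boot10;
   tables replicate / nocum nopercent;
run;
```

The extra NumberHits variable is also why the URS output has 7 variables instead of 6.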
On 9/22/09, oloolo <email@example.com> wrote:
> in addition to what DATA _NULL_ said, be sure to use METHOD=URS
> to get a random sample WITH REPLACEMENT.
> you can set other values for "rate=", say rate=0.7
> proc surveyselect data=yourdata out=sample
> rate=1 method=urs rep=100;
> On Tue, 22 Sep 2009 10:01:24 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
> >Consider a SURVEYSELECT with RATE=1. This is in one of Cassell's papers
> >but you may have missed it.
> >proc surveyselect rate=1 rep=10 data=sashelp.class out=class10;
> >run;
> >NOTE: Under the specified sampling rate, all units will be included in
> >the sample.
> >NOTE: The data set WORK.CLASS10 has 190 observations and 6 variables.
> >On 9/22/09, Daniel <firstname.lastname@example.org> wrote:
> >> Good morning All,
> >> I am developing a predictive model (outcome binary) following the
> >> methodology outlined in "Clinical prediction models" by Steyerberg, or
> >> that in StatMed vol. 15 pp. 361-387 (Multivariable prognostic models:
> >> Issues in developing models, evaluating assumptions and adequacy, and
> >> measuring and reducing errors). I am using bootstrap to obtain
> >> measures of shrinkage and optimism to correct my regression
> >> coefficients and goodness of fit (GOF) measures (respectively) for
> >> overfitting. The steps include:
> >> 1. Obtain X bootstrap samples with replacement, of the same size as
> >> the original data
> >> 2. Use each sample to model the outcome using, in our case, a fixed
> >> set of covariates. Get GOF measures of interest
> >> 3. Score the original data with the model obtained in 2. Obtain GOF
> >> measures of interest on the model applied to the original data
> >> ... some additional steps irrelevant to my question
> >> I've used David Cassell's advice to program, in very few lines, steps
> >> 1 and 2, by building a dataset with my X bootstrap samples with
> >> replacement, and then running PROC LOGISTIC with the "BY REPLICATE"
> >> statement.
> >> To score the original data using each of my X models, I used the
> >> OUTEST= option in my PROC LOGISTIC run of step 2, and I then run a
> >> second PROC LOGISTIC, this time with the INEST= option. But for this
> >> to work the way I want, I need to use a "BY REPLICATE" statement and
> >> this means that I need to create a dataset with my original
> >> data repeated X times, each time with a new value of REPLICATE. This
> >> allows me to avoid the do loop. The negative aspect (though it might
> >> be mitigated by the efficiency of using the BY statement) is that I
> >> need to create this dataset and depending on the value of X, it can
> >> get quite large. Can you think of other ways this could be done as
> >> efficiently as steps 1 and 2 (perhaps from your own experiences)?
> >> Thank you.
> >> Daniel
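
Putting the thread together, steps 1 to 3 of Daniel's message might be sketched as follows. The names MYDATA, Y, X1, X2, the 200 replicates, and the SEED= value are all hypothetical. This is one possible arrangement under those assumptions, not a tested recipe.

```sas
/* Hypothetical names: MYDATA, Y (0/1 outcome), X1, X2. */

/* Step 1: 200 bootstrap samples; OUTHITS writes one row per hit. */
proc surveyselect data=mydata out=boot method=urs rate=1 rep=200
                  outhits seed=12345 noprint;
run;

/* Step 2: one model per bootstrap sample, coefficients saved in EST. */
proc logistic data=boot outest=est noprint;
   by replicate;
   model y(event='1') = x1 x2;
run;

/* Original data repeated 200 times, each copy tagged with REPLICATE. */
proc surveyselect data=mydata out=orig_rep rate=1 rep=200 noprint;
run;

/* Step 3: MAXITER=0 keeps the INEST= coefficients unchanged, so each  */
/* copy of the original data is scored with that replicate's model.    */
proc logistic data=orig_rep inest=est noprint;
   by replicate;
   model y(event='1') = x1 x2 / maxiter=0;
   output out=scored p=phat;
run;
```

The log will likely warn about convergence in the last step, since no iterations are run. ORIG_REP still holds 200 copies of the original data, so the size concern raised in the question remains; the sketch only removes the effort of building that data set by hand.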