Date: Tue, 22 Sep 2009 10:43:29 -0500
Reply-To: "Data _null_;" <iebupdte@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Data _null_;" <iebupdte@GMAIL.COM>
Subject: Re: Bootstrap for shrinkage and optimism
In-Reply-To: <200909221536.n8MApesE004764@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
No, read it again:
> this means that I need to create a dataset with my original
> data repeated X times, each time with a new value of REPLICATE
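
To spell that out: with RATE=1 and no METHOD= option, SURVEYSELECT uses its default, simple random sampling without replacement, so every replicate is a complete copy of the input. A minimal sketch, assuming the 19-row SASHELP.CLASS as a stand-in for the original data (ORIG10 is a made-up name):

```sas
/* RATE=1 with the default METHOD=SRS keeps every row in each replicate, */
/* so OUT= holds REP= stacked copies of the input, indexed by Replicate. */
proc surveyselect data=sashelp.class out=orig10 noprint
     rate=1 rep=10;
run;

/* Each replicate should show 19 rows, 190 in total for REP=10. */
proc freq data=orig10;
   tables replicate / nocum nopercent;
run;
```

The PROC FREQ step is only a sanity check and can be dropped once the counts look right.
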
On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
> add one more option: OUTHITS
> otherwise multiple replicated records will be collapsed into one
> besides, for Bootstrap analysis, OP needs to sample WITH REPLACEMENT, not
> WITHOUT REPLACEMENT
>
> **********************;
> ods select none;
> proc surveyselect data=sashelp.class out=class100
> rate=1 method=urs rep=100 outhits;
> run;
> ods select all;
> **********************;
>
>
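
A note on the collapsing: without OUTHITS, METHOD=URS writes one row per selected unit and records how often it was drawn in the NumberHits variable. A hedged sketch of two ways to use such a sample, again on SASHELP.CLASS (BOOT_LONG, BOOT_FREQ and EST_FREQ are made-up names; Sex is only a stand-in binary outcome, and with 19 rows some replicates may produce separation warnings):

```sas
/* Route 1: OUTHITS writes one row per hit, so each replicate has 19 rows. */
proc surveyselect data=sashelp.class out=boot_long noprint
     method=urs rate=1 rep=100 outhits seed=20090922;
run;

/* Route 2: keep the collapsed output and carry the counts in NumberHits. */
proc surveyselect data=sashelp.class out=boot_freq noprint
     method=urs rate=1 rep=100 seed=20090922;
run;

/* The FREQ statement makes each row count NumberHits times in the fit. */
proc logistic data=boot_freq outest=est_freq noprint;
   by replicate;
   freq numberhits;
   model sex = height weight;
run;
```

Route 2 keeps the bootstrap dataset smaller, which may matter when the number of replicates is large.
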
> On Tue, 22 Sep 2009 10:23:39 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
>
> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
> >> this means that I need to create a dataset with my original
> >> data repeated X times, each time with a new value of REPLICATE
> >
> >METHOD=URS does NOT produce the data that I think the OP is
> >requesting. If I understand correctly he wants to replicate the
> >original data set REP=n times.
> >
> >Similar to this but with less work.
> >
> >data class10;
> > set
> > sashelp.class(in=in1 )
> > sashelp.class(in=in2 )
> > sashelp.class(in=in3 )
> > sashelp.class(in=in4 )
> > sashelp.class(in=in5 )
> > sashelp.class(in=in6 )
> > sashelp.class(in=in7 )
> > sashelp.class(in=in8 )
> > sashelp.class(in=in9 )
> > sashelp.class(in=in10) open=defer;
> > replicate = index(cats(of in:),'1');
> > run;
> >
> >
> >Using URS does not produce that same result.
> >
> >proc surveyselect method=urs rate=1 rep=10 data=sashelp.class
> >     out=class10;
> >run;
> >
> >NOTE: The data set WORK.CLASS10 has 124 observations and 7 variables.
> >
> >
> >On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
> >> in addition to what DATA _NULL_ said, be sure to use:
> >> method=urs
> >> to get a random sample WITH REPLACEMENT
> >> you can set other values for "rate=", say rate=0.7
> >>
> >> proc surveyselect data=yourdata out=sample
> >> rate=1 method=urs rep=100;
> >> run;
> >>
> >> On Tue, 22 Sep 2009 10:01:24 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
> >>
> >> >Consider a SURVEYSELECT with RATE=1. This is in one of Cassell's papers,
> >> >but you may have missed it.
> >> >
> >> >proc surveyselect rate=1 rep=10 data=sashelp.class out=class10;
> >> >run;
> >> >
> >> >NOTE: Under the specified sampling rate, all units will be included in
> >> >the sample.
> >> >NOTE: The data set WORK.CLASS10 has 190 observations and 6 variables.
> >> >
> >> >
> >> >
> >> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
> >> >> Good morning All,
> >> >>
> >> >> I am developing a predictive model (outcome binary) following the
> >> >> methodology outlined in "Clinical prediction models" by Steyerberg, or
> >> >> that in StatMed vol. 15 pp. 361-387 (Multivariable prognostic models:
> >> >> Issues in developing models, evaluating assumptions and adequacy, and
> >> >> measuring and reducing errors). I am using bootstrap to obtain
> >> >> measures of shrinkage and optimism to correct my regression
> >> >> coefficients and goodness of fit (GOF) measures (respectively) for
> >> >> overfitting. The steps include:
> >> >>
> >> >> 1. Obtain X bootstrap samples with replacement, of the same size as
> >> >> the original data
> >> >> 2. Use each sample to model the outcome using, in our case, a fixed
> >> >> set of covariates. Get GOF measures of interest
> >> >> 3. Score the original data with the model obtained in 2. Obtain GOF
> >> >> measures of interest on the model applied to the original data
> >> >> ... some additional steps irrelevant to my question
> >> >>
> >> >> I've used David Cassell's advice to program, in very few lines, steps
> >> >> 1 and 2, by building a dataset with my X bootstrap samples with
> >> >> replacement, and then running PROC LOGISTIC with the "BY REPLICATE"
> >> >> statement.
> >> >>
> >> >> To score the original data using each of my X models, I used the
> >> >> OUTEST= option in my PROC LOGISTIC run of step 2, and I then run a
> >> >> second PROC LOGISTIC, this time with the INEST= option. But for this
> >> >> to work the way I want, I need to use a "BY REPLICATE" statement and
> >> >> this means that I need to create a dataset with my original
> >> >> data repeated X times, each time with a new value of REPLICATE. This
> >> >> allows me to avoid the do loop. The negative aspect (though it might
> >> >> be mitigated by the efficiency of using the BY statement) is that I
> >> >> need to create this dataset and depending on the value of X, it can
> >> >> get quite large. Can you think of other ways this could be done as
> >> >> efficiently as steps 1 and 2 (perhaps from your own experiences)?
> >> >>
> >> >> Thank you.
> >> >>
> >> >> Daniel
> >> >>
> >>
>
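
Putting the pieces of the thread together, steps 1 to 3 from Daniel's message might look roughly like the following. This is a sketch under stated assumptions, not a tested recipe: SASHELP.CLASS stands in for the real data, Sex for the binary outcome, and BOOT, EST, ORIG, SCORED and PHAT are made-up names. MAXITER=0 is assumed to hold the INEST= estimates fixed so that the second PROC LOGISTIC only evaluates them; check the log, since it may warn about non-convergence.

```sas
/* Step 1: bootstrap samples with replacement, one row per hit. */
proc surveyselect data=sashelp.class out=boot noprint
     method=urs rate=1 rep=100 outhits seed=20090922;
run;

/* Step 2: fit the fixed model within each bootstrap sample. */
proc logistic data=boot outest=est noprint;
   by replicate;
   model sex = height weight;
run;

/* Stack REP= copies of the original data (default METHOD=SRS, RATE=1). */
proc surveyselect data=sashelp.class out=orig noprint
     rate=1 rep=100;
run;

/* Step 3: apply each replicate's estimates to the original data. */
ods select none;
proc logistic data=orig inest=est;
   by replicate;
   model sex = height weight / maxiter=0;   /* no refitting (assumed) */
   output out=scored p=phat;                /* predicted probabilities */
run;
ods select all;
```

Both EST and ORIG come out ordered by Replicate, so the BY statement lines them up without an extra sort.
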