Date: Tue, 9 May 2006 17:49:53 -0400
Reply-To: "Luo, Peter" <pluo@DRAFTNET.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Luo, Peter" <pluo@DRAFTNET.COM>
Subject: Re: jackknife concept
Content-Type: text/plain
David, for what Jonas was trying to do, i.e. to get some 'error' estimates
for model predictors, is N sub-samples or N bootstrapping samples the better
method?
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@listserv.vt.edu] On Behalf Of David L
Cassell
Sent: Tuesday, May 09, 2006 2:47 PM
To: SAS-L@LISTSERV.VT.EDU
Subject: Re: jackknife concept
Jonas Bilenas replied:
>I typically will use the bootstrap approach as opposed to hold out samples
>to validate my models. One rule is that if the coefficients change sign,
>then that variable should be dropped. Here is an example using logistic
>regression. Variable selection (not using stepwise) was built on the entire
>sample. This will be featured in my new book I am working on for SAS
>Press, SAS Applications in Credit Industry.
>
>%macro bootstrap(mod_data,iter);
> ods listing close;
>
> %do i = 1 %to &iter;
> ods output clear;
> ods output ParameterEstimates=b&i;
> proc logistic data=&mod_data;
> model bad=&ivs_trim;
> where ranuni(0)<=.9;
> run;quit;
> ods output close;
> run;
> proc transpose data=b&i out=bt&i;
> var estimate;
> id variable;
> run;
> %if "&i" ne "1" %then %do;
> proc append base=bt1 data=bt&i;
> run;
> %end;
> %end;
>
> ods listing;
> proc means data=bt1 mean min max std n nmiss;
> run;
>%mend;
>%bootstrap(reg1,20);
>
>Here is truncated OUTPUT:
>The MEANS Procedure
>
>
>Variable                    Mean     Minimum     Maximum
>Intercept              0.9560223   0.6456173   1.3784958
>tof24                  0.6999331   0.5134410   0.8170087
>cd_util               -0.4577382  -0.7089199  -0.2893133
>nhistd3               -0.2086835  -0.3138207  -0.0920624
>nocd                   0.7812227   0.5508036   1.1057233
>nodel                  0.4298646   0.3049216   0.5502467
>nonpromoinq           -0.0532590  -0.0666753  -0.0292599
>ntrades1               0.0432712   0.0239419   0.0573544
>ntrades2              -0.1167981  -0.1399701  -0.0960097
>ntrades2_2             0.0024367   0.0016911   0.0031254
>average_hc_cd_p22      0.1913840   0.1463953   0.2747190
Jonas, I hate to be a pain in the kiester, but...
But I'm going to be one anyway. (Mah nishtanah hahlielah hazeh?
Why is this night different from any other? :-) :-) )
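What you have is a random holdout, NOT a bootstrap in the technical
sense of the term. Each pass fits the model to a random subsample of
about 90% of the records, drawn without replacement, whereas a bootstrap
resamples n records with replacement from the n you have. The holdout
also does not have the theoretical support that a true bootstrap does.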
Here's how I would do a bootstrap for your situation above
(note that I just whipped this up based on your code, and it is
untested).
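/* Draw &ITER bootstrap replicates: n records each, with replacement */
proc surveyselect data=&MOD_DATA out=outdata
   rep=&ITER method=urs samprate=1 outhits;
run;

ods listing close;   /* suppress the per-replicate listing output */
ods output ParameterEstimates=bout;
proc logistic data=outdata;
   by replicate;
   model bad=&IVS_TRIM;
run;
ods output close;
ods listing;

/* BOUT is in long form, so summarize Estimate within each Variable */
proc means data=bout mean min max std n nmiss;
   class variable;
   var estimate;
run;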
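For the 'error' estimates asked about at the top of this thread, one
option is to boil the replicate estimates down to a bootstrap standard
error and percentile limits for each predictor. A minimal, untested
sketch, assuming BOUT carries the Variable and Estimate columns that
PROC LOGISTIC writes to ParameterEstimates. The names BOOT_CI,
boot_mean, boot_se and the p_ prefix are made up for illustration:

```sas
/* BY-group processing needs BOUT ordered by predictor name. */
proc sort data=bout;
   by variable;
run;

/* One output row per predictor: mean and standard deviation of the */
/* replicate estimates, plus the 2.5th and 97.5th percentiles.      */
proc univariate data=bout noprint;
   by variable;
   var estimate;
   output out=boot_ci mean=boot_mean std=boot_se
          pctlpts=2.5 97.5 pctlpre=p_;
run;

proc print data=boot_ci noobs;
run;
```

The percentile columns come out as p_2_5 and p_97_5. With only 20
replicates those limits are very rough; percentile limits are usually
based on something like 1000 or more replicates.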
Feel free to use as much or as little of my code as you want. If you
want to use SASFILE to speed up the PROC SURVEYSELECT, do that
as well.
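A rough, untested sketch of the SASFILE idea, assuming &MOD_DATA
resolves to a plain data set name and that the data set fits in the
memory available to the session:

```sas
/* Hold the input data set in memory while SURVEYSELECT reads it. */
sasfile &MOD_DATA load;

proc surveyselect data=&MOD_DATA out=outdata
   rep=&ITER method=urs samprate=1 outhits;
run;

/* Release the memory once the replicates have been written. */
sasfile &MOD_DATA close;
```

Whether this buys much depends on the size of the data set and on how
much memory SAS is allowed to use.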
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330