LISTSERV at the University of Georgia
Date:         Thu, 8 Jul 1999 20:45:27 +0200
Reply-To:     Rolf Kjoeller <rolf.kjoeller@GET2NET.DK>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Rolf Kjoeller <rolf.kjoeller@GET2NET.DK>
Subject:      Sv:      Re: Looping syntax containing "GET FILE" commands
Comments: To: bauer@SPSS.COM
Content-Type: text/plain; charset="iso-8859-1"

Sorry for the delayed response; I haven't been around my computer today. As John Bauer mentions, I have been doing some bootstrapping and, with good help from Fabrizio Arosio, I have written a script that helps with some of the steps.

Brian, I normally use the "classic" solution that you have been modifying. I have had about 2 million cases in the temporary file without any problems; if you drop all the variables you don't need, the temporary file doesn't get that big despite the large number of cases. My guess is that the solution you have been working on will be slower than the "classic" one, since a) you will be doing *a lot* of file operations, which tend to take a long time, and b) as John Bauer writes, you will need a macro loop, which by nature isn't too fast either.

Here is an example that bootstraps the median age in the GSS93 subset that comes with SPSS. To get the median(s) I use SPLIT FILE and FREQUENCIES:

* Keep cases with a valid age and give each one a sequential id.
GET FILE='c:\program files\spss\GSS93 subset.sav'.
SELECT IF NOT MISSING(AGE) .
VARIABLE LABEL age '' .
COMPUTE id=$CASENUM .
EXECUTE .
SAVE OUTFILE='bootdata.sav' /KEEP id age .

* Build 1000 bootstrap samples of 1495 random ids each, drawn with replacement.
INPUT PROGRAM .
LOOP sample=1 TO 1000 .
LOOP #n = 1 to 1495 .
COMPUTE id=TRUNC(UNIFORM(1495)) + 1 .
LEAVE sample .
END CASE .
END LOOP .
END LOOP .
END FILE .
END INPUT PROGRAM .

* Attach the age belonging to each drawn id.
SORT CASES BY id .
MATCH FILES /FILE=* /TABLE='bootdata.sav' /BY id .
EXECUTE .

* Compute the median age within each bootstrap sample.
SORT CASES BY sample .
SPLIT FILE BY sample .
FREQUENCIES /VARIABLES age /STATISTICS=median /FORMAT=NOTABLE .
SPLIT FILE OFF .

My only contribution to all this is a script that creates an SPSS dataset from the statistics table generated by FREQUENCIES. This makes it easy to get confidence intervals for the calculated statistics: just run FREQUENCIES again on the new dataset, requesting, for example, the 2.5 and 97.5 percentiles. If you are interested, I will e-mail the script to you.
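
As a rough sketch of that last step: assume the script has saved the 1000 bootstrap medians to a dataset with one case per sample. The file name bootmed.sav and the variable name medage are hypothetical placeholders here, and the script's actual output may be laid out differently. The percentile interval could then be requested along these lines:

```spss
* Hypothetical file and variable names - adjust to what the script produces.
GET FILE='bootmed.sav' .
FREQUENCIES VARIABLES=medage /FORMAT=NOTABLE /PERCENTILES=2.5 97.5 .
```

The two percentiles reported are the lower and upper limits of a simple 95% percentile interval. As John Bauer's footnote in the quoted message notes, an interval of this kind is not bias-corrected.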

Apologies for the lengthy answer; I hope it is somewhat helpful.

Rolf

Bauer, John <bauer@SPSS.COM> wrote in a news message: C02FEF109018D3118C9900A024CDCB647B7BBD@HERMES...
> As far as the programming issues:
>
> If you are trying to do this entirely within syntax, you would have to try
> to use a !DO loop in a MACRO.
>
> A scripting solution would also be possible. (I have reason to believe that
> Rolf Kjoeller has been working on similar issues; perhaps he will comment.)
>
> A footnote on the statistical issues: the case numbers supplied in your
> example are not bias-corrected. That is, if you are trying to generate a
> 95% confidence interval, you may need to request case values smaller than
> 0.025*3000 and larger than 0.975*3000 to get 95% coverage (or you should
> report that your CI may be biased). A complete discussion of
> bias-correction for bootstrap CI's is obviously impossible here; you may
> want to refer to the literature, beginning with Efron and Tibshirani.
>
>
> John Bauer, Ph.D.
> SPSS Support Statistician
>
> -----Original Message-----
> From: Nichols, David [mailto:nichols@spss.com]
> Sent: Wednesday, July 07, 1999 3:40 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: Looping syntax containing "GET FILE" commands
>
>
> John Bauer informs me that I shot before looking here, that the definition
> of percentiles being used is empirical, so it does involve picking actual
> cases. I withdraw the comment. Sorry for the wasted bandwidth.
>
> David Nichols
> Principal Support Statistician and
> Manager of Statistical Support
> SPSS Inc.
>
> > -----Original Message-----
> > From: Nichols, David [mailto:nichols@spss.com]
> > Sent: Wednesday, July 07, 1999 2:22 PM
> > To: SPSSX-L@LISTSERV.UGA.EDU
> > Subject: Re: Looping syntax containing "GET FILE" commands
> >
> >
> > While the programming issue asked about here is outside of my area of
> > expertise, there is one comment to make here on the statistical aspects of
> > the problem: the 90th (or other) percentile isn't necessarily going to be at
> > a given existing data value, so picking a physical case out of the file
> > isn't a viable strategy.
> >
> > David Nichols
> > Principal Support Statistician and
> > Manager of Statistical Support
> > SPSS Inc.
> >
> > > -----Original Message-----
> > > From: Brian W. Weir [mailto:brian_weir@CLASS.OREGONVOS.NET]
> > > Sent: Tuesday, July 06, 1999 4:24 PM
> > > To: SPSSX-L@LISTSERV.UGA.EDU
> > > Subject: Looping syntax containing "GET FILE" commands
> > >
> > >
> > > I am trying to bootstrap a sample (n=3000) 1000 times. I want to determine
> > > the confidence interval for the 90th percentile. Because putting all of the
> > > samples in one database would require 3 million cases, I am trying to create
> > > one sample, select the 90th percentile, and append that case to an output
> > > file. The next sample would overwrite the first sample (in the temporary
> > > database), thus limiting the number of cases that needs to be dealt with at
> > > one time.
> > >
> > > While I have been able to modify some syntax from this listserv to create this
> > > routine, I have not found a way to repeat the routine. Because I am
> > > performing tasks other than data transformations (opening and closing
> > > databases), loop statements do not work. Can anyone advise me as to how to
> > > automatically repeat the below syntax 1000 times?
> > >
> > > *THIS FIRST SECTION CREATES A NEW SAMPLE, WITH REPLACEMENT*
> > >
> > > GET FILE 'D:\SPSS\DATA.SAV'.
> > > EXECUTE.
> > >
> > > COMPUTE ID=$CASENUM .
> > > SAVE OUTFILE 'BOOTDATA.SAV'.
> > > INPUT PROGRAM .
> > > LOOP V = 1 to 3000.
> > > COMPUTE ID=TRUNC(UNIFORM(3000) ) + 1.
> > > END CASE.
> > > END LOOP.
> > > END FILE.
> > > END INPUT PROGRAM .
> > > SORT CASES BY ID .
> > > MATCH FILES / FILE * / TABLE 'BOOTDATA.SAV' / BY ID .
> > > EXECUTE.
> > >
> > > *THIS SECTION SELECTS THE CASES OF INTEREST, IE, THE 5TH PERCENTILE, THE
> > > MEDIAN, AND THE 90TH PERCENTILE*
> > > *RESP_INT IS MY VARIABLE OF INTEREST*
> > > SORT CASES BY RESP_INT.
> > > COMPUTE ID2=$CASENUM.
> > > EXECUTE.
> > > FILTER OFF.
> > > USE ALL.
> > > SELECT IF(ANY(ID2,150,1500,2700)).
> > > EXECUTE .
> > >
> > > *THIS SECTION SAVES THE SELECTED CASES IN "BOOTTEMP" AND THEN
> > > APPENDS THE DATA TO "BOOTEND"
> > > SAVE OUTFILE 'D:\SPSS\BOOTTEMP.SAV'.
> > > GET FILE 'D:\SPSS\BOOTEND.SAV'.
> > > ADD FILES /FILE=*
> > > /FILE='D:\SPSS\BOOTTEMP.sav'.
> > > SAVE OUTFILE 'D:\SPSS\BOOTEND.SAV'.
> > > EXECUTE.