Date: Tue, 3 Jul 2007 14:55:05 -0400
From: Richard Ristow
Sender: "SPSSX(r) Discussion"
Subject: Re: How to generate 1000 samples?
To: Louis
cc: Gene Maguin
In-Reply-To: <200707021854.l62GkhOP031710@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Let's see if we're getting there. At 02:54 PM 7/2/2007, Louis wrote:

>I would like to find out how I can generate 100 samples. Each sample
>has 1000 cases with 2 columns of data (RV.N(0,1) and a Filter
>variable).
>
>Below is the syntax for 1 sample which generates 2 columns. How can I
>create 200 (100 samples x 2) columns of data?

Now, I think I see what you were going for. Something like this, if you wanted 3 samples:

      X1 FILTER1       X2 FILTER2       X3 FILTER3
    1.21       0    -0.35       0    -1.35       1

All the X variables are drawn from the standard normal distribution (your RV.N(0,1)); the FILTER variables are set so that the value 1 occurs exactly 100 times in each column.

First, you know syntax better than you give yourself credit for. What you have is pretty good, though you do still have that problem of generating 100 pairs of variables, instead of only 1 pair.

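For the random selection, you have a neat implementation of the "k/n" algorithm: your #s1 is what's normally called k (the number of cases still to be selected), and your #s2 is what's called n (the number of cases still to be considered). Each case is selected with probability k/n; n drops by 1 after every case and k drops by 1 after every selection, so exactly k cases end up selected. You can do that within the INPUT PROGRAM, if you like, rather than in a separate pass.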

Second: are you sure you want it to come out this way? The other way is what Gene and I have been looking at: three 'columns' (variables), where the first is the sample number and the other two correspond to your variables X and FILTER_$. That's easier to use, for most purposes. (It's called 'long' organization; what you were thinking of is called 'wide'.)

I've taken Gene's logic as I modified it, added a row count within each sample (you'll see why, later), and added your sampling logic (modified a little, to use RV.BERNOULLI instead of UNIFORM). Here's selecting 3 samples of 6 cases each, with the filter selecting 2 cases in each sample. (You can expand to 100 samples of 1,000 easily.)

This is SPSS 15 draft output (WRR:not saved separately), giving output in 'long' format:

INPUT PROGRAM.
.  NUMERIC SAMPLE ROW (F4).
.  LEAVE   SAMPLE.
+  LOOP SAMPLE=1 TO 3.
.     NUMERIC FILTER_$(F2).
.     COMPUTE #S1 = 2 /* Number of cases within filter */.
.     COMPUTE #S2 = 6 /* Size of each sample           */.
+     LOOP #Case = 1 to #S2.
.        COMPUTE ROW = #Case.
+        COMPUTE X = RV.NORMAL(0,1).
.        compute filter_$ = RV.BERNOULLI(#s1/#s2).
.        compute #s1 = #s1 - filter_$.
.        compute #s2 = #s2 - 1.
+        END CASE.
+     END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.
DATASET NAME LongForm WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |03-JUL-2007 14:47:21       |
|-----------------------------|---------------------------|
[LongForm]

SAMPLE  ROW FILTER_$        X

     1    1        1    -1.37
     1    2        0     -.74
     1    3        0      .90
     1    4        1     1.13
     1    5        0      .36
     1    6        0     -.89
     2    1        0      .69
     2    2        0      .97
     2    3        0     -.22
     2    4        1     1.93
     2    5        0     -.84
     2    6        1     -.04
     3    1        1      .81
     3    2        0     1.95
     3    3        0     -.83
     3    4        0    -1.48
     3    5        1      .60
     3    6        0     1.79

Number of cases read:  18    Number of cases listed:  18
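As a check on the k/n logic, a crosstab of SAMPLE by FILTER_$ should show the same number of selected cases in every sample. This is an untested sketch, not output from a run; the choice of CROSSTABS for the check is an assumption for illustration, and it assumes LongForm is still the active dataset.

```spss
*  Count unselected (0) and selected (1) cases within each sample.
CROSSTABS
   /TABLES = SAMPLE BY FILTER_$
   /CELLS  = COUNT.
```

For the listing above, every row of the table should show 4 cases with FILTER_$ = 0 and 2 cases with FILTER_$ = 1.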

Now, to get 'wide' format, you could use VECTOR and LOOP, as you were planning to. But I think it's easier to generate 'long' form and convert to 'wide' with CASESTOVARS. (The DATASET commands are not necessary; they made testing easier):

DATASET ACTIVATE LongForm WINDOW=FRONT.
DATASET COPY     WideForm.
DATASET ACTIVATE WideForm WINDOW=FRONT.

SORT CASES BY ROW SAMPLE .
CASESTOVARS
 /ID = ROW
 /INDEX = SAMPLE
 /GROUPBY = INDEX .

Cases to Variables
|----------------------------|---------------------------|
|Output Created              |03-JUL-2007 14:48:52       |
|----------------------------|---------------------------|
[WideForm]

Generated Variables
|--------|------|----------|
|Original|SAMPLE|Result    |
|Variable|      |----------|
|        |      |Name      |
|--------|------|----------|
|FILTER_$|1     |FILTER_$.1|
|        |2     |FILTER_$.2|
|        |3     |FILTER_$.3|
|--------|------|----------|
|X       |1     |X.1       |
|        |2     |X.2       |
|        |3     |X.3       |
|--------|------|----------|

Processing Statistics
|---------------|---|
|Cases In       |18 |
|Cases Out      |6  |
|---------------|---|
|Cases In/Cases |3.0|
|Out            |   |
|---------------|---|
|Variables In   |4  |
|Variables Out  |7  |
|---------------|---|
|Index Values   |3  |
|---------------|---|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |03-JUL-2007 14:48:52       |
|-----------------------------|---------------------------|
[WideForm]

 ROW FILTER_$.1      X.1 FILTER_$.2      X.2 FILTER_$.3      X.3

   1          1    -1.37          0      .69          1      .81
   2          0     -.74          0      .97          0     1.95
   3          0      .90          0     -.22          0     -.83
   4          1     1.13          1     1.93          0    -1.48
   5          0      .36          0     -.84          1      .60
   6          0     -.89          1     -.04          0     1.79

Number of cases read:  6    Number of cases listed:  6
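For comparison, the VECTOR and LOOP route to 'wide' form mentioned earlier might look something like the following. It is an untested sketch, not output from a run. The vector names X and FILTER (which should give variables X1 to X3 and FILTER1 to FILTER3), the scratch vectors #K and #N, and the indexes #S and #Row are all names chosen for illustration. Each sample needs its own k and n counters, which is why they are kept in scratch vectors.

```spss
INPUT PROGRAM.
.  NUMERIC ROW (F4).
.  VECTOR  X(3)      /* Creates X1 TO X3                  */.
.  VECTOR  FILTER(3) /* Creates FILTER1 TO FILTER3        */.
.  VECTOR  #K(3)     /* Cases still to select, per sample */.
.  VECTOR  #N(3)     /* Cases still to come, per sample   */.
*  Set the k/n counters for each sample before any case is built.
+  LOOP #S = 1 TO 3.
.     COMPUTE #K(#S) = 2.
.     COMPUTE #N(#S) = 6.
+  END LOOP.
*  One output case per row; each case carries all 3 samples.
+  LOOP #Row = 1 TO 6.
.     COMPUTE ROW = #Row.
+     LOOP #S = 1 TO 3.
.        COMPUTE X(#S)      = RV.NORMAL(0,1).
.        COMPUTE FILTER(#S) = RV.BERNOULLI(#K(#S)/#N(#S)).
.        COMPUTE #K(#S)     = #K(#S) - FILTER(#S).
.        COMPUTE #N(#S)     = #N(#S) - 1.
+     END LOOP.
+     END CASE.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.
FORMATS FILTER1 TO FILTER3 (F2).
LIST.
```

Here the variables come out grouped (all the X's, then all the FILTERs) rather than interleaved as in the CASESTOVARS result. To scale it up, the 3, 2 and 6 would become the number of samples, the number of cases selected, and the sample size.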
