Let's see if we're getting there. At 02:54 PM 7/2/2007, Louis wrote:
>I would like to find out how I can generate 100 samples. Each sample
>has 1000 cases with 2 columns of data (RV.N(0,1) and a Filter
>variable).
>
>Below is the syntax for 1 sample which generates 2 columns. How can I
>create 200 (100 samples x 2) columns of data?
Now, I think I see what you were going for. Something like this, if you
wanted 3 samples:
   X1 FILTER1     X2 FILTER2     X3 FILTER3
 1.21       0  -0.35       0  -1.35       1
All the X variables are drawn from the standard normal distribution,
N(0,1); the FILTER variables are set so that exactly 100 cases in each
column have the value 1.
First, you know syntax better than you give yourself credit for. What
you have is pretty good, though you do still have that problem of
generating 100 pairs of variables, instead of only 1 pair.
For the random selection, you have a neat implementation of the "k/n"
algorithm - your #s1 is what's normally called k, your #s2 is what's
called n. You can do that within the INPUT PROGRAM, if you like, rather
than in a separate pass.
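
To make the "k/n" idea concrete, here is a minimal single-sample sketch, done inside an INPUT PROGRAM. It keeps the selection probability used for each case as a visible variable, so the list shows k/n changing from row to row. The names #K, #N, P and PICKED are made up for this illustration, and the sizes (2 selected out of 6) are just small enough to read; treat it as a sketch rather than tested production syntax.

```spss
INPUT PROGRAM.
.  COMPUTE #K = 2  /* k: cases still to be selected   */.
.  COMPUTE #N = 6  /* n: cases still to be considered */.
.  LOOP ROW = 1 TO 6.
.     COMPUTE P      = #K / #N.
.     COMPUTE PICKED = RV.BERNOULLI(P).
.     COMPUTE #K     = #K - PICKED.
.     COMPUTE #N     = #N - 1.
.     END CASE.
.  END LOOP.
.  END FILE.
END INPUT PROGRAM.
FORMATS ROW PICKED (F2) / P (F5.3).
LIST.
```

Each case is selected with probability (selections still needed)/(cases still remaining). Once k reaches 0 the probability is 0, and if k ever equals n it is 1, which is why exactly k cases end up selected.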
Second: are you sure you want it to come out this way? The other way is
what Gene and I have been looking at: Three 'columns' (variables),
where the first is the sample number, and the other two correspond to
your variables X and FILTER_$. That's easier to use, for most purposes.
(It's called 'long' organization; what you were thinking of, is called
'wide'.)
The code below takes Gene's logic as I modified it, adds a row count
within each sample (you'll see why, later), and adds your sampling
logic (which I've modified a little, to use RV.BERNOULLI instead of
UNIFORM). It selects 3 samples of 6 cases each, filtering to select 2
in each sample. (You can expand to 100 samples of 1,000 easily.)
This is SPSS 15 draft output (WRR:not saved separately), giving output
in 'long' format:
INPUT PROGRAM.
.  NUMERIC SAMPLE ROW (F4).
.  LEAVE   SAMPLE.
+  LOOP SAMPLE = 1 TO 3.
.     NUMERIC FILTER_$ (F2).
.     COMPUTE #S1 = 2  /* Number of cases within filter */.
.     COMPUTE #S2 = 6  /* Size of each sample           */.
+     LOOP #CASE = 1 TO #S2.
.        COMPUTE ROW      = #CASE.
+        COMPUTE X        = RV.NORMAL(0,1).
.        COMPUTE FILTER_$ = RV.BERNOULLI(#S1/#S2).
.        COMPUTE #S1      = #S1 - FILTER_$.
.        COMPUTE #S2      = #S2 - 1.
+        END CASE.
+     END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.
DATASET NAME LongForm WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created |03-JUL-2007 14:47:21 |
|-----------------------------|---------------------------|
[LongForm]
SAMPLE  ROW FILTER_$        X
     1    1        1    -1.37
     1    2        0     -.74
     1    3        0      .90
     1    4        1     1.13
     1    5        0      .36
     1    6        0     -.89
     2    1        0      .69
     2    2        0      .97
     2    3        0     -.22
     2    4        1     1.93
     2    5        0     -.84
     2    6        1     -.04
     3    1        1      .81
     3    2        0     1.95
     3    3        0     -.83
     3    4        0    -1.48
     3    5        1      .60
     3    6        0     1.79
Number of cases read: 18 Number of cases listed: 18
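
Scaling this up to the original request should only mean changing three constants. The sketch below assumes 100 samples of 1,000 cases with 100 cases selected in each sample (the 100 is taken from the description of the FILTER columns earlier in this note). It has not been run at that size here. It skips the LIST of 100,000 cases, and the CROSSTABS at the end is only a suggested check that every sample has exactly 100 cases with FILTER_$ = 1.

```spss
INPUT PROGRAM.
.  NUMERIC SAMPLE ROW (F4).
.  LEAVE   SAMPLE.
.  LOOP SAMPLE = 1 TO 100.
.     NUMERIC FILTER_$ (F2).
.     COMPUTE #S1 = 100   /* Number of cases within filter */.
.     COMPUTE #S2 = 1000  /* Size of each sample           */.
.     LOOP #CASE = 1 TO #S2.
.        COMPUTE ROW      = #CASE.
.        COMPUTE X        = RV.NORMAL(0,1).
.        COMPUTE FILTER_$ = RV.BERNOULLI(#S1/#S2).
.        COMPUTE #S1      = #S1 - FILTER_$.
.        COMPUTE #S2      = #S2 - 1.
.        END CASE.
.     END LOOP.
.  END LOOP.
.  END FILE.
END INPUT PROGRAM.
CROSSTABS /TABLES = SAMPLE BY FILTER_$.
```

Every row of the crosstab should show 900 cases with FILTER_$ = 0 and 100 with FILTER_$ = 1.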
Now, to get 'wide' format, you could use VECTOR and LOOP, as you were
planning to. But I think it's easier to generate 'long' form and
convert to 'wide' with CASESTOVARS. (The DATASET commands are not
necessary; they made testing easier):
DATASET ACTIVATE LongForm WINDOW=FRONT.
DATASET COPY WideForm.
DATASET ACTIVATE WideForm WINDOW=FRONT.
SORT CASES BY ROW SAMPLE .
CASESTOVARS
/ID = ROW
/INDEX = SAMPLE
/GROUPBY = INDEX .
Cases to Variables
|----------------------------|---------------------------|
|Output Created |03-JUL-2007 14:48:52 |
|----------------------------|---------------------------|
[WideForm]
Generated Variables
|--------|------|----------|
|Original|SAMPLE|Result |
|Variable| |----------|
| | |Name |
|--------|------|----------|
|FILTER_$|1 |FILTER_$.1|
| |2 |FILTER_$.2|
| |3 |FILTER_$.3|
|--------|------|----------|
|X |1 |X.1 |
| |2 |X.2 |
| |3 |X.3 |
|--------|------|----------|
Processing Statistics
|---------------|---|
|Cases In |18 |
|Cases Out |6 |
|---------------|---|
|Cases In/Cases |3.0|
|Out | |
|---------------|---|
|Variables In |4 |
|Variables Out |7 |
|---------------|---|
|Index Values |3 |
|---------------|---|
LIST.
List
|-----------------------------|---------------------------|
|Output Created |03-JUL-2007 14:48:52 |
|-----------------------------|---------------------------|
[WideForm]
ROW FILTER_$.1      X.1 FILTER_$.2      X.2 FILTER_$.3      X.3
  1          1    -1.37          0      .69          1      .81
  2          0     -.74          0      .97          0     1.95
  3          0      .90          0     -.22          0     -.83
  4          1     1.13          1     1.93          0    -1.48
  5          0      .36          0     -.84          1      .60
  6          0     -.89          1     -.04          0     1.79
Number of cases read: 6 Number of cases listed: 6
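
For comparison, the VECTOR and LOOP route mentioned above might look roughly like the following. This is an untested sketch, not syntax from the original exchange. It assumes VECTOR is used to create X1 to X3 and FILTER1 to FILTER3, and that two scratch vectors (#K and #N, names invented here) hold the running k and n for each sample, since in 'wide' form every sample advances by one case on each row.

```spss
INPUT PROGRAM.
.  NUMERIC ROW (F4).
.  VECTOR  X(3).
.  VECTOR  FILTER(3).
.  VECTOR  #K(3).
.  VECTOR  #N(3).
.  LOOP #S = 1 TO 3.
.     COMPUTE #K(#S) = 2  /* Number of cases within filter */.
.     COMPUTE #N(#S) = 6  /* Size of each sample           */.
.  END LOOP.
.  LOOP ROW = 1 TO 6.
.     LOOP #S = 1 TO 3.
.        COMPUTE X(#S)      = RV.NORMAL(0,1).
.        COMPUTE FILTER(#S) = RV.BERNOULLI(#K(#S)/#N(#S)).
.        COMPUTE #K(#S)     = #K(#S) - FILTER(#S).
.        COMPUTE #N(#S)     = #N(#S) - 1.
.     END LOOP.
.     END CASE.
.  END LOOP.
.  END FILE.
END INPUT PROGRAM.
FORMATS FILTER1 TO FILTER3 (F2).
LIST.
```

The extra bookkeeping (one k and one n per sample, carried across rows) is the main reason the 'long' form followed by CASESTOVARS is the simpler path.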