Date: Fri, 25 Sep 1998 16:16:14 +0300
Reply-To: Gonzalo Kmaid <gkmaid@INTERNET.COM.UY>
Sender: "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From: Gonzalo Kmaid <gkmaid@INTERNET.COM.UY>
Subject: systematic sampling (additional question)
Hi All,
First, I would like to thank David Matheson at SPSS (and Michael Lacy) for
their suggestions about systematic sampling.
I am sure you get plenty of thanks for your participation in the
list/newsgroup (you folks at SPSS as well as the SPSS users around the globe
who give their time and thinking to help others), but here is
another one: thanks!!!!
Now the bad news :-)
I have an additional question. Of course, the suggested syntax works
like a charm, but I would like to use a different approach.
I would like to perform systematic sampling from a file but, instead of
using a fixed interval of cases, I want to apply the interval to a
cumulative population figure, so that units are selected with probability
proportional to their population.
For example, let's say that the data file looks something like this:
id  track  block  pop  cumpop
 1  01     001     80      80
 2  01     002     85     165
 3  02     001    120     285
 4  02     002     30     315
 5  02     003    160     475
 6  03     001     45     520
 7  03     002    140     660
 8  03     003     24     684
 9  04     001     23     707
10  04     002     15     722
('track' and 'block' are geographic identifiers, 'pop' is the population of
each block, and 'cumpop' is the population accumulated across blocks.)
Given a total (cumulative) population of 722, let's say I want a sample of
size 3, selected proportionally to the cumulative population across blocks.
Then the interval would be the total population divided by the desired
sample size (722/3 = 240.67, rounded to 241).
Then I generate a random starting point smaller than 241 (for example 137),
and select the first block whose cumulative population reaches this point:
unit 2 (cumpop 165). Then I go to the next point, 137+241=378, which falls
in unit 5 (cumpop 475). The third unit would be the one that includes
378+241=619, in the example unit 7 (cumpop 660). As a result, my sample
would be units 2, 5, and 7 [cumpop 165 is the first to reach 137, 475 the
first to reach 378, and 660 the first to reach 619].
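A minimal Python sketch of the walk-through above, assuming the blocks are already in file order and that the random start is drawn from 1 to the interval (the usual convention, taken here as an assumption). The function name `pps_systematic` is hypothetical and this is not SPSS syntax. It accumulates population block by block and selects a block each time a selection point falls at or below its running total:

```python
import random

# Block populations from the example table, in file order (id = position).
POPS = [80, 85, 120, 30, 160, 45, 140, 24, 23, 15]


def pps_systematic(pops, n, start=None, rng=random):
    """Systematic sample with probability proportional to size (sketch).

    Returns (interval, points, selected_ids). Assumes start, if given,
    lies in 1..interval; otherwise one is drawn at random.
    """
    total = sum(pops)                      # 722 for the example
    interval = round(total / n)            # 722 / 3 = 240.67 -> 241
    if start is None:
        start = rng.randint(1, interval)   # assumption: start drawn from 1..interval
    points = [start + k * interval for k in range(n)]

    selected = []
    cum = 0
    j = 0                                  # index of the next unused point
    for block_id, pop in enumerate(pops, start=1):
        cum += pop                         # running cumpop
        # Select this block for every point p with previous cumpop < p <= cumpop.
        while j < len(points) and points[j] <= cum:
            selected.append(block_id)
            j += 1
    return interval, points, selected


if __name__ == "__main__":
    print(pps_systematic(POPS, 3, start=137))
```

With start=137 this reproduces the points 137, 378 and 619 and selects ids 2, 5 and 7. Because the interval is rounded, a start near 241 can push the last point past 722 (241 + 482 = 723), giving only two units, and a block larger than the interval could be selected twice; both behaviours are worth checking before relying on the sketch.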
Using the syntax David sent for systematic sampling with a fixed
interval, I can get as far as computing the interval and the starting
point, but I can't figure out how to write syntax that selects the cases
I want in the sample. I think it would be a combination of aggregate
(using SUM(pop)) to attach the total pop (722) as a variable in the
original data file and some 'select if' stuff. But the previous procedure
of calculating which interval a case falls into based on its position in
the data file is no longer valid, and I can't find a way to write the
condition that a case must satisfy to be selected (the 'select if'
stuff). Does that make any sense???
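One way around the position-based logic is to note that a block is selected exactly when at least one selection point falls in the range (cumpop - pop, cumpop]. Counting the points at or below a value needs only floor division, so the test uses nothing but the case's own pop and cumpop plus three constants (start, interval, sample size). A minimal Python sketch of that per-case condition follows; the helper names are hypothetical, it is not SPSS syntax, and whether it maps cleanly onto a COMPUTE plus 'select if' in SPSS is untested here:

```python
# Hypothetical helper names; plain Python, not SPSS syntax.

def points_upto(x, start, interval, n):
    """Count the selection points start + k*interval (0 <= k < n) that are <= x."""
    if x < start:
        return 0
    return min(n, (x - start) // interval + 1)


def is_selected(pop, cumpop, start, interval, n):
    """True if at least one selection point lies in (cumpop - pop, cumpop]."""
    prevcum = cumpop - pop                 # cumulative population before this block
    hits = (points_upto(cumpop, start, interval, n)
            - points_upto(prevcum, start, interval, n))
    return hits > 0


# (id, pop, cumpop) rows from the example table.
ROWS = [(1, 80, 80), (2, 85, 165), (3, 120, 285), (4, 30, 315),
        (5, 160, 475), (6, 45, 520), (7, 140, 660), (8, 24, 684),
        (9, 23, 707), (10, 15, 722)]

if __name__ == "__main__":
    sample = [i for i, pop, cum in ROWS if is_selected(pop, cum, 137, 241, 3)]
    print(sample)
```

Each case is tested independently, so no loop over the file's positions is needed; the only requirement is that cumpop was accumulated in file order. A hit count greater than 1 would flag a block larger than the interval.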
I hope my long explanation is clear enough. Any additional hints will be
really appreciated!
Gonzalo Kmaid
gkmaid@internet.com.uy