```
Date:         Fri, 25 Sep 1998 16:16:14 +0300
Reply-To:     Gonzalo Kmaid
Sender:       "SPSSX(r) Discussion"
From:         Gonzalo Kmaid
Subject:      systematic sampling (additional question)
Comments:     cc: davidm@spss.com
Content-Type: text/plain; charset="us-ascii"

Hi All,

First, I would like to thank David Matheson at SPSS (and Michael Lacy) for
their suggestions about systematic sampling. I am sure you get a bunch of
thanks for your participation in the list/newsgroup (you folks at SPSS as
well as the SPSS users around the globe who give their time and thinking to
help others), but here you have another one: thanks!!!!

Now the bad news :-) I have an additional question.

Of course, the suggested syntax works like a charm, but I would like to use
a different approach. I would like to perform systematic sampling from a
file but, instead of using a fixed interval, I want to sample units
proportionally to a cumulative population figure. For example, let's say
that the data file looks something like this:

id  track  block  pop  cumpop
 1  01     001     80      80
 2  01     002     85     165
 3  02     001    120     285
 4  02     002     30     315
 5  02     003    160     475
 6  03     001     45     520
 7  03     002    140     660
 8  03     003     24     684
 9  04     001     23     707
10  04     002     15     722

('track' and 'block' are geographic identifiers, 'pop' is the population of
each block, 'cumpop' is the population accumulated across blocks).

Given a cumulative population of 722, let's say I want a sample of size 3
proportional to the cumulative population across blocks. Then the interval
would be the total population divided by the desired sample size
(722/3 = 241, rounded). Then I generate a random starting point no larger
than 241 (for example 137) and select the first block whose cumulative
population includes this point (unit 2, cumpop 165). Then I go to the next
point: 137+241 = 378, which falls in unit 5 (cumpop 475). The third unit
would be the one that includes 378+241 = 619, in the example unit 7
(cumpop 660).

As a result, my sample would be units 2, 5, and 7
[165 >= 137, 475 >= 378, 660 >= 619].

Using the syntax sent by David for systematic sampling with a fixed
interval, I can get as far as having the interval and the starting point,
but I can't figure out how to write the syntax that selects the cases I
want in the sample. I think it would be a combination of AGGREGATE (using
SUM(pop)) to attach the total population (722) as a variable in the
original data file and some 'select if' stuff. But the procedure of
calculating which interval a case falls into based on its position in the
data file is no longer valid, and I can't find a way to write the condition
that a case must satisfy to be selected (the 'select if' stuff).

Does this make any sense??? I hope my long explanation is clear enough. Any
additional hint will be really appreciated!

Gonzalo Kmaid
gkmaid@internet.com.uy
```
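The selection rule described in the post can be sketched outside SPSS to make the missing condition explicit. The Python below is only an illustration under stated assumptions, not SPSS syntax and not the method suggested on the list: the function name `pps_systematic` and its parameters are hypothetical, and the interval is rounded as in the post. A block is selected when a sampling point lies above the previous block's cumulative population and at or below its own. With the start of 137 used in the post, the point 378 falls in the block whose cumpop is 475, which is id 5 in the table.

```python
from itertools import accumulate
import random

# Block data from the example in the post: (id, track, block, pop)
blocks = [
    (1, "01", "001", 80), (2, "01", "002", 85), (3, "02", "001", 120),
    (4, "02", "002", 30), (5, "02", "003", 160), (6, "03", "001", 45),
    (7, "03", "002", 140), (8, "03", "003", 24), (9, "04", "001", 23),
    (10, "04", "002", 15),
]

def pps_systematic(pops, n, start=None, rng=random):
    """Hypothetical helper: systematic sampling proportional to size.

    Returns (interval, start, selected), where selected holds the 0-based
    indices of the units whose cumulative range contains a sampling point.
    """
    cum = list(accumulate(pops))      # running total, i.e. 'cumpop'
    interval = round(cum[-1] / n)     # 722 / 3 -> 241, rounded as in the post
    if start is None:
        start = rng.randint(1, interval)  # random start in 1..interval
    selected = []
    i = 0
    for k in range(n):
        point = start + k * interval
        # Advance to the first unit whose cumulative total reaches the point.
        while i < len(cum) and cum[i] < point:
            i += 1
        if i == len(cum):
            # Rounding the interval up can push the last point past the total.
            break
        # i is not advanced here, so a unit larger than the interval
        # can be selected more than once.
        selected.append(i)
    return interval, start, selected

interval, start, picked = pps_systematic([b[3] for b in blocks], n=3, start=137)
print(interval, [blocks[i][0] for i in picked])  # 241 [2, 5, 7]
```

In terms of the data file, the same condition reads: keep a case if some point `start + k*interval` satisfies `cumpop - pop < point <= cumpop`. It depends only on each case's own `pop` and `cumpop`, not on its position in the file.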

