LISTSERV at the University of Georgia
Date:         Fri, 25 Sep 1998 16:16:14 +0300
Reply-To:     Gonzalo Kmaid <gkmaid@INTERNET.COM.UY>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From:         Gonzalo Kmaid <gkmaid@INTERNET.COM.UY>
Subject:      systematic sampling (additional question)
Comments: cc: davidm@spss.com
Content-Type: text/plain; charset="us-ascii"

Hi All,

First, I would like to thank David Matheson at SPSS (and Michael Lacy) for their suggestions about systematic sampling.

I am sure you get a bunch of thanks for your participation in the list/newsgroup (you folks at SPSS as well as SPSS' users around the globe that give their time and thinking to help others), but here you have another one: thanks!!!!

Now the bad news :-)

I have an additional question. Of course, the suggested syntax works like a charm, but I would like to use a different approach.

I would like to perform systematic sampling from a file but, instead of using a fixed interval, I want to sample units proportionally to a cumulative population figure.

For example, let's say that the data file looks something like this:

id  track  block  pop  cumpop
 1  01     001     80      80
 2  01     002     85     165
 3  02     001    120     285
 4  02     002     30     315
 5  02     003    160     475
 6  03     001     45     520
 7  03     002    140     660
 8  03     003     24     684
 9  04     001     23     707
10  04     002     15     722

('track' and 'block' are geographic identifiers, 'pop' is population of each block, 'cumpop' is population accumulated across blocks).

Given a cumulative population of 722, let's say I want a sample of size 3 proportional to the cumulative population across blocks. Then the interval would be the total population divided by the desired sample size (722/3 = 241, rounded).

Then I generate a random starting point smaller than 241 (for example, 137) and select the first block whose cumulative population includes this point (unit 2, cumpop 165). Then I go to the next point: 137+241=378, which falls in unit 5 (cumpop 475). The third unit is the one that includes 378+241=619, in the example unit 7 (cumpop 660). As a result, my sample would be units 2, 5, and 7 [165 >= 137, 475 >= 378, 660 >= 619].
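The walk-through above can be sketched in a few lines of Python. This is only an illustration of the selection rule, not SPSS syntax and not the syntax David sent; the function name `pps_systematic_sample` and its arguments are hypothetical. It assumes every block has a positive population, so that the cumulative totals strictly increase. With the example table, the point 378 lands in the block whose cumulative population is 475, which is id 5.

```python
from bisect import bisect_left

# Example data from the table in this post.
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pops = [80, 85, 120, 30, 160, 45, 140, 24, 23, 15]


def pps_systematic_sample(ids, pops, start, interval):
    """Return the ids hit by the points start, start+interval, ...

    Hypothetical helper: a block is hit when a selection point falls
    in (cumpop - pop, cumpop]. A block larger than the interval can be
    hit more than once, in which case its id is repeated.
    """
    # Build the running totals (the 'cumpop' column).
    cum = []
    total = 0
    for p in pops:
        total += p
        cum.append(total)

    selected = []
    point = start
    while point <= total:
        # First block whose cumulative population is >= the point.
        idx = bisect_left(cum, point)
        selected.append(ids[idx])
        point += interval
    return selected


sample = pps_systematic_sample(ids, pops, start=137, interval=241)
print(sample)  # [2, 5, 7]
```

Because the interval is rounded up to 241, a starting point near the top of the range can push the third point past 722 and yield only two units; carrying the unrounded interval (722/3) through the loop avoids that.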

Using the syntax David sent for systematic sampling with a fixed interval, I can get as far as computing the interval and the starting point, but I can't figure out how to write syntax that selects the cases I want in the sample. I think it would combine AGGREGATE (using SUM(pop)) to attach the total population (722) as a variable in the original data file with some 'select if' logic. But calculating which interval a case falls into from its position in the data file is no longer valid, and I can't work out how to write the condition that a case must satisfy to be selected (the 'select if' part). Does this make sense?
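One way to phrase the condition case by case, without looking at a case's position in the file, is to count how many selection points fall inside each block's range (cumpop - pop, cumpop]. The Python sketch below is an assumption-laden illustration, not SPSS syntax: the name `selection_hits` is hypothetical, the starting point is assumed to lie between 1 and the interval, and the arithmetic relies on floor division (rounding toward minus infinity). A language whose integer division truncates toward zero would need an adjustment for the negative values that occur before the first point.

```python
def selection_hits(pop, cumpop, start, interval):
    """Count the points start + k*interval (k = 0, 1, ...) that fall
    in the half-open range (cumpop - pop, cumpop].

    Hypothetical helper; assumes 1 <= start <= interval and pop > 0.
    """
    lower = cumpop - pop
    # Points at or below cumpop, minus points at or below the lower bound.
    return (cumpop - start) // interval - (lower - start) // interval


# (id, pop, cumpop) rows from the example table.
rows = [
    (1, 80, 80), (2, 85, 165), (3, 120, 285), (4, 30, 315), (5, 160, 475),
    (6, 45, 520), (7, 140, 660), (8, 24, 684), (9, 23, 707), (10, 15, 722),
]

# Keep a case when at least one selection point falls inside it.
chosen = [i for i, pop, cumpop in rows
          if selection_hits(pop, cumpop, start=137, interval=241) > 0]
print(chosen)  # [2, 5, 7]
```

Each case needs only its own pop and cumpop plus the two constants, so the same test could in principle serve as the per-case selection condition once the starting point and interval are attached to every row.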

I hope my long explanation is clear enough. Any additional hint will be really appreciated!

Gonzalo Kmaid gkmaid@internet.com.uy

