LISTSERV at the University of Georgia
Date:         Wed, 13 Jul 2005 02:49:39 -0700
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: a sampling question
In-Reply-To:  <MC7-F388I73RTNO0txx001c6cd2@mc7-f38.hotmail.com>
Content-Type: text/plain; format=flowed

wrristow@MINDSPRING.COM replied:
> >Trying to treat the k disjoint pieces as individual samples will be
> >problematic, since combining them will present headaches. Trying to
> >treat them as a single, complex sample will do wonders for your
> >aspirin bills. :-)
>
>This may be something I need to be walked through slowly and simply.
>
>The only time I did the "k disjoint samples" thing was to generate a
>sample for a door-to-door survey in a small city. We had a list of
>(putatively) all the addresses in the city. They wished to sample to
>some stopping criterion I no longer recall; possibly, until they had a
>certain number of completed interviews. They wanted the sample set that
>they used until stopping to be a simple random sample.
>
>The solution (I was functioning more as programmer than as
>statistician) was to generate k disjoint random samples, which were
>called 'batches'. Batches were used in order until the stopping
>criterion was reached, with the rule that if any addresses in a batch
>were used, the whole batch had to be used.
>
>One glaring problem is that the set of sampled addresses for which
>interviews were completed is guaranteed to be biased. (To be fair, they
>tried diligently to follow up at sampled addresses. Most of the losses
>were addresses that were ineligible by predetermined criteria: no
>occupancy, business rather than residential, failure to meet certain
>household criteria.)
>
>The investigators took for granted (and I saw no reason to doubt) that
>for any n, the union of the first n 'batches' was a proper simple
>random sample of the city addresses. It didn't do wonders for anybody's
>aspirin bills; nobody worried.
>
>But it sounds like we should have worried a lot. I'm afraid I'm not
>seeing it. Can you say what I'm missing?

I don't think you needed to worry that much. What you're talking about here is yet another wrinkle. You have in essence built replicates which are 'backup' samples. Each 'batch' is another attempt to cover the entire underlying structure of the sampled population. So the union of the batches would still reflect the target population.
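For anyone who wants to see the batch idea in code, here is a minimal Python sketch. It is my own illustration, not the program used for the survey. It assumes the frame is just a list of address IDs, and the names make_batches and sample_through are made up for this example. The point it shows: if the batches are consecutive slices of one random ordering of the frame, then for any n fixed in advance, the union of the first n batches is a simple random sample of n * batch_size addresses.

```python
import random

def make_batches(frame, k, batch_size, seed=None):
    """Return k disjoint batches, each of batch_size units, drawn from frame."""
    units = list(frame)
    if k * batch_size > len(units):
        raise ValueError("k * batch_size exceeds the frame size")
    rng = random.Random(seed)
    # Sampling without replacement returns the selected units in random order,
    # so consecutive slices of the result are disjoint random batches.
    drawn = rng.sample(units, k * batch_size)
    return [drawn[i * batch_size:(i + 1) * batch_size] for i in range(k)]

def sample_through(batches, n):
    """Union of the first n batches (each batch is used whole or not at all)."""
    return [unit for batch in batches[:n] for unit in batch]

# Hypothetical frame of 1000 address IDs, split into 10 batches of 25.
frame = range(1, 1001)
batches = make_batches(frame, k=10, batch_size=25, seed=2005)
used = sample_through(batches, 3)
```

Here used holds 75 distinct addresses. The all-or-nothing rule for batches corresponds to always taking whole slices.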

Still, the issue you brought up points to two problems that one hits with this type of replacement process. One is that you run into biases because the reasons for the dropouts are unknown. A more complex sampling design might at least let you look at which components of the population are getting undersampled. Problem two is that this loss of sample points means that your population for reporting is just a subset of your target population (you're not sampling the class of people who refuse to be sampled), and you need to adjust your weights to compensate accordingly, if that is possible. In essence, the sum of the weights should reflect the size of the reportable segments of your target population.
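To make the weighting point concrete, here is a minimal Python sketch of one common approach, a weighting-class adjustment. It is an illustration under assumptions, not a description of what was done in the survey discussed above. It assumes a simple random sample from a frame of known size, a hypothetical adjustment cell for each address (here 'north' and 'south'), and a status of 'complete', 'nonrespondent', or 'ineligible'. The function name and the data are made up.

```python
from collections import defaultdict

def adjusted_weights(sample, frame_size):
    """Weighting-class adjustment for an SRS; returns {id: weight} for completes."""
    base = frame_size / len(sample)  # base weight N/n for every sampled unit
    eligible = defaultdict(float)    # base-weight total of eligible units, by cell
    responded = defaultdict(float)   # base-weight total of completes, by cell
    for unit in sample:
        if unit["status"] == "ineligible":
            continue  # ineligible units are outside the reporting population
        eligible[unit["cell"]] += base
        if unit["status"] == "complete":
            responded[unit["cell"]] += base
    # Completes carry the weight of the eligible nonrespondents in their cell.
    # A cell with no completes would need to be collapsed with another cell first.
    return {
        unit["id"]: base * eligible[unit["cell"]] / responded[unit["cell"]]
        for unit in sample
        if unit["status"] == "complete"
    }

# Hypothetical sample of 10 addresses from a frame of 1000.
statuses = {
    "north": ["complete", "complete", "complete", "nonrespondent", "ineligible"],
    "south": ["complete", "complete", "nonrespondent", "nonrespondent", "ineligible"],
}
sample = [
    {"id": f"{cell}-{i}", "cell": cell, "status": status}
    for cell, cell_statuses in statuses.items()
    for i, status in enumerate(cell_statuses)
]
weights = adjusted_weights(sample, frame_size=1000)
total = sum(weights.values())
```

In this made-up example 8 of the 10 sampled addresses are eligible, so the adjusted weights sum to an estimated 800 eligible addresses rather than the full frame of 1000. The adjustment only removes nonresponse bias to the extent that respondents and nonrespondents within a cell are alike, which is exactly the part you cannot check from the sample alone.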

So there probably is less to worry about than you thought, but perhaps more than your investigators considered. :-)

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
