Date: Wed, 13 Jul 2005 02:49:39 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: a sampling question
In-Reply-To: <MC7-F388I73RTNO0txx001c6cd2@mc7-f38.hotmail.com>
Content-Type: text/plain; format=flowed
wrristow@MINDSPRING.COM replied:
> >Trying to treat the k disjoint pieces as individual samples will be
> >problematic, since combining them will present headaches. Trying to
> >treat them as a single, complex sample will do wonders for your
> >aspirin bills. :-)
>
>This may be something I need to be walked through slowly and simply.
>
>The only time I did the "k disjoint samples" thing was to generate a
>sample for a door-to-door survey in a small city. We had a list of
>(putatively) all the addresses in the city. They wished to sample to
>some stopping criterion I no longer recall; possibly, until they had a
>certain number of completed interviews. They wanted the sample set that
>they used until stopping to be a simple random sample.
>
>The solution (I was functioning more as programmer than as
>statistician) was to generate k disjoint random samples, which were
>called 'batches'. Batches were used in order until the stopping
>criterion was reached, with the rule that if any addresses in a batch
>were used, the whole batch had to be used.
>
>One glaring problem is that the set of sampled addresses for which
>interviews were completed is guaranteed to be biased. (To be fair, they
>tried diligently to follow up at sampled addresses. Most of the losses
>were addresses that were ineligible by predetermined criteria: no
>occupancy, business rather than residential, failure to meet certain
>household criteria.)
>
>The investigators took for granted (and I saw no reason to doubt) that
>for any n, the union of the first n 'batches' was a proper simple
>random sample of the city addresses. It didn't do wonders for anybody's
>aspirin bills; nobody worried.
>
>But it sounds like we should have worried a lot. I'm afraid I'm not
>seeing it. Can you say what I'm missing?
I don't think you needed to worry that much. What you're talking about
here is yet another wrinkle. You have in essence built replicates which
are 'backup' samples. Each 'batch' is another attempt to cover the entire
underlying structure of the sampled population. So the union of the batches
would still reflect the target population.
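A minimal Python sketch of the 'batches' scheme described above. It assumes the address list fits in memory and that every batch has the same size. The function name make_batches and the toy address list are hypothetical. The whole draw is done once with random.sample, whose result is in selection order, so any leading slice is itself a simple random sample. That is why the union of the first n batches is an SRS of size n times the batch size.

```python
import random

def make_batches(frame, k, batch_size, seed=None):
    """Draw k disjoint simple random samples ("batches") of equal size.

    Hypothetical helper: one random draw of k*batch_size units without
    replacement, cut into consecutive chunks. Because the draw is in
    selection order, each batch, and the union of the first n batches,
    is a simple random sample from the frame.
    """
    if k * batch_size > len(frame):
        raise ValueError("frame too small for k batches of this size")
    rng = random.Random(seed)
    drawn = rng.sample(list(frame), k * batch_size)
    return [drawn[i * batch_size:(i + 1) * batch_size] for i in range(k)]

# Hypothetical address list standing in for the city frame.
addresses = [f"addr{i:04d}" for i in range(1000)]
batches = make_batches(addresses, k=10, batch_size=25, seed=2005)

# If the stopping rule is reached during batch 3, all of batches 1-3 are
# fielded; together they form an SRS of 75 addresses.
used = [a for batch in batches[:3] for a in batch]
print(len(used), len(set(used)))
```

The "use the whole batch" rule matters here: stopping partway through a batch is still random, but only if the order within the batch was random and was followed strictly.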
Still, the issue you brought up means that there are two problems that one
hits with this type of replacement process. One is that you run into biases
due to unknown reasons for the dropouts. A more complex sampling design
might at least let you look at which components of the population are
getting undersampled. Problem two is that this loss of sample points means
that your population for reporting is just a subset of your target
population (you're not sampling the class of people who refuse to be
sampled), and you need to adjust your weights to compensate accordingly,
if possible. In essence, the sum of the weights should reflect the size of
the reportable segments of your target population.
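A minimal Python sketch of one common weight adjustment that makes the weights sum to the estimated size of the reportable population. It assumes a single adjustment class, an SRS with base weight N/n, and known eligibility for every sampled address. The field names 'eligible' and 'responded' and the function adjust_weights are hypothetical. Ineligible units simply drop out. Eligible nonrespondents' weight is spread over eligible respondents, which assumes that nonrespondents resemble respondents. That is exactly the assumption the unknown reasons for dropouts may break.

```python
def adjust_weights(sample, frame_size):
    """Single-class adjustment for ineligibles and nonresponse (a sketch).

    sample: list of dicts with boolean 'eligible' and 'responded' keys
    (hypothetical field names). Returns the eligible respondents with a
    'weight' key added, plus the estimated eligible population size.
    """
    n = len(sample)
    base = frame_size / n  # SRS base weight N/n
    eligible = [u for u in sample if u["eligible"]]
    respondents = [u for u in eligible if u["responded"]]
    if not respondents:
        raise ValueError("no eligible respondents to carry the weight")
    # Ineligible units' weights are dropped, not redistributed.
    eligible_total = base * len(eligible)
    # Eligible nonrespondents' weight moves to eligible respondents.
    factor = len(eligible) / len(respondents)
    weighted = [dict(u, weight=base * factor) for u in respondents]
    return weighted, eligible_total

# Toy sample of 100 from a frame of 10,000: 60 completes,
# 15 eligible refusals, 25 ineligible addresses.
sample = ([{"eligible": True, "responded": True}] * 60
          + [{"eligible": True, "responded": False}] * 15
          + [{"eligible": False, "responded": False}] * 25)
weighted, total = adjust_weights(sample, frame_size=10000)
print(total, sum(u["weight"] for u in weighted))
```

With a more complex design, the same adjustment would be done separately within each stratum or weighting class, which is also where the undersampled components would show up.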
So there probably is less to worry about than you thought, but perhaps more
than your investigators considered. :-)
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330