Date: Fri, 3 Jun 2005 22:56:21 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Proc surveylogistic
Content-type: text/html; charset=ISO-8859-1
<FONT face="Default Sans Serif,Verdana,Arial,Helvetica,sans-serif" size=2><DIV><A href="mailto:ben.powell@CLA.CO.UK" target=blank>ben.powell@CLA.CO.UK</A> replied to my rambling:</DIV><DIV>>>The larger the variation within clusters, the more efficient your cluster <BR>>>design becomes. The smaller the variation within cluster, the more likely <BR>>>you are to have made an error by using cluster sampling in the first <BR>>>place. <BR>> <BR>>This may be a bit of a clustering layman's question but the logistical <BR>>question in particular interests me: is there any way you could tell in <BR>>advance whether your population was suited to cluster based sampling? If the <BR>>variation in a given metric is equal in a dense cluster to that within a <BR>>widely distributed cluster then is it reasonable to assume clustering would <BR>>be an effective selection methodology? I have historical data I could <BR>>analysis to test for this, <BR></DIV><DIV>Let me put a different spin on this.</DIV><DIV> </DIV><DIV>In general, you only want to perform cluster sampling when you have to.</DIV><DIV>You can lose a lot of power over samples of the same size, taken in more</DIV><DIV>effective ways, if you use cluster sampling. But when you need cluster</DIV><DIV>sampling, there are few substitutes.</DIV><DIV> </DIV><DIV>If you have the entire target population in a database, then you don't need</DIV><DIV>to perform cluster sampling. 
If you're missing big chunks of the population,</DIV><DIV>or your sampling frame (your list of potential sample points that makes up your</DIV><DIV>data set) is incomplete, then you may need to resort to cluster sampling.</DIV><DIV>You choose cluster sampling for logistical reasons, not statistical reasons.</DIV><DIV>Those logistical reasons may also have to do with things like budgets, the</DIV><DIV>number of available people for fieldwork, restrictions on training or travel, etc.</DIV><DIV> </DIV><DIV>Now Phil's case is an excellent example of cluster sampling in action. The </DIV><DIV>field team needed to sample a target population for which they simply had no</DIV><DIV>sampling frame. They couldn't go down a big list of names and addresses, and</DIV><DIV>pick out n people, because that list doesn't exist. The best they could do</DIV><DIV>was to divide the country into geographic regions small enough that they could</DIV><DIV>perform some manner of sampling within a region. (Whether the sampling</DIV><DIV>within a region was really probabilistic is another hard problem.) So they</DIV><DIV>ended up with a two-stage sample. I imagine that their sample does not have</DIV><DIV>the power that a perfect sample would have, but we already know that the</DIV><DIV>perfect sample can't be done given the available data and resources.<BR></DIV><DIV>So you just don't look to see whether your population is suitable for cluster</DIV><DIV>sampling. You do cluster sampling if you have to, regardless of whether the</DIV><DIV>statisticians are going to be happy. You worry about the distributional behavior</DIV><DIV>within and across clusters after the fact.</DIV><DIV> </DIV><DIV>Does this make sense?</DIV><DIV>Did I give an adequate answer?<BR></DIV><DIV>David<BR>-- <BR>David Cassell, CSC<BR><A href="mailto:Cassell.David@epa.gov" target=blank >Cassell.David@epa.gov</A><BR>Senior computing specialist<BR>mathematical statistician</DIV></FONT>