LISTSERV at the University of Georgia
Date:         Fri, 3 Jun 2005 22:56:21 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: Proc surveylogistic
Content-type: text/html; charset=ISO-8859-1

<FONT face="Default Sans Serif,Verdana,Arial,Helvetica,sans-serif" size=2><DIV><A href="mailto:ben.powell@CLA.CO.UK" target=blank>ben.powell@CLA.CO.UK</A>&nbsp;replied to my rambling:</DIV><DIV>&gt;&gt;The larger the variation within clusters, the more efficient your cluster <BR>&gt;&gt;design becomes. The smaller the variation within cluster, the more likely <BR>&gt;&gt;you are to have made an error by using cluster sampling in the first <BR>&gt;&gt;place. <BR>&gt; <BR>&gt;This may be a bit of a clustering layman's question but the logistical <BR>&gt;question in particular interests me: is there any way you could tell in <BR>&gt;advance whether your population was suited to cluster based sampling? If the <BR>&gt;variation in a given metric is equal in a dense cluster to that within a <BR>&gt;widely distributed cluster then is it reasonable to assume clustering would <BR>&gt;be an effective selection methodology? I have historical data I could <BR>&gt;analysis to test for this, <BR></DIV><DIV>Let me put a different spin on this.</DIV><DIV>&nbsp;</DIV><DIV>In general, you only want to perform cluster sampling when you have to.</DIV><DIV>Compared with a sample of the same size taken in a more effective way,</DIV><DIV>a cluster sample can lose a lot of power.&nbsp; But when you need cluster</DIV><DIV>sampling, there are few&nbsp;substitutes.</DIV><DIV>&nbsp;</DIV><DIV>If you have the entire target population in a database, then you don't need</DIV><DIV>to perform cluster sampling.&nbsp; If you're missing big chunks of the population,</DIV><DIV>or your sampling frame (your list of potential sample points that makes up your</DIV><DIV>data set) is incomplete, then you may need to resort to cluster sampling.</DIV><DIV>You choose cluster sampling for logistical reasons, not statistical reasons.</DIV><DIV>Those logistical reasons may also have to do with things like budgets, the</DIV><DIV>number of available people for fieldwork, restrictions on training or travel, 
etc.</DIV><DIV>&nbsp;</DIV><DIV>Now Phil's case is an excellent example of cluster sampling in action.&nbsp; The </DIV><DIV>field team needed to sample a target population for which they simply had no</DIV><DIV>sampling frame.&nbsp; They couldn't go down a big list of names and addresses and</DIV><DIV>pick out n people, because that list doesn't exist.&nbsp; The best they could do</DIV><DIV>was to divide the country into geographic regions small enough that they could</DIV><DIV>perform some manner of sampling within a region.&nbsp; (Whether the sampling</DIV><DIV>within a region was really probabilistic is another hard problem.)&nbsp; So they</DIV><DIV>ended up with a two-stage sample.&nbsp; I imagine that their sample does not have</DIV><DIV>the power that a perfect sample would have, but we already know that the</DIV><DIV>perfect sample can't be done given the available data and resources.<BR></DIV><DIV>So you&nbsp;don't check in advance whether your population is suitable for cluster</DIV><DIV>sampling.&nbsp; You do cluster sampling if you have to, regardless of whether the</DIV><DIV>statisticians are going to be happy.&nbsp; You worry about the distributional behavior</DIV><DIV>within and across clusters after the fact.</DIV><DIV>&nbsp;</DIV><DIV>Does this make sense?</DIV><DIV>Did I give an adequate answer?<BR></DIV><DIV>David<BR>-- <BR>David Cassell, CSC<BR><A href="mailto:Cassell.David@epa.gov" target=blank >Cassell.David@epa.gov</A><BR>Senior computing specialist<BR>mathematical statistician</DIV></FONT>
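
The point about within-cluster variation can be checked on historical data after the fact. What follows is only a minimal sketch, assuming equal-sized clusters and a single numeric metric; the function names are illustrative, and the design effect uses Kish's standard approximation deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intraclass correlation. A deff well above 1 suggests the clusters are internally homogeneous and the cluster sample is costing you power; a deff near or below 1 suggests plenty of variation within clusters.

```python
from statistics import mean


def intraclass_correlation(clusters):
    """One-way ANOVA estimate of the intraclass correlation (rho).

    Assumes every cluster has the same size m >= 2 (an assumption of
    this sketch, not something the post requires).
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")

    grand = mean([x for c in clusters for x in c])
    # Mean square between clusters and mean square within clusters.
    msb = m * sum((mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (k * (m - 1))

    denom = msb + (m - 1) * msw
    if denom == 0:
        raise ValueError("metric has no variation at all")
    return (msb - msw) / denom


def design_effect(rho, m):
    """Kish's approximate design effect for clusters of size m."""
    return 1 + (m - 1) * rho


# Clusters that are internally identical: the worst case for clustering.
homogeneous = [[1, 1, 1], [5, 5, 5], [9, 9, 9]]
# Clusters that each span the full range: the best case for clustering.
heterogeneous = [[1, 5, 9], [1, 5, 9], [1, 5, 9]]

for name, data in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    rho = intraclass_correlation(data)
    print(name, "rho =", rho, "deff =", design_effect(rho, len(data[0])))
```

This does not tell you whether to use cluster sampling; as the post says, that choice is logistical. It only gives a rough idea of how much precision the clustering costs for a given metric.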

