LISTSERV at the University of Georgia
Date:         Fri, 11 Apr 2003 16:52:40 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: Proc Surveyselect and Minimal Cell Sizes - Was RE: Proc Sort Random
Content-type: text/plain; charset=us-ascii

"Gerstle, John" <yzg9@CDC.GOV> replied [in part]:
> David (not Dale right?),

Right. :-)

> It does seem to be sample crazy on the list the past few days.

And it's probably not due to my speech at SUGI either. :-)

> I read in the documentation on Proc Surveyselect about the CONTROL statement
> etc. Intuitively, one would think that if it's a random sampling procedure,
> one would not need to sort the dataset - test each record for its inclusion
> in one of the levels of the strata and place in the resulting dataset, and
> continue until the sample size requirements are met. And I repeat,
> intuitively. Oh well, not a big deal. But wait, what I mention above, is
> this what the CONTROL and SIZE statements do? I need to read more on this.

But the CONTROL statement is not designed for SRS sampling. It is designed for sequential or systematic sampling (in the manner of Chromy), and for those methods you need a defined order on the frame. PROC SURVEYSELECT even lets you do 'serpentine' ordering on multiple variables, and lets you output the resulting re-ordered frame using the OUTSORT= option.
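A minimal sketch of what CONTROL is for: systematic selection from a frame ordered by control variables. The toy data set TOYFRAME, the output names SYSSAMPLE and SORTEDFRAME, and the 10% rate are hypothetical. SORT=SERP (serpentine ordering, which I understand is the default) and OUTSORT= are PROC SURVEYSELECT statement options, but check them against your SAS/STAT release.

```sas
/* Hypothetical toy frame: 3 levels of QUADRANT x 4 levels of QUAD4,
   10 records per cell (120 records in all). */
data toyFrame;
   do quadrant = 1 to 3;
      do quad4 = 1 to 4;
         do id = 1 to 10;
            output;
         end;
      end;
   end;
run;

/* Systematic selection at a 10% rate. CONTROL orders the frame before
   selection, SORT=SERP requests serpentine ordering of the control
   variables, and OUTSORT= saves the reordered frame for inspection. */
proc surveyselect data=toyFrame out=sysSample
                  method=sys samprate=0.10 seed=123
                  sort=serp outsort=sortedFrame;
   control quadrant quad4;
run;
```

No STRATA statement is used here: the ordering alone spreads the systematic sample across the QUADRANT and QUAD4 levels, which is the implicit stratification that CONTROL buys you.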

> proc sort data=matchSN_BS2; by quadrant quad4; run;
> proc surveyselect data=matchSN_BS2 out=matchedSN_BS
>    n = (200 200 200 50 50 50 10 10 10 10 10 10) seed=123 ;
>    strata quadrant quad4;
>    id _all_;
> run;
>
> This did not work because the dataset that I'm using to test the code does
> NOT have these cell numbers, i.e. there are only 10 cases where quadrant=1,
> even though I need to sample towards 200. So I need to figure out a way to
> run the sampling where it takes into account the minimum value between the
> actual n and the maximum n (200).
>
> I was reading about SAMPSIZE=SAS-dataset but I'm not sure how that dataset
> should be organized. There aren't examples showing this. I could use ODS
> Output on a Proc Freq, get the actual counts for the data, transpose the
> dataset into one row. Would that be what is required here?

Actually, you don't want a single transposed row. Organize that auxiliary data set as if you were about to merge the frame with it. So put in three variables: QUADRANT, QUAD4, and _NSIZE_ (use that exact variable name). Put in one record for each combination of QUADRANT and QUAD4, sorted the same way as the frame, with the desired and feasible sample size for that stratum. Run that PROC FREQ first, so you know to put in sample sizes which are achievable within each stratum, even if that means selecting *every* record in the stratum (that's legal; SAS will compute the right sampling weight for you). When you combine a requirement like SRS sampling with frame sizes that cannot meet your specs, SAS cannot decide for you how to reconcile them. You need to put in workable sample sizes by hand here.
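The recipe above can be sketched roughly as follows, under stated assumptions: TOYFRAME is a hypothetical stand-in for MATCHSN_BS2, and the per-quadrant targets (200, 50, 10) are hypothetical, since the thread doesn't say which of the twelve strata gets which size. The sketch uses the OUT= option of PROC FREQ rather than ODS OUTPUT, caps each target at the stratum count, and hands the result to SAMPSIZE=.

```sas
/* Hypothetical toy frame standing in for MATCHSN_BS2: QUAD4 cells hold
   25, 45, 65, and 85 records, so some strata are smaller than their targets. */
data toyFrame;
   do quadrant = 1 to 3;
      do quad4 = 1 to 4;
         do id = 1 to 5 + 20*quad4;
            output;
         end;
      end;
   end;
run;

/* Stratum sizes: one record per QUADRANT*QUAD4 cell, with its COUNT. */
proc freq data=toyFrame noprint;
   tables quadrant*quad4 / out=stratCounts(keep=quadrant quad4 count);
run;

/* The SAMPSIZE= data set must be sorted the same way as the frame. */
proc sort data=stratCounts;
   by quadrant quad4;
run;

data sampSizes(keep=quadrant quad4 _NSIZE_);
   set stratCounts;
   /* Hypothetical targets by quadrant; substitute the real ones. */
   if quadrant = 1 then target = 200;
   else if quadrant = 2 then target = 50;
   else target = 10;
   /* Never ask for more records than the stratum holds. */
   _NSIZE_ = min(count, target);
run;

/* STRATA requires the frame sorted by the stratum variables. */
proc sort data=toyFrame;
   by quadrant quad4;
run;

proc surveyselect data=toyFrame out=sampleOut
                  method=srs sampsize=sampSizes seed=123;
   strata quadrant quad4;
   id _all_;
run;
```

Strata whose target exceeds their size are taken whole, so their SamplingWeight should come out as 1.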

BTW, you can specify a seed basically as any integer between 1 and 2**31-1 so you don't need to stick with '123'. I know you knew that, but I thought I'd say so for the benefit of the home audience. :-)

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

