Date: Fri, 11 Apr 2003 16:52:40 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Proc Surveyselect and Minimal Cell Sizes - Was RE: Proc Sort
Content-type: text/plain; charset=us-ascii
"Gerstle, John" <yzg9@CDC.GOV> replied [in part]:
> David (not Dale right?),
> It does seem to be sample crazy on the list the past few days.
And it's probably not due to my speech at SUGI either. :-)
> I read in the documentation on Proc Surveyselect about the CONTROL
> etc. Intuitively, one would think that if it's a random sampling
> one would not need to sort the dataset - test each record for it's
> in one of the levels of the strata and place in the resulting dataset,
> continue until the sample size requirements are met. And I repeat,
> intuitively. Oh well, not a big deal. But wait, what I mention
> this what the CONTROL and SIZE statements do? I need to read more on
But the CONTROL statement is not designed for doing SRS sampling.
It is designed so that you can do sequential or systematic sampling
(in the manner of Chromy). For these, you need a defined order.
PROC SURVEYSELECT even lets you do 'serpentine' ordering on multiple
variables, and lets you output the resulting re-ordered frame using
the OUTSORT= option.
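For the home audience, here is a minimal sketch of what that looks like.
The data set names (FRAME, SYS_SAMPLE, FRAME_SORTED), the control
variables (REGION, COUNTY), and the sample size of 100 are all made up
for illustration; substitute your own.

```sas
/* Hypothetical names throughout: FRAME, REGION, COUNTY, SYS_SAMPLE. */
/* Systematic sampling needs a defined order, which CONTROL supplies. */
proc surveyselect data=frame
                  out=sys_sample
                  outsort=frame_sorted  /* the re-ordered frame */
                  method=sys            /* systematic random sampling */
                  sort=serp             /* serpentine ordering of the CONTROL variables */
                  sampsize=100
                  seed=123;
   control region county;
run;
```

With METHOD=SEQ in place of METHOD=SYS you get the sequential selection
instead. No PROC SORT is needed beforehand here, since the procedure
orders the frame by the CONTROL variables itself.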
> proc sort data=matchSN_BS2; by quadrant quad4; run;
> proc surveyselect data=matchSN_BS2 out=matchedSN_BS
> n = (200 200 200 50 50 50 10 10 10 10 10 10) seed=123 ;
> strata quadrant quad4;
> id _all_;
> This did not work because the dataset that I'm using to test the code
> NOT have these cell numbers, i.e. there are only 10 cases where
> even though I need to sample towards 200. So I need to figure out a
> run the sampling where it takes into account the minimum value between
> actual n and the maximum n (200).
> I was reading about SAMPSIZE=SAS-dataset but I'm not sure how that
> should be organized. There aren't examples showing this. I could use
> Output on a Proc Freq, get the actual counts for the data, transpose
> dataset into one row. Would that be what is required here?
Actually, you want to organize that auxiliary data set as if you were
about to merge the frame with the auxiliary. So put in three variables:
QUADRANT, QUAD4, and _NSIZE_ (use that exact variable name). Put in a
record for each value of quadrant and quad4, with the desired/feasible
sample size for that stratum. Run that PROC FREQ first, so you know to
put in values of sample size which are achievable within each stratum,
even if you are selecting *every* record in the stratum (that's legal;
SAS will compute the right sample weight for you). When you have a requirement
like SRS sampling and limitations like frame sizes that cannot meet your
specs, SAS will not know what to do. You need to put in workable sample
sizes by hand here.
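Something along these lines should do it. This is only a sketch: the
names STRATA_COUNTS, STRATA_SIZES, and TARGET are mine, and the flat
target of 200 is a stand-in. If your targets differ by stratum (200, 50,
10, ...), put the right target on each record before taking the minimum.

```sas
/* The frame must be sorted by the STRATA variables. */
proc sort data=matchSN_BS2;
   by quadrant quad4;
run;

/* Count the records actually available in each stratum. */
proc freq data=matchSN_BS2 noprint;
   tables quadrant*quad4 / out=strata_counts(keep=quadrant quad4 count);
run;

/* One record per stratum, holding an achievable sample size.  */
/* TARGET is a hypothetical placeholder for the desired size.  */
data strata_sizes;
   set strata_counts;
   target  = 200;
   _nsize_ = min(count, target);  /* never ask for more than the stratum has */
   keep quadrant quad4 _nsize_;
run;

/* Hand the auxiliary data set to SAMPSIZE= instead of a list of values. */
proc surveyselect data=matchSN_BS2 out=matchedSN_BS
                  method=srs sampsize=strata_sizes seed=123;
   strata quadrant quad4;
run;
```

The strata need to appear in STRATA_SIZES in the same order as in the
sorted frame, which the PROC FREQ output gives you here.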
BTW, you can specify a seed basically as any integer between 1 and
so you don't need to stick with '123'. I know you knew that, but I
thought I'd say so for the benefit of the home audience. :-)
David Cassell, CSC
Senior computing specialist