Date: Wed, 9 Apr 2003 14:25:05 -0400
Reply-To: sashole@bellsouth.net
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject: Re: Sampling Question
> -----Original Message-----
> From: diskin.dennis@kendle.com [mailto:diskin.dennis@kendle.com]
>
> Paul,
>
> I'm afraid your algorithm is flawed. It allows duplicates to
> be selected.
Dennis,
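The algorithm is perfect; the implementation is plagued with
absent-mindedness: I forgot to wrap _n_ in a ptr ( ) reference in
ptr (_p_) = _n_ ;
which should be
ptr (_p_) = ptr (_n_) ;
The slot just drawn must receive the record number still held in the
last live slot, ptr (_n_), not the slot index _n_ itself. Otherwise a
record number that has already been selected can be written back into
the live part of the array and drawn again.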
But wait: didn't I just write the same thing in response to Dale? I guess
I did. There goes my memory, again... :-(
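To see how the original line lets duplicates through, here is a small
Python sketch (Python only for illustration; the function name replay is
hypothetical). It replays the pointer-array loop with a fixed sequence of
slot picks standing in for ceil (ranuni(1) * _n_), once with each update
rule:

```python
def replay(n_pop, picks, fixed):
    """Replay the pointer-array loop with a given sequence of slot picks.

    picks[i] stands in for ceil(ranuni(1) * _n_) on the i-th pass.
    fixed=True applies ptr(_p_) = ptr(_n_); fixed=False applies ptr(_p_) = _n_.
    Returns the record numbers that SET ... POINT= would read.
    """
    ptr = list(range(n_pop + 1))  # ptr[1..n_pop] = 1..n_pop; ptr[0] unused (1-based like SAS)
    chosen = []
    n = n_pop
    for p in picks:
        if not 1 <= p <= n:
            raise ValueError("each pick must lie in 1.._n_")
        chosen.append(ptr[p])            # point = ptr(_p_)
        ptr[p] = ptr[n] if fixed else n  # retire slot _p_
        n -= 1
    return chosen


print(replay(5, [4, 1, 1], fixed=False))  # [4, 1, 4]: record 4 is read twice
print(replay(5, [4, 1, 1], fixed=True))   # [4, 1, 5]
```

On the second pass the buggy rule writes 4 (the slot index, already used
as a record number) into slot 1, instead of 5, the record number that
slot 4 actually holds.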
Kind regards,
===================
Paul M. Dorfman
Jacksonville, FL
===================
> From: Paul Dorfman <paul_dorfman@HOTMAIL.COM> on
> 04/09/2003 12:36 PM
>
>
> > From: Action Man <wollo_desse@HOTMAIL.COM>
> >
> > I have 7,000 records in my SAS file. Out of these records I want to
> > pick 500 of them randomly. How do I do that using SAS?
>
> Wollo,
>
> You will no doubt get (or have already gotten) plenty of
> advice on how to do it using the "standard" K/N method, where
> K is the sample size and N is the population size: each
> record read is kept with probability K/N, after which N (and,
> if the record was kept, K) is decreased by 1. It requires
> reading all N records from the population file. It is quite
> sufficient and fast in your case, where N is only 7000 and
> K=500 is not a tiny fraction of N. Below is an alternative
> approach that obtains the sample by reading only K records,
> which may be preferable for large N (easily up to, say, 5E+6
> with modern memory sizes, since it keeps an array of N
> pointers in memory) and/or K/N << 1:
>
> %let n_pop = 7000 ;
> %let n_smpl = 500 ;
>
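> data pop ;
>   /* a = record id, v03-v11 = random integers; b is reused as */
>   /* the array index, so it ends up as 12 in every record     */
>   array vars (*) a b v03-v11 ;
>   do a = 1 to &n_pop ;
>     do b = 3 to 11 ;
>       vars (b) = ceil (ranuni(1) * 1e11) ;
>     end ;
>     output ;
>   end ;
> run ;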
>
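> data sample (drop = _:) ;
>   /* ptr (1) to ptr (_n_) hold the record numbers not yet drawn */
>   array ptr (&n_pop) _temporary_ ;
>
>   do _p_ = 1 to hbound (ptr) ;
>     ptr (_p_) = _p_ ;
>   end ;
>
>   /* _n_ counts the live slots down from N to N - K + 1 */
>   do _n_ = &n_pop to &n_pop - &n_smpl + 1 by -1 ;
>     _p_ = ceil (ranuni(1) * _n_) ;  /* random live slot, 1 to _n_     */
>     point = ptr (_p_) ;
>     set pop point = point ;         /* read only the chosen record     */
>     output ;
>     ptr (_p_) = ptr (_n_) ;         /* move last live record into slot */
>   end ;
>
>   stop ;  /* required with POINT=, since end of file is never reached */
> run ;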
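For comparison, here is a minimal Python sketch of both approaches
(an illustrative transliteration, not SAS; the function names kn_sample
and pointer_sample and the test population are hypothetical). A Python
list stands in for the SAS file, and list indexing stands in for
SET ... POINT=:

```python
import random


def kn_sample(pop, k, seed=1):
    """Sequential K/N selection: reads every record and keeps each one with
    probability (records still needed) / (records still unread)."""
    rng = random.Random(seed)
    n = len(pop)
    sample = []
    for rec in pop:
        if k == 0:
            break
        if rng.random() * n < k:  # same as rng.random() < k / n
            sample.append(rec)
            k -= 1
        n -= 1
    return sample


def pointer_sample(pop, k, seed=1):
    """Pointer-array selection: touches only the k chosen records.

    ptr[1..n] holds the record numbers not yet drawn (1-based, as in the
    SAS array; ptr[0] is unused). Each draw retires slot p by copying the
    last live entry into it, i.e. ptr(_p_) = ptr(_n_).
    """
    n_pop = len(pop)
    if not 0 <= k <= n_pop:
        raise ValueError("k must be between 0 and len(pop)")
    rng = random.Random(seed)
    ptr = list(range(n_pop + 1))
    sample = []
    for n in range(n_pop, n_pop - k, -1):  # _n_ = N to N - K + 1 by -1
        p = rng.randint(1, n)              # stands in for ceil(ranuni(1) * _n_)
        sample.append(pop[ptr[p] - 1])    # random access, like SET ... POINT=
        ptr[p] = ptr[n]
    return sample


pop = [f"rec{i:04d}" for i in range(1, 7001)]
a = kn_sample(pop, 500)
b = pointer_sample(pop, 500)
print(len(a), len(set(a)), len(b), len(set(b)))  # 500 500 500 500
```

Both return exactly K distinct records. The K/N version returns them in
file order, while the pointer version returns them in draw order, so sort
afterwards if order matters. Copying the last live entry into the retired
slot keeps the undrawn records packed in slots 1 to _n_, so each draw costs
constant time. This is essentially a partial Fisher-Yates shuffle that does
not bother to keep the drawn entries.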