LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2000, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 4 Mar 2000 02:23:38 +0000
Reply-To:     John Whittington <John.W@MEDISCIENCE.CO.UK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         John Whittington <John.W@MEDISCIENCE.CO.UK>
Subject:      Re: Ordered Random Extract
Comments: To: sashole@mediaone.net
Comments: cc: dnordlund@AOL.COM
In-Reply-To:  <200003032047.UAA14949@vicar.netnames.net>
Content-Type: text/plain; charset="us-ascii"

At 20:46 03/03/00 GMT, Paul Dorfman wrote:

>Try this: > >19 %let n = 1e10; >20 data _null_; >21 array r (500000) _temporary_; >22 s = &n / dim(r); >23 do i=1 to dim(r); >24 r(i) = (i-1)*s + ceil(ranuni(1)*s); >25 end; >26 run; >NOTE: The DATA statement used 0.02 CPU seconds and 7062K. > >The idea here is to split the entire [1:1e10] range into 5e5 equal pieces >and make the next selection from the next consecutive interval containing >&n/dim(r) numbers. By the nature of the algorithm, the selected items will >be unique and ascending. Intuitively, it should provide for a strictly equal >chance for any [1:1e10] number to be selected if &n is a multiple of dim(r). >If not, there should be some skewness pertaining to the choice from the last >interval, but I would not expect it to be sizable. Someone please correct me >if my hunch is out of whack (Dr. John? David?).

Paul, Dan seems to have beaten me to it, and said most of what I would have said. As he points out, your code will produce a stratified random sample. This might well be OK for the John's purpose, but it will, 'on average', be a much more 'evenly spread' sample than would (again 'on average') be the case with a simple random sample. One of the illusions that many Lottery players are under is that a random sample ought to be fairly 'uniform' in its spread!

Dan goes on to write:

>My initial thought was to use a without-replacement sampling scheme. Something >like: > >%let total=1e10; >%let wanted=500000; > >n=0; >needed=&wanted; >remain=total; >do i=1 to &total until(needed eq 0); > if ranuni(-1) le needed/remain > then do; > n+1; > r[n]=i; > needed+(-1); > end; > remain+(-1); >end; > >This will do the job of a simple random sample, but it not very efficient. It >would have taken over 10 hours on my 133Mhz pc. Maybe someone else can produce >a more efficient simple random sample. On the other hand if time isn't a >problem, this works.

Apart from a missing amersand (remain=&total) I agree - but, just like Dan, I am using a 133 Mhz PC, and so would find this more than a little tedious!

As always in these situations, it's the 'without replacement' which is the pain. For 'with replacement' one could simply use something like:

data _null_; array r (500000) _temporary_; do i=1 to dim(r); r(i) = ceil(ranuni(whatever)*1e10) ; end; <Paul's favourite array sorting routine here!> run;

Ironically, since we are sampling 500,000 from 1e10, the chance of any value being repeated is pretty tiny - so, in this particular case, the 'with replacement' solution stands a very high chance of effectively being a 'without replacement' one. However, that obviously is not a certainty.

If "Paul's favourite array sorting routine" can be easily adapted to remove duplicates, then one might consider something like (untested!):

data _null_ ; array s (600000) _temporary_ ; array r (500000) _temporary_ ; do i=1 to dim(s); s(i) = ceil(ranuni(whatever)*1e10) ; end; <Paul's favourite array sorting + duplicate removing routine here!> do i = 1 to dim(r) ; do until r(i) ne . ; r(i) = s(ranuni(whatever)*dim(s)) ; end ; run;

Of course, if one had adequate resources to set up a 1e10 temporary array, the one might consider something like (again untested):

data _null_; array r (500000) _temporary_; array s (1e10) _temporary_ ; do i=1 to dim(r); OK = 0 ; do until OK = 1 ; r(i) = ceil(ranuni(whatever)*1e10) ; if s(r(i)) = . then do ; OK = 1 ; s(r(i)) = 1 ; end ; end ; end; <Paul's favourite array sorting routine here!> run;

I think that exhausts my immediate thoughts!

Kind Regards,

John

---------------------------------------------------------------- Dr John Whittington, Voice: +44 (0) 1296 730225 Mediscience Services Fax: +44 (0) 1296 738893 Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk Buckingham MK18 4EL, UK mediscience@compuserve.com ----------------------------------------------------------------


Back to: Top of message | Previous page | Main SAS-L page