Date: Sat, 4 Mar 2000 02:23:38 +0000
Reply-To: John Whittington <John.W@MEDISCIENCE.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: John Whittington <John.W@MEDISCIENCE.CO.UK>
Subject: Re: Ordered Random Extract
In-Reply-To: <200003032047.UAA14949@vicar.netnames.net>
Content-Type: text/plain; charset="us-ascii"
At 20:46 03/03/00 GMT, Paul Dorfman wrote:
>Try this:
>
>19 %let n = 1e10;
>20 data _null_;
>21 array r (500000) _temporary_;
>22 s = &n / dim(r);
>23 do i=1 to dim(r);
>24 r(i) = (i-1)*s + ceil(ranuni(1)*s);
>25 end;
>26 run;
>NOTE: The DATA statement used 0.02 CPU seconds and 7062K.
>
>The idea here is to split the entire [1:1e10] range into 5e5 equal pieces
>and make the next selection from the next consecutive interval containing
>&n/dim(r) numbers. By the nature of the algorithm, the selected items will
>be unique and ascending. Intuitively, it should provide for a strictly equal
>chance for any [1:1e10] number to be selected if &n is a multiple of dim(r).
>If not, there should be some skewness pertaining to the choice from the last
>interval, but I would not expect it to be sizable. Someone please correct me
>if my hunch is out of whack (Dr. John? David?).
Paul, Dan seems to have beaten me to it, and said most of what I would have
said. As he points out, your code will produce a stratified random sample.
This might well be OK for the John's purpose, but it will, 'on average',
be a much more 'evenly spread' sample than would (again 'on average') be
the case with a simple random sample. One of the illusions that many
Lottery players are under is that a random sample ought to be fairly
'uniform' in its spread!
Dan goes on to write:
>My initial thought was to use a without-replacement sampling scheme.
Something
>like:
>
>%let total=1e10;
>%let wanted=500000;
>
>n=0;
>needed=&wanted;
>remain=total;
>do i=1 to &total until(needed eq 0);
> if ranuni(-1) le needed/remain
> then do;
> n+1;
> r[n]=i;
> needed+(-1);
> end;
> remain+(-1);
>end;
>
>This will do the job of a simple random sample, but it not very efficient.
It
>would have taken over 10 hours on my 133Mhz pc. Maybe someone else can
produce
>a more efficient simple random sample. On the other hand if time isn't a
>problem, this works.
Apart from a missing amersand (remain=&total) I agree - but, just like Dan,
I am using a 133 Mhz PC, and so would find this more than a little tedious!
As always in these situations, it's the 'without replacement' which is the
pain. For 'with replacement' one could simply use something like:
data _null_;
array r (500000) _temporary_;
do i=1 to dim(r);
r(i) = ceil(ranuni(whatever)*1e10) ;
end;
<Paul's favourite array sorting routine here!>
run;
Ironically, since we are sampling 500,000 from 1e10, the chance of any
value being repeated is pretty tiny - so, in this particular case, the
'with replacement' solution stands a very high chance of effectively being
a 'without replacement' one. However, that obviously is not a certainty.
If "Paul's favourite array sorting routine" can be easily adapted to remove
duplicates, then one might consider something like (untested!):
data _null_ ;
array s (600000) _temporary_ ;
array r (500000) _temporary_ ;
do i=1 to dim(s);
s(i) = ceil(ranuni(whatever)*1e10) ;
end;
<Paul's favourite array sorting + duplicate removing routine here!>
do i = 1 to dim(r) ;
do until r(i) ne . ;
r(i) = s(ranuni(whatever)*dim(s)) ;
end ;
run;
Of course, if one had adequate resources to set up a 1e10 temporary array,
the one might consider something like (again untested):
data _null_;
array r (500000) _temporary_;
array s (1e10) _temporary_ ;
do i=1 to dim(r);
OK = 0 ;
do until OK = 1 ;
r(i) = ceil(ranuni(whatever)*1e10) ;
if s(r(i)) = . then do ;
OK = 1 ; s(r(i)) = 1 ;
end ;
end ;
end;
<Paul's favourite array sorting routine here!>
run;
I think that exhausts my immediate thoughts!
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------