LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2000, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 3 Mar 2000 20:46:28 GMT
Reply-To:     sashole@mediaone.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:      Re: Ordered Random Extract
Comments: To: johnmegargee@NETSCAPE.NET
Comments: cc: sashole@mediaone.net
Content-Type: text/plain; format=flowed

John,

Try this:

19 %let n = 1e10; 20 data _null_; 21 array r (500000) _temporary_; 22 s = &n / dim(r); 23 do i=1 to dim(r); 24 r(i) = (i-1)*s + ceil(ranuni(1)*s); 25 end; 26 run; NOTE: The DATA statement used 0.02 CPU seconds and 7062K.

The idea here is to split the entire [1:1e10] range into 5e5 equal pieces and make the next selection from the next consecutive interval containing &n/dim(r) numbers. By the nature of the algorithm, the selected items will be unique and ascending. Intuitively, it should provide for a strictly equal chance for any [1:1e10] number to be selected if &n is a multiple of dim(r). If not, there should be some skewness pertaining to the choice from the last interval, but I would not expect it to be sizable. Someone please correct me if my hunch is out of whack (Dr. John? David?). To get a real feel of what this process spits out, let us see how it will select 50 items out of 1000000 (same proportion):

85 %let n = 1e6; 86 data _null_; 87 array r (1:50) _temporary_; 88 s = &n / dim(r); 89 do i=1 to dim(r); 90 r(i) = (i-1)*s + ceil(ranuni(1)*s); 91 file log ls=40; 92 put r(i) z6. @ +1; 93 end; 94 run; 003700 039402 047997 065188 098433 119386 130860 150634 160996 181332 216387 230478 257068 261344 299141 305944 325453 353799 379536 384531 413765 428256 451172 465745 489516 516900 532691 551808 571652 587541 614568 630133 658625 678583 691794 705945 727821 749449 773591 783362 803331 837423 845976 878693 898010 911376 920991 942712 970227 988665 NOTE: The DATA statement used 0.01 CPU seconds and 3248K.

BTW, the same principle can be apparently applied to the selection from a SAS dataset using POINT= to eliminate the necessity of reading the entire darn thing and subsequent sorting. Say,

95 %let n = 1e8; 96 %let k = 5e3; 97 data big; do v=1 to &n; output; end; run; NOTE: The data set WORK.BIG has 100000000 observations and 1 variables. NOTE: The DATA statement used 51.23 CPU seconds and 3600K. 98 data sample; 99 s = n / &k; 100 do i=1 to &k; 101 r = (i-1)*s + ceil(ranuni(1)*s); 102 set big point=r nobs=n; 103 output; 104 end; 105 stop; 106 run; NOTE: The data set WORK.SAMPLE has 5000 observations and 3 variables. NOTE: The DATA statement used 0.39 CPU seconds and 3767K.

HTH.

Kind regards, ======================= Paul M. Dorfman Jacksonville, FL =======================

>From: John Megargee <johnmegargee@NETSCAPE.NET>

>Somewhere in the middle of a long datastep I've inherited I need to >populate a >huge temp array like > >array r(500000) _temporary_; > >with unique positive integers chosen randomly from 1 to 1e10 without >replacement. It wouldn't be a problem if I didn't need the populated array >to >be in ascedning order. Putting it another way, I want the elements of the >array to retain their relative positions in the set [1:1e10] they've been >selected from. Any ideas about doing it efficiently (hopefully without >leaving >the step)? I'll appreciate them all. > >TIA, John > >____________________________________________________________________ >Get your own FREE, personal Netscape WebMail account today at >http://webmail.netscape.com.

______________________________________________________ Get Your Private, Free Email at http://www.hotmail.com


Back to: Top of message | Previous page | Main SAS-L page