LISTSERV at the University of Georgia
Date:         Mon, 29 Sep 2008 10:52:17 -0400
Reply-To:     Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:      Re: Performance issues in Permutation Test
In-Reply-To:  <2fc7f3340809290747t4365070doa2f505923208f75e@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Correction to my note below: the number used for array dimensioning must be 19, not 18.

On Mon, Sep 29, 2008 at 10:47 AM, Muthia Kachirayan <muthia.kachirayan@gmail.com> wrote:

> On Mon, Sep 29, 2008 at 5:25 AM, shiva <shiva.saidala@gmail.com> wrote:
>
>> Hi All,
>>
>> I am pulling 500,000 samples without replacement (sample size of
>> 50000). When I run this code I am facing some performance
>> issues. As I don't have STAT, I have written this code to do the
>> following.
>> It is also creating m*n obs in the output dataset, which is very huge
>> to compute the mean from afterwards.
>>
>> data sample(drop=i);
>>   array b{500000} _temporary_;
>>   do sampnum = 1 to dim(b);
>>     do i = 1 to 50000;
>>       x = round(ranuni(1234) * nobs);
>>       set Merge
>>           nobs = nobs
>>           point = x;
>>       output;
>>     end;
>>   end;
>>   stop;
>> run;
>>
>> Thanks in advance!
>> shiva
>>
>
> Shiva,
>
> There are a couple of questions.
>
> 1. Why do you need the array B[ ] here?
>
> 2. Where do you ensure that samples are taken without replacement (WOR)?
>
> 3. In M * N obs, what are M and N?
>
> I guess that you have a population of size 500,000 and you want 500,000
> replicated samples of size 50,000 each, WOR. If this guess is wrong,
> tell SAS-L for help.
>
> Richard's suggestion to use the K / N approach can be used. I give an
> alternate approach, slightly different from DataNull's, wherein I use array
> B[ ] to mark each observation from the population as it is selected for the
> sample, so that the same unit will not be selected again. This ensures WOR.
> I use a GOTO statement to keep it simple to discard X and choose another X.
>
> In the SET statement I bring only AGE into the PDV to save memory space.
> The sum of AGE can be found immediately (not in another data step), and its
> mean at the end of the DO loop. The array B[ ] has to be initialized to
> missing before going on to the next sample.
>
> See the code below.
>
> SASHELP.CLASS has 19 observations. Hence 19 samples of size 5, WOR. The
> sample means are generated in the same data step. These 2 numbers have been
> statically used in the code below: 18 for array dimensioning and 5 as the
> denominator for the sample mean.
>
> It can be verified that SET with the POINT= option is not efficient for
> large data sets.
>
> data sample(keep = sampnum meanage);
>   array b[19] _temporary_;
>   do sampnum = 1 to dim(b);
>     do i = 1 to 5;
>       There:
>       x = ceil(ranuni(1234) * nobs);
>       if b[x] then goto There;
>       b[x] = 1;
>       set sashelp.class(keep = age) nobs = nobs point = x;
>       *output;
>       agetot + age;
>     end;
>     meanage = agetot / 5;
>     agetot = 0;
>     output;
>     do i = 1 to dim(b);
>       b[i] = .;
>     end;
>   end;
>   stop;
> run;
>
> Regards,
>
> Muthia Kachirayan
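
For comparison, the K / N approach mentioned in the quoted message could look roughly like the sketch below. It is only an illustration under my own assumptions, not Richard's actual code: it draws a single WOR sample of 5 from SASHELP.CLASS by reading the data set sequentially (no POINT=), and the variable names K and N are chosen here for the example.

```sas
/* Sketch: one sample of size 5 without replacement via K / N selection. */
/* K = units still needed, N = observations not yet examined.           */
data sample(keep = age);
  k = 5;
  do n = nobs to 1 by -1 while (k > 0);
    /* sequential read; NOBS= is available at compile time */
    set sashelp.class(keep = age) nobs = nobs;
    /* select the current observation with probability K / N */
    if ranuni(1234) < k / n then do;
      output;
      k = k - 1;
    end;
  end;
  stop;   /* prevent the data step from iterating again */
run;
```

Because the selection probability reaches 1 once the remaining observations equal the units still needed, exactly 5 distinct observations are written, and no marker array or retry loop is required.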

