```
Date: Mon, 29 Sep 2008 10:52:17 -0400
Reply-To: Muthia Kachirayan
Sender: "SAS(r) Discussion"
From: Muthia Kachirayan
Subject: Re: Performance issues in Permutation Test
In-Reply-To: <2fc7f3340809290747t4365070doa2f505923208f75e@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

It must be 19 not 18.

On Mon, Sep 29, 2008 at 10:47 AM, Muthia Kachirayan <muthia.kachirayan@gmail.com> wrote:

> On Mon, Sep 29, 2008 at 5:25 AM, shiva wrote:
>
>> Hi All,
>>
>> I am pulling 500,000 samples without replacement (sample size of
>> 50,000). When I run this code I face some performance
>> issues. As I don't have STAT, I have written this code to do the
>> following.
>> It also creates m*n obs in the output dataset, which is very large
>> to compute the mean from afterwards.
>>
>> data sample(drop=i);
>>   array b{500000} _temporary_;
>>   do sampnum = 1 to dim(b);
>>     do i = 1 to 50000;
>>       x = round(ranuni(1234) * nobs);
>>       set Merge
>>         nobs = nobs
>>         point = x;
>>       output;
>>     end;
>>   end;
>>   stop;
>> run;
>>
>> Thanks in advance!
>> shiva
>>
>
> Shiva,
>
> There are a couple of questions.
>
> 1. Why do you need the array B[ ] here?
>
> 2. Where do you ensure that samples are taken without replacement (WOR)?
>
> 3. In M * N obs, what are M and N?
>
> I guess that you have a population of size 500,000 and you want 500,000
> replicated samples of size 50,000 each with WOR. If this guess is wrong
> tell SAS-L for help.
>
> Richard's suggestion to use the K / N approach can be used. I give an alternate
> approach slightly different from DataNull's wherein I use array B[ ] to mark
> the observation from the Population being selected for the sample so that
> that unit will not be selected subsequently to ensure WOR. I use a GOTO
> statement to make it simple to get rid of X and choose another X.
>
> In the SET statement I bring only AGE into the PDV to save memory space.
> Immediately the sum of AGE can be found (not in another Data Step) and its
> Mean at the end of the DO-loop. The array B[ ] has to be initialized to
> missing before going for the next sample.
>
> See the code below.
>
> SASHELP.CLASS has 19 observations. Hence 19 samples of size 5 WOR. The
> sample means are generated in the same data step. These 2 numbers have been
> statically used in the code below, 18 for array dimensioning and 5 as
> denominator for sample mean.
>
> That the use of SET with the POINT= option is not efficient for a large
> data set can be verified.
>
> data sample(keep = sampnum meanage);
>   array b[19] _temporary_;
>   do sampnum = 1 to dim(b);
>     do i = 1 to 5;
>       There:
>       x = ceil(ranuni(1234) * nobs);
>       if b[x] then goto There;
>       b[x] = 1;
>       set sashelp.class(keep = age) nobs = nobs point = x;
>       *output;
>       agetot + age;
>     end;
>     meanage = agetot / 5;
>     agetot = 0;
>     output;
>     do i = 1 to dim(b);
>       b[i] = .;
>     end;
>   end;
>   stop;
> run;
>
> Regards,
>
> Muthia Kachirayan
>
```
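
The message mentions Richard's K / N approach without spelling it out. As a rough illustration only, here is a minimal sketch of that idea as it is commonly understood (selection sampling: keep each unit with probability "still needed" / "still unexamined"). It is not taken from the thread. The names `sample_means`, `ages`, `k`, and `n` are hypothetical, and the sketch assumes the population is small enough to hold in a temporary array, which is read once sequentially so that no POINT= access is needed.

```sas
/* Sketch only: 19 replicate samples of size 5 without replacement      */
/* from SASHELP.CLASS, using selection sampling (K / N).                 */
/* Assumption: the 19 ages fit in a _temporary_ array (hard-coded size). */
data sample_means(keep = sampnum meanage);
  array ages[19] _temporary_;

  /* Read the population once, sequentially, into memory. */
  do until (eof);
    set sashelp.class(keep = age) end = eof;
    n + 1;                       /* running count of observations read */
    ages[n] = age;
  end;

  do sampnum = 1 to 19;
    k = 5;                       /* units still needed for this sample */
    agetot = 0;
    do i = 1 to n while (k > 0);
      /* Keep unit i with probability k / (n - i + 1). When the units  */
      /* left equal the units needed, the test is always true, so each */
      /* sample ends with exactly 5 distinct units.                    */
      if ranuni(1234) * (n - i + 1) < k then do;
        agetot = agetot + ages[i];
        k = k - 1;
      end;
    end;
    meanage = agetot / 5;
    output;
  end;
  stop;
run;
```

Because each unit is examined at most once per sample, no marker array or retry via GOTO is needed, and the cost per sample is one pass over the in-memory array rather than repeated random reads from disk.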
