Date: Mon, 29 Sep 2008 10:52:17 -0400
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: Performance issues in Permutation Test
In-Reply-To: <2fc7f3340809290747t4365070doa2f505923208f75e@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
It must be 19 not 18.
On Mon, Sep 29, 2008 at 10:47 AM, Muthia Kachirayan <
muthia.kachirayan@gmail.com> wrote:
>
>
> On Mon, Sep 29, 2008 at 5:25 AM, shiva <shiva.saidala@gmail.com> wrote:
>
>> Hi All,
>>
>> I am pulling 500,000 samples without replacement (sample size of
>> 50,000). When I run this code I face performance issues. As I don't
>> have STAT, I have written this code to do the following.
>> It also creates m*n obs in the output dataset, which is too large
>> to compute the mean from afterwards.
>>
>> data sample(drop=i);
>>    array b{500000} _temporary_;
>>    do sampnum = 1 to dim(b);
>>       do i = 1 to 50000;
>>          x = round(ranuni(1234) * nobs);
>>          set Merge nobs = nobs point = x;
>>          output;
>>       end;
>>    end;
>>    stop;
>> run;
>>
>> Thanks in advance!
>> shiva
>>
>
> Shiva,
>
> There are a couple of questions.
>
> 1. Why do you need the array B[ ] here ?
>
> 2. Where do you ensure that samples are taken without replacement (WOR)?
>
> 3. In, M * N obs , what are M and N ?
>
> I guess that you have a population of size 500,000 and you want 500,000
> replicate samples of size 50,000 each, drawn WOR. If this guess is wrong,
> please tell SAS-L.
>
> Richard's suggestion to use the K / N approach can be used. I give an
> alternative approach, slightly different from DataNull's, in which I use the
> array B[ ] to mark each observation of the population that has been selected
> for the sample, so that the same unit is not selected again. This ensures
> WOR. I use a GOTO statement to keep it simple: discard X and choose another X.
>
> In the SET statement I bring only AGE into the PDV to save memory space.
> The sum of AGE can be accumulated immediately (not in another Data Step) and
> its mean computed at the end of the DO-loop. The array B[ ] has to be
> initialized to missing before going on to the next sample.
>
> See the code below.
>
> SASHELP.CLASS has 19 observations. Hence 19 samples of size 5 WOR. The
> sample means are generated in the same data step. These 2 numbers have been
> hard-coded in the code below, 18 for array dimensioning and 5 as the
> denominator for the sample mean.
>
> It can be verified that using SET with the POINT= option is not efficient
> for large data sets.
>
> data sample(keep = sampnum meanage);
>    array b[19] _temporary_;               /* selection flags, one per obs */
>    do sampnum = 1 to dim(b);
>       do i = 1 to 5;
>          There:
>          x = ceil(ranuni(1234) * nobs);   /* random obs number in 1..nobs */
>          if b[x] then goto There;         /* already drawn: pick again */
>          b[x] = 1;
>          set sashelp.class(keep = age) nobs = nobs point = x;
>          *output;
>          agetot + age;                    /* running sum of AGE */
>       end;
>       meanage = agetot / 5;
>       agetot = 0;
>       output;
>       do i = 1 to dim(b);                 /* clear flags for next sample */
>          b[i] = .;
>       end;
>    end;
>    stop;
> run;
>
> Regards,
>
> Muthia Kachirayan
>
>
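For comparison, the K / N approach mentioned in the quoted message could be
sketched roughly as follows. This is only an illustration, not Richard's
actual code: the variable names K, N and P and the output data set name
SAMPLE_KN are assumptions made here, and the sample size (5) and number of
replicates (19) are hard-coded as in the SASHELP.CLASS example. Each
observation is visited in order and selected with probability K / N, where K
is the number of units still needed and N is the number of units not yet
visited, so no unit can be drawn twice and no flag array is required.

```sas
data sample_kn(keep = sampnum meanage);
   do sampnum = 1 to 19;
      k = 5;                      /* units still needed for this sample  */
      n = nobs;                   /* units not yet visited               */
      agetot = 0;
      do p = 1 to nobs;           /* visit every observation in order    */
         if ranuni(1234) < k / n then do;
            /* selected: read only this observation */
            set sashelp.class(keep = age) nobs = nobs point = p;
            agetot + age;
            k = k - 1;
         end;
         n = n - 1;
      end;
      meanage = agetot / 5;       /* mean of the 5 selected ages         */
      output;
   end;
   stop;
run;
```

When K reaches 0 the ratio is 0 and nothing further is selected; when K
equals N the ratio is 1 and every remaining unit is taken, so each replicate
ends with exactly 5 units. The sketch still relies on POINT=, so it has not
been checked for speed on a large data set.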