Date: Mon, 26 Apr 2010 12:32:18 -0400
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: extracting groups for a large dataset
In-Reply-To: <201004261539.o3QAlQHB027254@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
Wing Wah,
Your code uses two DATA steps, so it reads the Orders dataset twice and
Orders2 once, which costs extra I/O time.
With 600 million observations, saving I/O time will noticeably reduce the
run time.
Here is another hash solution that reads Orders twice in a single DATA step
and never creates Orders2. You can see some shortcuts used in the code. Time
both versions and choose the one with the shorter run time.
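data need;
   if _n_ = 1 then do;
      if 0 then set orders;   /* define host variables at compile time */
      declare hash h(hashexp:20);
      h.definekey('name');
      h.definedone();
      /* Pass 1: load each name that has at least one class 'S' row. */
      do until(eof);
         /* WHERE= limits the filter to this SET statement only. */
         set orders(where=(class = 'S')) end = eof;
         if h.find() ne 0 then h.add();
      end;
   end;
   /* Pass 2: keep every row whose name is in the hash. */
   set orders;
   if h.check() = 0;
run;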
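To compare the two versions, something like the following may help. It is
only a sketch: it assumes the datasets ORDERS3 (from your code) and NEED (from
the step above) have both been created in the same session, and it uses the
FULLSTIMER system option so the log reports real time, CPU time, and memory
for each step.

```sas
/* Ask SAS to write detailed timing and memory notes to the log. */
options fullstimer;

/* Run the two-step version (orders2 + orders3) and the one-step    */
/* version (need) here, then compare the notes each leaves in the log. */

/* Optional check, assuming both outputs exist: confirm that the    */
/* two versions selected the same rows in the same order.           */
proc compare base=orders3 compare=need;
run;
```

On the full 600 million rows, the timing notes in the log are the numbers to
compare; the PROC COMPARE check is cheaper to run on a small sample first.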
Kind regards,
Muthia Kachirayan
On Mon, Apr 26, 2010 at 11:39 AM, Wing Wah Tham <wingwahtham@yahoo.co.uk> wrote:
> Dear all,
>
> I have sort of done this using hash. Maybe I could have done this better
> for my reference file order2 by including it in the hash. All comments and
> advices are welcome. Thank you all for your comments.
>
> data orders;
> input name $2. vol $2. class $2.;
> datalines;
> X 1 S
> X 2 F
> X 1 S
> Y 4 F
> Y 3 F
> Y 2 F
> Y 5 F
> Y 6 F
> Y 7 F
> Y 8 F
> Z 10 S
> Z 12 S
> Z 11 S
> Z 9 F
> Z 13 F
> ;
> run;
>
> data orders2;
> set orders;
> where class='S';
> run;
>
>
> data orders3;
> length name $2. vol $2. class $2.;
> If _N_ = 1 Then Do;
> declare hash h (Dataset:'orders2');
> h.definekey('name');
> h.definedata('name');
> h.definedone();
> end;
>
> set orders;
> If h.Find() = 0 Then
> Output;
> RUN;
>