Date: Wed, 20 Jan 2010 15:12:34 -0500
Reply-To: "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Subject: Re: Hash Question
In-Reply-To: <100FCDB28E638D4B903CB7D2056E43C715FFE3@USFCH-MAIL1.lewin.com>
Content-Type: text/plain; charset="us-ascii"
Ted:
In addition to the reasons provided in other responses, I
wonder if your use of HASH is suffering from the fact that
you are forced (when using hash in a data step) to use IF
for subsetting rather than WHERE.
Your SQL is probably taking the 75 acceptable values in
CRBITOP75 and implicitly using WHERE to search the
FULLPAR10AGRPROUT dataset for those values. If so then
SQL is using the data engine to do the subsetting. It
probably won't matter how fast the HASH logic is in your
data step when the subsetting is delayed until each and
every observation is put into the PDV (even after
accounting for the 31 columns vs 2 columns issue).
What you might do to test my theory is put those 75 values
as an argument of the WHERE statement and retry your data
step.
data _null_;
length where_list $32767;
retain where_list;
set out.crbitop75 end=end_of_top75;
where_list=catx(',',where_list,v27drg);
if end_of_top75 then
call symput('where_list',trim(where_list));
run;
data out.top75crpipatients;
set .out.fullpar10grpout;
where grouperdrg in (&where_list);
run;
I assume that v27drg is a numeric variable above, but if
it's a 10-character sting variable then replace the
statement above with:
where_list=catx(',',where_list,put(v27drg,quote12.));
Regards,
Mark
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Kirby, Ted
> Sent: Wednesday, January 20, 2010 6:30 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Hash Question
>
> I am starting to use Hash Objects more and more and thought I would do
> a little test. I ran code that will generate a dataset of all patient
> numbers where the DRG assigned to the case was one in a list of top 75
> DRGs. The log entries are below.
>
> What surprised me is that the hash solution took longer in real time
> than the PROC SQL solution did. Although the hash solution did take
> less CPU time. Any thoughts?
>
> PROC SQL SOLUTION:
> 85 proc sql;
> 86 create table out.Top75CRBIPatients as
> 87 select o.PatientNum,V27DRG from out.FullPAR10GrprOut o
> 88 right join out.CRBITop75 t on GrouperDRG=V27DRG;
> NOTE: Table OUT.TOP75CRBIPATIENTS created, with 3887478 rows and 2
> columns.
> 89 quit;
> NOTE: PROCEDURE SQL used (Total process time):
> real time 47.06 seconds
> cpu time 28.53 seconds
>
>
> HASH SOLUTION:
> 91 data out.Top75CRBIPatients;
> 92 if _n_=1 then do;
> 93 declare hash CRBIDRGs(dataset: 'out.CRBITop75');
> 94 rc=CRBIDRGs.defineKey('V27DRG');
> 95 rc=CRBIDRGs.defineDone();
> 96 end;
> 97 set out.FullPAR10GrprOut;
> 98 V27DRG=GrouperDRG;
> 99 rc=CRBIDRGs.find();
> 100 if rc=0 then output;
> 101 run;
> NOTE: There were 75 observations read from the data set OUT.CRBITOP75.
> NOTE: There were 13122131 observations read from the data set
> OUT.FULLPAR10GRPROUT.
> NOTE: The data set OUT.TOP75CRBIPATIENTS has 3887478 observations and
> 31 variables.
> NOTE: DATA statement used (Total process time):
> real time 1:22.95
> cpu time 22.10 seconds
>
>
> Ted Kirby
> Consultant,
> The Lewin Group, Inc.
> 3130 Fairview Park Drive, Suite 800
> Falls Church, VA 22042
> Phone: (703)269-5507
> Fax: (703)269-5501
> e-mail: ted.kirby@lewin.com <mailto:ted.kirby@lewin.com>
>
> ************* IMPORTANT - PLEASE READ ********************
>
> This e-mail, including attachments, may include confidential and/or
> proprietary information, and may be used only by the person or entity
> to which it is addressed. If the reader of this e-mail is not the
> intended recipient or his or her authorized agent, the reader is hereby
> notified that any dissemination, distribution or copying of this e-mail
> is prohibited. If you have received this e-mail in error, please notify
> the sender by replying to this message and delete this e-mail
> immediately.
>
|