Date: Wed, 30 Jul 2003 19:31:50 GMT
Reply-To: Aaron Moynahan <aaron.moynahan@VERIZON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Aaron Moynahan <aaron.moynahan@VERIZON.NET>
Subject: Re: Count distinct in sas
Howard,
I think the logic is exactly the same between your sql procedure with the
nested table and running two consecutive summary procedures. The summary
procedures are still faster which makes me think that SAS still needs to so
some more tunning work with proc sql.
85 data initial;
86 do product=1 to int(ranuni(123)*1000);
87 do hhd_id=1 to int(ranuni(345)*1000);
88 do dummy=1 to int(ranuni(456)*10);
89 output;
90 end;
91 end;
92 end;
93 run;
NOTE: The data set WORK.INITIAL has 1671518 observations and 3 variables.
NOTE: DATA statement used:
real time 6.28 seconds
cpu time 1.10 seconds
94
95
96 proc summary data=initial;
97 class hhd_id product;
98 types hhd_id*product;
99 output out=agg1(keep=hhd_id product _freq_);
100 run;
NOTE: There were 1671518 observations read from the data set WORK.INITIAL.
NOTE: The data set WORK.AGG1 has 334291 observations and 3 variables.
NOTE: PROCEDURE SUMMARY used:
real time 3.71 seconds
cpu time 3.45 seconds
101
102 proc summary data=agg1;
103 class product;
104 types product;
105 output out=agg2(keep=product _freq_ rename=(_freq_=count));
106 run;
NOTE: There were 334291 observations read from the data set WORK.AGG1.
NOTE: The data set WORK.AGG2 has 750 observations and 2 variables.
NOTE: PROCEDURE SUMMARY used:
real time 0.34 seconds
cpu time 0.34 seconds
107 proc sql;
108 create table t3 as
109 select distinct product,count(*) as cnt
110 from (select distinct product,hhd_id from xx)
111 group by product;
NOTE: Table WORK.T3 created, with 750 rows and 2 columns.
112 quit;
NOTE: PROCEDURE SQL used:
real time 6.12 seconds
cpu time 5.48 seconds