LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2003, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 29 Jul 2003 08:52:52 -0700
Reply-To:     "Huang, Ya" <yhuang@AMYLIN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Huang, Ya" <yhuang@AMYLIN.COM>
Subject:      Re: Count distinct in sas
Comments: To: Nembra <jesus.garcia@DUNNHUMBY.COM>
Content-Type: text/plain; charset="iso-8859-1"

Nembra,

Here is what I just tested:

1 data xx; 2 do product=1 to int(ranuni(123)*1000); 3 do hhd_id=1 to int(ranuni(345)*1000); 4 do dummy=1 to int(ranuni(456)*10); 5 output; 6 end; 7 end; 8 end; 9 run;

NOTE: The data set WORK.XX has 1671518 observations and 3 variables. NOTE: DATA statement used: real time 2.28 seconds cpu time 0.58 seconds

10 11 proc sql; 12 create table t1 as 13 select product,count(distinct hhd_id) as cnt 14 from xx 15 group by product 16 ; NOTE: Table WORK.T1 created, with 750 rows and 2 columns.

17 NOTE: PROCEDURE SQL used: real time 13.57 seconds cpu time 12.50 seconds

18 proc sql; 19 create table t2 as 20 select distinct product,hhd_id 21 from xx 22 ; NOTE: Table WORK.T2 created, with 334291 rows and 2 columns.

23 create table t2 as 24 select distinct product,count(*) as cnt 25 from t2 26 group by product 27 ; NOTE: Table WORK.T2 created, with 750 rows and 2 columns.

28 quit; NOTE: PROCEDURE SQL used: real time 3.91 seconds cpu time 3.53 seconds

--------------------------------

Note, the first sql step is in the same format as yours, which took 13.57 seconds to run. The second sql step has two pass, but it took only 3.91 seconds to run. Both of them got the same result. It seems to me that you may save big time by using the second format. Hopefully, 3.91/13.57*9hours=2.59hours !!

Kind regards,

Ya Huang

-----Original Message----- From: Nembra [mailto:jesus.garcia@DUNNHUMBY.COM] Sent: Tuesday, July 29, 2003 7:45 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Count distinct in sas

Hi,

I am working with datasets that have around 400 millions of transactions (grocery ones) and I need to aggregate households.

I have used proc sql and it takes 9 hours.

Then, I used sort + first and it took 4 hours.

Do you know a proc that count distinct values doing a group by?

My steps have been:

1. proc sql; create table sol as select product, count(distinct hhd_id) from initial group by product; quit;

2. proc sort data=initial; by product hhd_id; run;

data sol; retain cust 0; set initial; by product hhd_id; if first.product then cust=0; if first.hhd_id then cust +1; if last.hhd_id then output; run;

Cheers.

J.


Back to: Top of message | Previous page | Main SAS-L page