LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2011, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 23 Aug 2011 01:29:53 -0400
Reply-To:     Søren Lassen <s.lassen@POST.TELE.DK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Søren Lassen <s.lassen@POST.TELE.DK>
Subject:      Re: Many to many merge using Hash
Comments: To: Hari Nath <hari_s_nath@YAHOO.COM>
Content-Type: text/plain; charset=ISO-8859-1

Hari, You are basically doing a full join of the ALL table with itself. In your case, I do not think that is necessary:

proc sort data=all out=purch(keep=id p_dt rename=(p_dt=dt)); by id p_dt; run; proc sort data=all out=disp(keep=id d_dt cnt rename=(d_dt=dt)); by id d_dt; run;

data want; set purch(in=p) disp(in=d); by id dt; if first.id then var1=0; var1+p; if d; keep id cnt var1; run;

proc sort; by id cnt; run;

In the example here, a purchase will be counted as before a disposal on the same date. If you do not want that, you should change the line set purch(in=p) disp(in=d); to set disp(in=d) purch(in=p);

Regards, Søren

On Mon, 22 Aug 2011 17:04:57 -0400, Hari Nath <hari_s_nath@YAHOO.COM> wrote:

>Hi all, > Following is my sample data and code. All am trying to find is the number >of purchases(p_dt) prior to disposals(d_dt). My actual 'all' dataset has >around 20 million records. The merge am trying to do here will go against 20 >million by 20 million and hence Hash is being used here. I had run this code >before and it ran for few hours which it never finished past the 'step 2'(so >i had to kill the job). > What i want to do is to combine 'step 3' in to 'step 2', which am not able >to accomplish as am a beginner with hashing. So this will cut down the time >taken by 'step 2', once the 'step 3' is integrated with it. > Please let me know with any suggestion/ ideas. > >Many Thanks. >-Hari > >data all ; > input id p_dt :date9. d_dt :date9. cnt $ ; > format p_dt d_dt date9. ; >cards ; >101 '01jan2006'd '30dec2007'd 1 >101 '15feb2006'd '30jun2007'd 2 >101 '01jan2007'd '31dec2009'd 3 >101 '01jan2007'd '30jun2009'd 4 >101 '01jan2010'd '15aug2011'd 5 >101 '01jan2011'd '15aug2011'd 6 >101 '25feb2011'd '15aug2011'd 7 >102 '25mar2011'd '15aug2011'd 1 >102 '25apr2011'd '15aug2011'd 2 >; >run ; > >/* STEP 1 */ >data purch ; > set all ; > b_cnt = cnt ; > keep id p_dt b_cnt ; >run ; >data disp ; > set all ; > keep id d_dt cnt ; >run ; > >/* STEP 2 */ >data datasetC ; > declare hash hh(hashexp: 10) ; > hh.definekey('id','_n_') ; > hh.definedata('p_dt','b_cnt') ; > hh.definedone() ; > > declare hash hs() ; > hs.definekey('id') ; > hs.definedata('_n_') ; > hs.definedone() ; > > do until (eof1) ; > set purch end = eof1 ; > _n_ ++ 1 ; > hs.replace() ; > hh.add() ; > end ; > > do until (eof2) ; > set disp end = eof2 ; > if hs.find() = 0 then do ; > do _n_ = 1 to _n_ ; > if hh.find() = 0 and p_dt < d_dt then do; > > output ; > keep id b_cnt cnt ; > end ; > end ; > end ; > end ; >stop ; >run ; > >/* STEP 3 */ >proc sql noprint ; > Create table datasetD as > select id, cnt , count(*) as var1 > from datasetC > group by id, cnt > order by id, cnt > ; >quit ;


Back to: Top of message | Previous page | Main SAS-L page