Date: Mon, 30 Nov 2009 18:51:13 +0100
Reply-To: karma <dorjetarap@GOOGLEMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: karma <dorjetarap@GOOGLEMAIL.COM>
Subject: Re: Hashing with SAS9.2 using multiple hash keys and a date range
(Revisited)
In-Reply-To: <8633cace-5d2b-43e3-9da8-7840df842060@u8g2000prd.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi Rob,
I think you are looking for the multidata: 'y' option
This seems to produce the desired results, although I had to impute
the todate to todays date when missing, to match your example output
(using: todate = coalesce(todate,today()) ;) :
data want ;
if 0 then set clientDiscount ;
declare hash hh(dataset:'clientDiscount', multidata:'y') ;
hh.definekey('clientid');
hh.definedata('fromdate','todate','clientDiscount') ;
hh.definedone() ;
do until(eof) ;
set salesHistory end=eof ;
_rc = hh.find() ;
if (_rc=0) then do ;
if fromdate <= dateofsale <= todate then return ;
hh.has_next(result:_res) ;
do while(_res ne 0);
hh.find_next();
if fromdate <= dateofsale <= todate then return ;
hh.has_next(result: _res);
end ;
end ;
call missing(clientDiscount) ;*Make sure clientDiscount is
not retained;
return ;
end ;
stop ;
drop _: fromdate todate ;
format dateofsale date9. ;
run ;
proc print ;run;
2009/11/30 rob <rob.ashmore@mbf.com.au>:
> About 6 months ago I posted a message to this group enquiring about
> hashing with a date range.
> At the time we were running on SAS9.1.3. We ended up building a
> temporary solution using one of the suggestions made by one of the
> SAS_L repondents. The solution worked well and involved building a
> format.
>
> We are now running on SAS9.2. We are embarking on some work which
> means that the proc format
> approach we employed is no longer suitable.
>
> Does anyone have any examples of how we can build a hash with
> duplicate key values? The correct key to use being determined by two
> date columns which provide the range.
>
> My origional posting (below) gives an example of the task.
>
> regards,
>
> rob ashmore
>
>
>
>
> ****ORIGINAL POST START*******
> Dear all,
>
> I wish to perform a many to many merge using a date range.
> I could do this several ways using proc sort / data step or
> perhaps even sql.
>
>
> However, due to its large size, I dont want to sort the base dataset.
> As for the sql possibilites, I am not too sure about them.
>
>
> I have supplied some example sas code below to illustrate what I am
> trying to do.
>
>
> I have two datasets saleshistory (50+ million rows) and
> clientDiscountDtls (10,000 rows).
> Basically, the task is to add the column clientDiscount to every row
> of the salesHistory
> dataset. The common key is clientId. However, the clientDiscount
> should only be populated
> if the dateOfSale falls between the fromDate and toDate of the
> clientDiscountDtls dataset.
> If there is no corresponding date range, the value of clientDiscount
> is set to missing.
> If there is no matching clientId in the clientDiscountDtls table then
> the record is still
> outputted to the results dataset but the the value of clientDiscount
> is set to missing.
>
>
> I am hoping that the above can be achieved using a hash because I
> need
> it to run pretty
> quickly and repeatedly.
>
>
> I have done a little research and have found that hashing using SAS
> 9.1 will probably not be
> useful for doing a many to many merge. However, we will soon be
> moving
> to SAS 9.2 and
> I believe there are some new features in the hash data step component
> object which may
> allow what I am trying to do.
>
>
> Could anyone help by showing me how I can achieve this using a hash,
> either 9.1 or 9.2.
> Also, in the interests of my own education, Id be keen to see any
> example of how to do this using
> proc sql. I suspect it may well be quite straightforward?
>
>
> *Sample Datsets*;
>
>
> data work.salesHistory;
> infile cards ;
> length clientId $4
> productCode $2
> dateOfSale 8
> fullAmtPayable 8
> ;;;
>
>
> input clientId $
> productCode $
> dateOfSale date9.
> fullAmtPayable ;
> cards;
> C001 DX 22jan2001 100
> C002 KR 22mar2006 230
> C001 EZ 20sep2008 400
> C001 DX 25may2004 250
> C003 EK 15nov2008 300
> C002 MG 10feb2003 50
> ;
> run;
>
>
> data work.clientDiscount;
> format fromdate todate date9.;
> infile cards ;
> length clientId $4
> fromDate 8
> toDate 8
> clientDiscount 8
> ;;;
>
>
> input @1 clientId $
> @6 fromDate date7.
> @15 toDate date7.
> clientDiscount 8.2
> ;
> cards;
> C001 26dec00 01may04 0.1
> C001 01aug04 15oct07 0.25
> C001 16oct07 . 0.05
> C002 24feb03 01jan09 0.075
> ;
> run;
>
>
> **Desired Result Dataset**;
> clientid / productcode / dateOfSale / fullAmtPayable / clientDiscount
> C001 DX 22jan2001 100 0.10
> C002 KR 22mar2006 230 0.075
> C001 EZ 20sep2008 400 0.05
> C001 DX 25may2004 250 .
> C003 EK 15nov2008 300 .
> C002 MG 10apr2003 50 .
>
>
> etc etc
>
>
> Thanks In Advance,
>
>
> Rob Ashmore
>
> ****ORIGINAL POST END*******
>
|