Date: Sun, 29 Nov 2009 18:41:56 -0800
Reply-To: rob <rob.ashmore@MBF.COM.AU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: rob <rob.ashmore@MBF.COM.AU>
Organization: http://groups.google.com
Subject: Hashing with SAS9.2 using multiple hash keys and a date range
(Revisited)
Content-Type: text/plain; charset=ISO-8859-1
About 6 months ago I posted a message to this group enquiring about
hashing with a date range.
At the time we were running on SAS9.1.3. We ended up building a
temporary solution using one of the suggestions made by one of the
SAS_L repondents. The solution worked well and involved building a
format.
We are now running on SAS9.2. We are embarking on some work which
means that the proc format
approach we employed is no longer suitable.
Does anyone have any examples of how we can build a hash with
duplicate key values? The correct key to use being determined by two
date columns which provide the range.
My origional posting (below) gives an example of the task.
regards,
rob ashmore
****ORIGINAL POST START*******
Dear all,
I wish to perform a many to many merge using a date range.
I could do this several ways using proc sort / data step or
perhaps even sql.
However, due to its large size, I dont want to sort the base dataset.
As for the sql possibilites, I am not too sure about them.
I have supplied some example sas code below to illustrate what I am
trying to do.
I have two datasets saleshistory (50+ million rows) and
clientDiscountDtls (10,000 rows).
Basically, the task is to add the column clientDiscount to every row
of the salesHistory
dataset. The common key is clientId. However, the clientDiscount
should only be populated
if the dateOfSale falls between the fromDate and toDate of the
clientDiscountDtls dataset.
If there is no corresponding date range, the value of clientDiscount
is set to missing.
If there is no matching clientId in the clientDiscountDtls table then
the record is still
outputted to the results dataset but the the value of clientDiscount
is set to missing.
I am hoping that the above can be achieved using a hash because I
need
it to run pretty
quickly and repeatedly.
I have done a little research and have found that hashing using SAS
9.1 will probably not be
useful for doing a many to many merge. However, we will soon be
moving
to SAS 9.2 and
I believe there are some new features in the hash data step component
object which may
allow what I am trying to do.
Could anyone help by showing me how I can achieve this using a hash,
either 9.1 or 9.2.
Also, in the interests of my own education, Id be keen to see any
example of how to do this using
proc sql. I suspect it may well be quite straightforward?
*Sample Datsets*;
data work.salesHistory;
infile cards ;
length clientId $4
productCode $2
dateOfSale 8
fullAmtPayable 8
;;;
input clientId $
productCode $
dateOfSale date9.
fullAmtPayable ;
cards;
C001 DX 22jan2001 100
C002 KR 22mar2006 230
C001 EZ 20sep2008 400
C001 DX 25may2004 250
C003 EK 15nov2008 300
C002 MG 10feb2003 50
;
run;
data work.clientDiscount;
format fromdate todate date9.;
infile cards ;
length clientId $4
fromDate 8
toDate 8
clientDiscount 8
;;;
input @1 clientId $
@6 fromDate date7.
@15 toDate date7.
clientDiscount 8.2
;
cards;
C001 26dec00 01may04 0.1
C001 01aug04 15oct07 0.25
C001 16oct07 . 0.05
C002 24feb03 01jan09 0.075
;
run;
**Desired Result Dataset**;
clientid / productcode / dateOfSale / fullAmtPayable / clientDiscount
C001 DX 22jan2001 100 0.10
C002 KR 22mar2006 230 0.075
C001 EZ 20sep2008 400 0.05
C001 DX 25may2004 250 .
C003 EK 15nov2008 300 .
C002 MG 10apr2003 50 .
etc etc
Thanks In Advance,
Rob Ashmore
****ORIGINAL POST END*******