Date: Wed, 16 Aug 2006 15:16:01 -0400
Reply-To: Peter Constantinidis <peter@CONSTANTINIDIS.CA>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Constantinidis <peter@CONSTANTINIDIS.CA>
Subject: Is hashing the right approach for this table lookup problem?
Hi folks:
Say I create a sample data set like this:
data testset;
length codes $ 50;
input name $ 1-6 codes $;
datalines;
mikey 05223020432100810003288
jonny 12402250110021001050718
gabby 12402250120021001051954
helga 12402250138421001050728
zelda 12402250142721001050454
;
run;
I also have a table of valid codes that is 3000 rows long. Here are five
of them:
data codes;
length costcent $ 5;
input costcent $;
datalines;
11111
22222
51954
33333
44444
;
run;
For each row of work.testset, the task is to find a valid code inside
CODES, starting at the far right end of the string and moving backwards
until a hit is made. If I supply a single key by hand, this can be
written as:
data _null_;
set testset;
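key = '51954';                           /* one code from the valid list */
location = find(codes, key, -999, "t");  /* negative start = search right to left; "t" trims trailing blanks */
if location = 0 then put name= 'no match';
else put name= key= location=;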
run;
Running this finds a match only for gabby, since hers is the only
string that contains a valid code.
My problem is that the DATA _NULL_ step above only works because I
supplied the key to FIND manually.
What I need is a way to tell it that KEY is a whole list of numbers,
not just one, and to check every one of them.
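One way to turn KEY into a list is to load all of the valid codes into a temporary array once, then run the same right-to-left FIND against each of them and keep the rightmost hit. This is only a sketch: it assumes WORK.TESTSET and WORK.CODES are the two data sets built above, and the array size of 5000 is my own assumed upper bound on the number of codes, not something from the original table.

```sas
/* Sketch only: assumes WORK.TESTSET and WORK.CODES from above.
   5000 is an ASSUMED upper bound on the number of valid codes. */
data matches_find;
   array keys[5000] $ 5 _temporary_;
   length best_key $ 5;

   /* On the first iteration only, copy every valid code into the array */
   if _n_ = 1 then do i = 1 to nkeys;
      set codes point=i nobs=nkeys;
      keys[i] = costcent;
   end;

   set testset;

   /* Try every key with the same right-to-left FIND; keep the rightmost hit */
   best_loc = 0;
   best_key = ' ';
   do j = 1 to nkeys;
      loc = find(codes, keys[j], -999, 't');
      if loc > best_loc then do;
         best_loc = loc;
         best_key = keys[j];
      end;
   end;

   if best_loc > 0 then output;
   drop j loc costcent;
run;
```

With about 3000 codes this makes roughly 3000 FIND calls per observation, so the run time grows with rows times codes.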
Any suggestions on the best approach would be wonderful. Right now I'm
reading papers on hashing.
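A hash lookup is one reasonable fit here. The sketch below uses the DATA step hash object (SAS 9 and later) and assumes every valid code is exactly 5 characters long, which is true of the sample COSTCENT values but is an assumption for the full 3000-row table. It loads the codes into a hash once, then for each row pulls 5-character windows from the right end backwards and stops at the first window the hash recognises.

```sas
/* Sketch only: assumes WORK.TESTSET and WORK.CODES from above, and
   ASSUMES every valid code is exactly 5 characters long. */
data matches_hash;
   if 0 then set codes;   /* defines COSTCENT ($5) in the PDV for the hash key */
   length hit $ 5;

   /* Load all valid codes into an in-memory hash table once */
   if _n_ = 1 then do;
      declare hash valid(dataset: 'codes');
      valid.defineKey('costcent');
      valid.defineDone();
   end;

   set testset;

   hit = ' ';
   hitpos = 0;
   /* Slide a 5-character window from the right end back to column 1,
      stopping at the first window that is found in the hash */
   do pos = length(codes) - 4 to 1 by -1 until (hitpos > 0);
      costcent = substr(codes, pos, 5);
      if valid.check() = 0 then do;
         hit = costcent;
         hitpos = pos;
      end;
   end;

   if hitpos > 0 then output;
   drop pos costcent;
run;
```

Each row now costs at most LENGTH(CODES) minus 4 hash lookups, no matter how many valid codes there are, instead of one FIND per code.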
Thanks!
P.