Date: Thu, 7 Jul 2005 16:59:32 -0700
Reply-To: DavidL Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: DavidL Cassell <davidlcassell@MSN.COM>
Subject: Re: fuzzy string search
Content-Type: text/plain; format=flowed
>How can I program SAS 9.1 to perform search on strings allowing some
>variations at the ends of it.. ?
>I have entries such as:
>I want to code all of them as food.
>Is indexw in data step the way to go, or is there a better way to do it?
INDEXW() *might* be the way to go. It rather depends on what else
you want to do with your entries.
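
Here is a minimal sketch of the INDEXW() approach. Your actual entries didn't come through, so the ENTRIES data set, the ENTRY and CATEGORY variables, and the sample values below are all made up for illustration:

```sas
/* Hypothetical sample data: the poster's real entries were not shown. */
data entries;
   input entry $char40.;
   datalines;
food
fast food
Food service
footwear
;
run;

data coded;
   set entries;
   length category $8;
   /* INDEXW() returns the position of 'food' as a whole,        */
   /* blank-delimited word in the string, or 0 if it isn't there. */
   if indexw(lowcase(entry), 'food') > 0 then category = 'food';
   else category = 'other';
run;
```

Note that INDEXW() only finds 'food' as a complete blank-delimited word, so something like 'foods' or 'food.' would not be coded as food here. If you need variation at the end of the word, a PRXMATCH() pattern such as '/\bfood/i' is one alternative to consider.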
>SECOND: Does SAS allow for a true fuzzy string search e.g. recognizing
>"resturant" as "restaurant"?
SAS gives you a ton of options here. You can do true NFA pattern matching
with the RX... and PRX... functions. You can do Levenshtein edit distances
with COMPLEV() or generalized edit distances with COMPGED(). You can even
tweak the features of your 'fuzziness' with COMPGED() by using the CALL
COMPCOST() routine to alter the underlying scoring system. Then there's
SPEDIS(), which computes a simpler 'spelling distance', and good old SOUNDEX()
as well. So you can make your searching as 'loose' or as 'tight' as you
want. For example,
soundex('restaurant') = soundex('resturant')
because SOUNDEX() essentially *ignores* all vowel groups, unless they're
the first letter of the word. SOUNDEX() was designed to link English-origin
names, so its utility is highly dependent on your list of task words.
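
As a rough sketch, you could compare the two spellings with each of these functions in one DATA step and look at the numbers in the log. The variable names and the COMPCOST values are only illustrative, and which score counts as 'close enough' is something you would have to decide from your own data:

```sas
data _null_;
   good = 'restaurant';
   bad  = 'resturant';

   lev  = complev(good, bad);    /* Levenshtein edit distance */
   ged  = compged(good, bad);    /* generalized edit distance, default costs */
   sped = spedis(bad, good);     /* spelling distance: query first, then keyword */
   same = (soundex(good) = soundex(bad));   /* 1 if the SOUNDEX codes agree */

   /* Perl regex: 'rest', an optional 'a', then 'urant', ignoring case. */
   hit  = prxmatch('/\bresta?urant\b/i', bad);

   /* Make a dropped or added letter cheaper for later COMPGED() calls. */
   /* The value 50 is an arbitrary choice for illustration.             */
   call compcost('insert=', 50, 'delete=', 50);
   ged_cheap = compged(good, bad);

   put lev= ged= sped= same= hit= ged_cheap=;
run;
```

Smaller distances mean closer strings, so a typical use is to keep pairs whose score falls under a cutoff that you pick after looking at some known matches and non-matches.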
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330