Date: Thu, 25 Jun 2009 09:34:38 -0500
Reply-To: Carl Denney <cdenney@HEALTHINFOTECHNICS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Carl Denney <cdenney@HEALTHINFOTECHNICS.COM>
Subject: Fwd: Re: Fuzzy matching question
Content-Type: text/plain; x-AttchRem=yes; charset="us-ascii"; format=flowed
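>/* Replace the longest strings first; otherwise '5-fu' is rewritten */
>/* before '5-fu civ', '5-fu ci', or '5-fu ivp' can ever match.      */
>/* tranwrd is case-sensitive, so lowcase(varname) first if needed.  */
>varname=tranwrd(varname,'5-fu civ','Fluorouracil');
>varname=tranwrd(varname,'5-fu ivp','Fluorouracil');
>varname=tranwrd(varname,'5-fu ci','Fluorouracil');
>varname=tranwrd(varname,'5-fu','Fluorouracil');
>varname=tranwrd(varname,'5fu','Fluorouracil');
>varname=tranwrd(varname,'adrucil','Fluorouracil');
>varname=tranwrd(varname,'fluorouracil','Fluorouracil');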
>
>But why don't you change it to the NDC code instead?
>
>
>
>
>At 12:44 PM 6/24/2009, you wrote:
>>Paul:
>>We have found that unstructured reporting of medication introduces
>>a variety of problems. For instance,
>>- multiple genres, including brand names, generic names, codes, and
>>brand names differentiated by dosage;
>>- spelling and abbreviation differences;
>>- similar names;
>>- mixtures of comments, names, and codes.
>>
>>You may find that appending distinct pairs of observed value and
>>standard drug identifier will give you a mapping that you can use
>>to classify strings. You might also add a match probability that
>>could help you order matches and select those with greater chances
>>of being a correct match.
>>S
>>
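A rough sketch of the mapping idea described above, in case it helps. The data set names (raw, drug_map, candidates), the sample rows, and the use of complev (Levenshtein edit distance) as a stand-in for a match probability are all assumptions for illustration, not anything from the original posts. A smaller distance means a closer match.

```sas
/* Hypothetical raw values as they might be entered */
data raw;
   length drug_name $40;
   infile datalines truncover;
   input drug_name $40.;
   datalines;
Fluorouracil
flourouracil
5-FU civ
Platinol
;
run;

/* Hypothetical map: observed spelling paired with a standard identifier */
data drug_map;
   length observed $40 standard $20;
   infile datalines dlm='|' truncover;
   input observed $ standard $;
   datalines;
fluorouracil|Fluorouracil
adrucil|Fluorouracil
5-fu|Fluorouracil
5fu|Fluorouracil
cisplatin|Cisplatin
platinol|Cisplatin
;
run;

/* Score every raw value against every mapped spelling and keep the */
/* best (smallest) edit distance per standard identifier.           */
proc sql;
   create table candidates as
   select r.drug_name,
          m.standard,
          min(complev(lowcase(strip(r.drug_name)), strip(m.observed))) as distance
   from raw as r, drug_map as m
   group by r.drug_name, m.standard
   order by drug_name, distance;
quit;
```

The first row per drug_name in candidates is then the most plausible standard identifier, and the distance can be reviewed or thresholded before accepting a match. The Cartesian join is fine for a small map but may need blocking on a large one.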
>>________________________________________
>>From: SAS(r) Discussion [SAS-L@LISTSERV.UGA.EDU] On Behalf Of Paul
>>Miller [pjmiller_57@YAHOO.COM]
>>Sent: Wednesday, June 24, 2009 10:55 AM
>>To: SAS-L@LISTSERV.UGA.EDU
>>Subject: Fuzzy matching question
>>
>>Hello Everyone,
>>
>>I'm working with some cancer drugs and need to do some fuzzy
>>matching. I've experimented with a few different functions,
>>including upcase, lowcase, propcase, spedis, index, indexc, in
>>various combinations but have yet to find what I need.
>>
>>The drugs I'm working with often go under a variety of different
>>names. The case in which the drug names are entered varies.
>>Sometimes they're misspelled. Sometimes the name of the drug is
>>combined with information about the method of administration (e.g.,
>>'civ', 'ci', 'ivp').
>>
>>An example involving some very simple code appears below:
>>
>>else if lowcase(drug_name) in ('fluorouracil' 'adrucil' '5-fu'
>>'5-fu civ' '5-fu ci' '5-fu ivp' '5fu') then agent = 'Fluorouracil';
>>
>>Is there some way to elegantly combine SAS functions so that SAS
>>will look for terms that sound like/contain 'fluorouracil' or
>>'adrucil' or '5-fu' and then code them all as 'Fluorouracil'?
>>
>>I've been able to find functions like index that simultaneously
>>look for different names (e.g., 'fluorouracil' 'adrucil') where the
>>spelling is exact. I've also been able to find functions like
>>spedis that allow me to do fuzzy matching for a single name (e.g.,
>>'fluorouracil') but not for different names simultaneously (e.g.,
>>'fluorouracil' and 'adrucil'). So I'm just wondering if there's
>>some way to combine functions so that I get the best of both
>>worlds. Alternatively, I thought there might be some functions I'm
>>not aware of that could be put to good use.
>>
>>Thanks,
>>
>>Paul
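
One way the two behaviours asked about above might be combined is to loop spedis over a temporary array of known names. This is only a sketch: the data set names (raw, coded), the sample rows, the choice to compare just the first blank-delimited word (so that 'civ', 'ci', and 'ivp' are ignored), and the cutoff of 20 are all assumptions to be tuned against real data.

```sas
/* Hypothetical input: one raw drug string per row */
data raw;
   length drug_name $40;
   infile datalines truncover;
   input drug_name $40.;
   datalines;
Fluorouracil
flourouracil
ADRUCIL
5-FU civ
5fu
cisplatin
;
run;

data coded;
   set raw;
   length agent $20 word $40;
   /* Known names for one agent; extend the list as needed */
   array names[4] $20 _temporary_ ('fluorouracil' 'adrucil' '5-fu' '5fu');
   /* Keep only the first word, lowercased, to drop administration codes */
   word = lowcase(scan(drug_name, 1, ' '));
   agent = ' ';
   /* Stop at the first name that is close enough */
   do i = 1 to dim(names) while (agent = ' ');
      /* spedis is asymmetric: query first, keyword second */
      if spedis(strip(word), strip(names[i])) <= 20 then agent = 'Fluorouracil';
   end;
   drop i word;
run;
```

Because spedis scales its cost by the length of the query, very short names such as '5fu' tolerate almost no misspelling at a given cutoff, so a separate cutoff for short names may be worth considering.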