| Date: | Thu, 25 Jun 2009 16:21:05 -0700 |
| Reply-To: | Richard <richard.hockey@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Richard <richard.hockey@GMAIL.COM> |
| Organization: | http://groups.google.com |
| Subject: | Re: Fwd: Re: Fuzzy matching question |
| Content-Type: | text/plain; charset=ISO-8859-1 |
On Jun 26, 12:34 am, cden...@HEALTHINFOTECHNICS.COM (Carl Denney)
wrote:
> >varname=tranwrd(varname,'fluorouracil','Fluorouracil');
> >varname=tranwrd(varname,'adrucil','Fluorouracil');
> >varname=tranwrd(varname,'5-fu civ','Fluorouracil');
> >varname=tranwrd(varname,'5-fu ivp','Fluorouracil');
> >varname=tranwrd(varname,'5-fu ci','Fluorouracil');
> >varname=tranwrd(varname,'5-fu','Fluorouracil');
> >varname=tranwrd(varname,'5fu','Fluorouracil');
>
> >But why don't you change it to the NDC code instead?
>
> >At 12:44 PM 6/24/2009, you wrote:
> >>Paul:
> >>We have found that unstructured reporting of medication introduces
> >>a variety of problems. For instance,
> >>- multiple genres, including brand names, generic names, codes, and
> >>brand names differentiated by dosage;
> >>- spelling and abbreviation differences;
> >>- similar names;
> >>- mixtures of comments, names, and codes.
>
> >>You may find that appending distinct pairs of observed value and
> >>standard drug identifier will give you a mapping that you can use
> >>to classify strings. You might also add a match probability that
> >>could help you order matches and select those with greater chances
> >>of being a correct match.
> >>S
>
> >>________________________________________
> >>From: SAS(r) Discussion [SA...@LISTSERV.UGA.EDU] On Behalf Of Paul
> >>Miller [pjmiller...@YAHOO.COM]
> >>Sent: Wednesday, June 24, 2009 10:55 AM
> >>To: SA...@LISTSERV.UGA.EDU
> >>Subject: Fuzzy matching question
>
> >>Hello Everyone,
>
> >>I'm working with some cancer drugs and need to do some fuzzy
> >>matching. I've experimented with a few different functions,
> >>including upcase, lowcase, propcase, spedis, index, indexc, in
> >>various combinations but have yet to find what I need.
>
> >>The drugs I'm working with often go under a variety of different
> >>names. The case in which the drug names are entered varies.
> >>Sometimes they're misspelled. Sometimes the name of the drug is
> >>combined with information about the method of administration (e.g.,
> >>'civ', 'ci', 'ivp').
>
> >>An example involving some very simple code appears below:
>
> >>else if lowcase(drug_name) in ('fluorouracil' 'adrucil' '5-fu'
> >>'5-fu civ' '5-fu ci' '5-fu ivp' '5fu') then agent = 'Fluorouracil';
>
> >>Is there some way to elegantly combine SAS functions so that SAS
> >>will look for terms that sound like/contain 'fluorouracil' or
> >>'adrucil' or '5-fu' and then code them all as 'Fluorouracil'?
>
> >>I've been able to find functions like index that simultaneously
> >>look for different names (e.g., 'fluorouracil' 'adrucil') where the
> >>spelling is exact. I've also been able to find functions like
> >>spedis that allow me to do fuzzy matching for a single name (e.g.,
> >>'fluorouracil') but not for different names simultaneously (e.g.,
> >>'fluorouracil' and 'adrucil'). So I'm just wondering if there's
> >>some way to combine functions so that I get the best of both
> >>worlds. Alternatively, I thought there might be some functions I'm
> >>not aware of that could be put to good use.
>
> >>Thanks,
>
> >>Paul
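Paul's two tools can be combined by looping SPEDIS over a list of alternative names and keeping the smallest distance. The following is a minimal sketch, not a tested production rule. It assumes a data set with a character variable drug_name. The data set names, the sample values (one deliberately misspelled), the candidate list, and the cutoff of 20 are illustrative assumptions, not values from the thread:

```sas
/* Hypothetical sample data: variants quoted in the thread plus one
   made-up misspelling */
data drugs;
  infile datalines truncover;
  input drug_name $40.;
  datalines;
Fluorouracil
5-FU CIV
5fu
Adrucil
flurouracil
cisplatin
;
run;

data coded;
  set drugs;
  length name $40 agent $20;
  /* Known names for one agent; extend the list as new variants appear */
  array cand[4] $12 _temporary_ ('fluorouracil' 'adrucil' '5-fu' '5fu');
  /* Keep only the first blank-delimited word so route suffixes
     (civ, ci, ivp) are ignored; ' ' is passed explicitly because
     '-' is one of SCAN's default delimiters */
  name = lowcase(scan(drug_name, 1, ' '));
  /* Smallest SPEDIS distance to any candidate (MIN ignores missing) */
  best = .;
  do i = 1 to dim(cand);
    d = spedis(strip(name), strip(cand[i]));
    best = min(best, d);
  end;
  /* Cutoff of 20 is an assumed threshold, not a SAS default; tune it.
     The lower bound guards against a missing BEST, which SAS treats as
     smaller than any number */
  if . < best <= 20 then agent = 'Fluorouracil';
  drop i d;
run;
```

The variable best doubles as a rough match score, in the spirit of the match probability suggested in the thread. Sorting on it makes borderline matches easy to review by hand. Because SPEDIS divides its cost by the length of the query, very short strings can need their own explicit entries in the candidate list. For that reason, '5fu' is listed alongside '5-fu'.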
You could try PROC SPELL (an undocumented procedure); do a Google search on it
to get the syntax.
R
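The mapping approach suggested in the thread, with one observed string paired with one standard name, can be expressed as a character format. Each variant is listed once, and PUT translates it to the standard name. Chained TRANWRD calls depend on their order: applying '5-fu' first turns '5-fu civ' into 'Fluorouracil civ' before the longer pattern is ever tried. TRANWRD is also case-sensitive. A format, by contrast, matches whole values. A minimal sketch follows. The data set names and the format name $drugstd are hypothetical:

```sas
/* Hypothetical sample data; variant spellings taken from the thread */
data drugs;
  infile datalines truncover;
  input drug_name $40.;
  datalines;
Fluorouracil
5-FU CIV
5-fu ivp
Adrucil
cisplatin
;
run;

/* Each observed variant (lowercased) maps to one standard name;
   unlisted values map to blank */
proc format;
  value $drugstd
    'fluorouracil', 'adrucil', '5-fu', '5-fu civ',
    '5-fu ci', '5-fu ivp', '5fu' = 'Fluorouracil'
    other = ' ';
run;

data coded;
  set drugs;
  length agent $40;
  /* Whole-value lookup, so the order of the entries does not matter */
  agent = put(lowcase(strip(drug_name)), $drugstd.);
run;
```

If the variant/standard pairs are kept in a data set, PROC FORMAT's CNTLIN= option can build the same format from it. The data set needs FMTNAME, START and LABEL variables, plus TYPE='C' (or a FMTNAME beginning with $) for a character format. The label could just as well be an NDC code, as Carl suggested.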