Date: Sun, 14 Jan 2007 20:56:28 -0500
Reply-To: Ken Borowiak <EvilPettingZoo97@AOL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ken Borowiak <EvilPettingZoo97@AOL.COM>
Subject: Re: Extracting word(s) occurring in text before a certain keyword
On Sun, 14 Jan 2007 05:29:53 -0800, Hakan Ener <hakanener99@YAHOO.COM> wrote:
> Hello,
>
> I could not find a general solution to what I'm
>trying to do when analyzing a character variable that
>contains unstructured text.
>
> Each observation contains a paragraph of text
>(multiple sentences separated by period), where names
>of certain companies are mentioned, such as "Microsoft
>Inc." or "Advanced Micro Devices Corp." within
>sentences. I want to extract the company name that
>precedes "Inc." or "Corp." in this text. Considering
>that company names may contain any number of words
>(each of which have a capital first letter), and that
>an observation may contain any number of company names
>one after the other, is there a suggestion to handle
>this coding such that the result will be a horizontal
>array of full company names mentioned in the source
>field?
>
>Thank you,
>
>Hakan Ener
>France
>
Hakan,
Regular expressions in conjunction with the PRX functions can help you out.
If you post some sample observations and a reasonably complete list of the
suffixes that anchor the company name (e.g. Inc., Corp.), I could cook up
something more concrete.
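In the meantime, here is a rough sketch of the idea, not a finished solution.
It assumes a data set named HAVE with a character variable TEXT, at most five
company names per observation, and only "Inc." and "Corp." as anchors; all of
those names and limits are made up for illustration. CALL PRXNEXT walks
through each successive match in the string, and each match is stored in the
next element of a horizontal array.

```sas
/* Hypothetical sample data; data set and variable names are illustrative. */
data have;
   length text $200;
   text = 'Shares of Microsoft Inc. rose while Advanced Micro Devices Corp. fell.';
   output;
run;

data want;
   set have;
   array company{5} $60;      /* assumes no more than 5 names per observation */
   retain re;
   /* One or more capitalized words followed by Inc. or Corp. */
   if _n_ = 1 then
      re = prxparse('/(?:[A-Z][A-Za-z]*\s+)+(?:Inc|Corp)\./');
   start = 1;
   stop  = length(text);
   n     = 0;
   /* Find the first match, then keep scanning from where it ended. */
   call prxnext(re, start, stop, text, pos, len);
   do while (pos > 0 and n < dim(company));
      n = n + 1;
      company{n} = substr(text, pos, len);
      call prxnext(re, start, stop, text, pos, len);
   end;
   drop re start stop pos len n;
run;
```

One known weakness: a capitalized word that happens to sit directly in front
of a company name (for example the first word of a sentence) gets swallowed
into the match, so the pattern would need tightening against real data.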
Ken