Date: Thu, 21 Dec 2006 05:00:40 -0500
Reply-To: Hakan ENER <hakanener99@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Hakan ENER <hakanener99@YAHOO.COM>
Subject: Re: Implementing a macro using a list of elements in string character
Hello again Howard,
Thanks for the tips. I am pretty certain that the simplifying assumption I
mentioned (that a given sentence in the paragraph contains no more than one
industry name and no more than one 4-digit year) is accurate for this
dataset, so I do not see it as a problem.
The list of industries to find and match in the paragraph text was built
specifically for this dataset, so that is not a concern either.
But I cannot construct the normalized data structure you suggested (one
observation for each company/industry/year triplet), because the text is
just one long paragraph. I see no clean way to divide it up like that,
except by splitting the paragraph into sentences and making each sentence a
new observation for the same company. (A given observation may then contain
no relevant information at all, but that is fine.)
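To make the sentence-per-observation idea concrete, here is a minimal sketch.
It is written in Python purely as a language-neutral illustration, not as SAS
code. In SAS, a DATA step loop using functions such as SCAN or the PRX* family
could do the same job. Assumptions: sentences end at '.', '!' or '?' followed
by whitespace (a naive rule that will break on abbreviations like "Inc."), the
industry list and the names `split_into_sentences`, `sentence_records` and
`INDUSTRIES` are hypothetical, and a "year" is any whole-word 4-digit number
from 1900 to 2099. Following the simplifying assumption above, only the first
industry and first year in each sentence are kept.

```python
import re

# Hypothetical industry list; the real list was built for this dataset.
INDUSTRIES = ["banking", "insurance", "textiles"]

# Whole-word 4-digit year, restricted here (an assumption) to 1900-2099.
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")


def split_into_sentences(paragraph):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def sentence_records(company, paragraph, industries=INDUSTRIES):
    """Return one record per sentence: (company, sentence_no, industry, year).

    industry and year are None when the sentence mentions neither. Because
    each sentence is assumed to hold at most one of each, only the first
    match is kept.
    """
    records = []
    for n, sentence in enumerate(split_into_sentences(paragraph), start=1):
        lower = sentence.lower()
        # First industry name found as a substring of the sentence.
        industry = next((ind for ind in industries if ind in lower), None)
        match = YEAR_RE.search(sentence)
        year = int(match.group(0)) if match else None
        records.append((company, n, industry, year))
    return records


if __name__ == "__main__":
    text = ("The firm entered banking in 1998. It grew quickly. "
            "It left textiles in 2004.")
    recs = sentence_records("ACME", text)
    for rec in recs:
        print(rec)
    # Keeping only sentences that name both an industry and a year
    # yields company/industry/year triplets.
    triplets = [(c, ind, yr) for c, _, ind, yr in recs if ind and yr]
    print(triplets)
```

Sentences with no match simply carry missing values, which matches the point
that a given observation may or may not contain relevant information; filtering
on both fields afterwards recovers the one-row-per-triplet layout.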
Hakan
On Wed, 20 Dec 2006 21:25:36 -0500, Howard Schreier <hs AT dc-sug DOT org> <nospam@HOWLES.COM> wrote:
>Was this list built specifically from and for your file, so that you can be
>sure that all variations are included? If not, that issue is only partially
>addressed.
>
>Simplifying assumptions are fine when you are presenting examples and
>explanations, but you cannot just assume away problems which may complicate
>your process. Is it in fact assured that there are no sentences with
>multiple year references or multiple industry references?
>
>It's probably better to use a normalized data structure (one observation for
>each company/industry/year triplet).
>