LISTSERV at the University of Georgia
Date:         Thu, 21 Dec 2006 05:00:40 -0500
Reply-To:     Hakan ENER <hakanener99@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Hakan ENER <hakanener99@YAHOO.COM>
Subject:      Re: Implementing a macro using a list of elements in string
              character

Hello again Howard,

Thanks for the tips. I am pretty certain that the simplifying assumption I mentioned (that a given sentence in the paragraph contains no more than one industry name and no more than one 4-digit year) is accurate for this dataset, so I do not see it as a problem.
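One way to test that assumption against the data, rather than take it on faith, is to count matches per sentence and list any sentence that breaks it. Below is a minimal sketch in Python, used only to show the logic. The industry list, the naive sentence splitter, and the restriction of years to 19xx/20xx are all hypothetical stand-ins for whatever the real dataset needs.

```python
import re

# Hypothetical industry list; the real one was built for this dataset.
INDUSTRIES = ["banking", "steel", "textiles"]

# Assumption: a "4-digit year" means a whole word from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def split_sentences(paragraph):
    """Naive split: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def industries_in(sentence):
    """Return each listed industry found as a whole word (case-insensitive)."""
    return [ind for ind in INDUSTRIES
            if re.search(r"\b" + re.escape(ind) + r"\b", sentence, re.IGNORECASE)]


def violations(paragraph):
    """Return (sentence_no, sentence) pairs that name more than one
    distinct industry or more than one distinct year."""
    bad = []
    for n, s in enumerate(split_sentences(paragraph), start=1):
        if len(industries_in(s)) > 1 or len(set(YEAR_RE.findall(s))) > 1:
            bad.append((n, s))
    return bad
```

An empty result from `violations` on every company's paragraph would support the assumption; anything it returns marks the sentences to inspect by hand.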

The list of industries to find and match in the paragraph text was built specifically for this dataset, so that is not a concern either.

But I cannot construct the normalized data structure you suggested (one observation for each company/industry/year triplet), because the text is just one long paragraph. The only way I can see to divide it up that cleanly is to split the paragraph into its sentences and make each sentence a new observation for the same company. (A given observation may then contain no relevant information, but that's fine.)
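The sentence-per-observation idea can be sketched in two steps: first one row per sentence, then a reduction of those rows to company/industry/year triplets. Under the one-industry/one-year-per-sentence assumption this gets close to the normalized structure Howard describes. The sketch below is in Python purely to illustrate the logic. The industry list, the splitting rule, and the 19xx/20xx year pattern are hypothetical placeholders.

```python
import re

# Hypothetical industry list standing in for the one built for this dataset.
INDUSTRIES = ["banking", "steel", "textiles"]

# Assumption: years are whole 4-digit words from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def sentence_rows(company, paragraph):
    """One row per sentence: (company, sentence_no, sentence)."""
    parts = [p for p in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if p]
    return [(company, n, s) for n, s in enumerate(parts, start=1)]


def triplets(rows):
    """Reduce sentence rows to (company, industry, year) triplets.

    Relies on the one-industry/one-year-per-sentence assumption: the
    first match of each is taken, and sentences missing either an
    industry or a year are dropped.
    """
    out = []
    for company, _, s in rows:
        found = [ind for ind in INDUSTRIES
                 if re.search(r"\b" + re.escape(ind) + r"\b", s, re.IGNORECASE)]
        years = YEAR_RE.findall(s)
        if found and years:
            out.append((company, found[0], int(years[0])))
    return out
```

For example, `triplets(sentence_rows("ACME", text))` turns a paragraph into zero or more triplets for that company. In a SAS DATA step the same shape comes from looping over the sentences and issuing an OUTPUT statement once per row.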

Hakan

On Wed, 20 Dec 2006 21:25:36 -0500, Howard Schreier <hs AT dc-sug DOT org> <nospam@HOWLES.COM> wrote:

>Was this list built specifically from and for your file, so that you can be
>sure that all variations are included? If not, that issue is only partially
>addressed.
>
>Simplifying assumptions are fine when you are presenting examples and
>explanations, but you cannot just assume away problems which may complicate
>your process. Is it in fact assured that there are no sentences with
>multiple year references or multiple industry references?
>
>It's probably better to use a normalized data structure (one observation for
>each company/industry/year triplet).
>

