LISTSERV at the University of Georgia
Date:   Mon, 15 Jan 2007 22:06:40 +0100
Reply-To:   Martin Gregory <gregorym@T-ONLINE.DE>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Martin Gregory <gregorym@T-ONLINE.DE>
Organization:   T-Online
Subject:   Re: detect dots as end of sentence
Comments:   To: sas-l@uga.edu
In-Reply-To:   <007601c738c2$253db380$6fb91a80$@net>
Content-Type:   text/plain; charset=ISO-8859-1; format=flowed

A more general approach would be to use the regular expression that Emacs uses for end of sentence:

[.?!][]\"')]*($| $|\t| )[ \t\n]*

and use this instead of the \. in Alan's suggestion.
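
As a rough, untested sketch of how the combined pattern might be applied in a DATA step, assuming SAS 9 with the PRX functions (PRXPARSE and CALL PRXNEXT) and the SOURCE dataset from Arjen's post; the variable names re, start, stop, pos, len and prev are made up for illustration:

```sas
data need (drop = re start stop pos len prev);
  length y $5000;
  set source;
  retain re;

  /* Compile the pattern once: Alan's lookbehind followed by the      */
  /* Emacs-style end-of-sentence expression. The single quote inside  */
  /* the character class is doubled because the SAS literal itself    */
  /* is single-quoted.                                                */
  if _n_ = 1 then
    re = prxparse('/(?<=[a-z])[.?!][]"'')]*($| $|\t| )[ \t\n]*/');

  start = 1;          /* where the next search begins                 */
  stop  = length(x);  /* last character of x to examine               */
  prev  = 1;          /* first character of the current sentence      */

  call prxnext(re, start, stop, x, pos, len);
  do while (pos > 0);
    /* Sentence text up to and including the terminator and any       */
    /* whitespace that follows it.                                    */
    y = substr(x, prev, pos + len - prev);
    output;
    prev = pos + len;
    call prxnext(re, start, stop, x, pos, len);
  end;

  /* Whatever is left after the last match, if anything. */
  if prev <= length(x) then do;
    y = substr(x, prev);
    output;
  end;
run;
```

Because the lookbehind only accepts a lowercase letter before the terminator, the dots in 4.31 and 12.5 are skipped, but a sentence ending in a digit or a capital letter would be skipped as well; widening the [a-z] class is one way to handle that.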

-Martin

Alan Churchill wrote:
> Arjen,
>
> I would use regular expressions here.
>
> Split the text using the following regex:
>
> (?<=[a-z])\.
>
> That should give you what you need.
>
> Alan
>
> Alan Churchill
> Savian "Bridging SAS and Microsoft Technologies"
> www.savian.net
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arjen
> Sent: Monday, January 15, 2007 9:04 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: detect dots as end of sentence
>
> Hi SAS-L,
>
> Please look at the tested code below. I am trying to split sentences in
> character strings by detecting the dot (.) at the end of a sentence. I
> encounter difficulties because there are numbers with dots in them. I
> figure two solutions:
> (i) Replace all numbers with numbers written down European-style: 4,31
> g 12,5 years - I have no clue
> (ii) Split sentences by searching for a dot and a space (. ); I tried
> to include a space in the code, but then I get a dataset with all
> separate words sorted out, which is not what I need.
>
> Any suggestions? Thanks.
>
> Arjen
>
> data SOURCE;
> x = "Daily intake of less than 4.31 g in people younger than 12.5 did
> not cause any harmful effects. I would highly recommend this drug."
> ;
> run;
>
> data SOURCE; set SOURCE; id+1; run;
>
> data need (drop = i);
> length y $5000;
> set source;
> do i = 1 to 100 while(scan(x,i,".") ne "");
> y = scan(x,i,".")||'.';
> output;
> end;
> run;

