LISTSERV at the University of Georgia
Date:   Tue, 3 Apr 2007 09:01:32 -0400
Reply-To:   "data _null_;" <datanull@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "data _null_;" <datanull@GMAIL.COM>
Subject:   Re: INPUT statement and a variable with different value lengths
Comments:   To: David L Cassell <davidlcassell@msn.com>
In-Reply-To:   <BAY103-F14598C441FB17F4308F606B0670@phx.gbl>
Content-Type:   text/plain; charset=ISO-8859-1; format=flowed

I finally got to your suggested REGEX (well, almost) yesterday afternoon. I was using regex = prxparse('s/(\b[A-Z][A-Z]+\b)/ $1/'); with PRXCHANGE. I had some truncation problems with this when reading from cards and when omitting argument 4 from PRXCHANGE. I finally got to the code below, but I don't really understand the truncation problem. I ended up including argument 4 and then assigning that variable to _INFILE_ after calling PRXCHANGE.

filename ft15f001 temp lrecl=256 recfm=v;
data salaries;
   array prx[1] _temporary_;
   * compile the substitution once: put a blank before the first all-caps word;
   if _n_ eq 1 then prx[1] = prxparse('s/(\b[A-Z]+\b)/ $1/');
   array infile[1] $32767 _temporary_;

   infile ft15f001 stopover eof=eof;
   input @;
   * write the changed record to argument 4, then copy it back to _INFILE_;
   call prxchange(prx[1],1,_infile_,infile[1]);
   _infile_ = infile[1];
   * fields read with the & modifier end at two or more consecutive blanks;
   input @1 agency &$50. lastnm:$20. firstnm:$20. jobtitle&$50. sal:dollar16.;
   list;
   return;
 eof:
   call prxfree(prx[1]);
   stop;
   parmcards4;
Agricorp THOMSON TOM Director, Corporate Services  $100,000.00
Alcohol Commission BEETHOVEN LOU Manager, Network Services  $100,000.00
Smart Systems for Health Agency MATISSE HENRY Director, Risk Management  $150,000.00
Social Benefits Tribunal BUCHWALD ART Counsel, Social Benefits Tribunal  $2000.00
;;;;
   run;
proc print;
   run;
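
To see what the substitution does on its own, here is a minimal sketch that uses the PRXCHANGE function rather than CALL PRXCHANGE (that swap is mine, purely for illustration). The variable names BEFORE and AFTER and the $60 length are assumptions; the sample text is taken from the second data line above.

```sas
data _null_;
   /* BEFORE, AFTER and the $60 length are illustrative assumptions */
   length before after $60;
   before = 'Alcohol Commission BEETHOVEN LOU';
   /* same pattern as above: insert one blank before the first all-caps word */
   after = prxchange('s/(\b[A-Z]+\b)/ $1/', 1, before);
   put before= / after=;
run;
```

The extra blank leaves two blanks between the agency and the last name, which is what lets agency &$50. stop before the last name.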

On 4/3/07, David L Cassell <davidlcassell@msn.com> wrote:
> datanull@GMAIL.COM sagely replied:
> >
> >On 4/2/07, RolandRB <rolandberry@hotmail.com> wrote:
> >>Try /[A-Z][A-Z]*/
> >
> >Almost. "*" means zero or more occurrences, therefore returning 1 for
> >all records in the example data.
> >
> >But "+" means one or more.
> >
> >i = prxmatch('/[A-Z][A-Z]+/',_infile_);
> >
> >works for the example data.
>
> Well, for the example data, all you need is 2 consecutive capitals,
> so you could use:
>
> i = prxmatch('/[A-Z][A-Z]/',_infile_);
>
> or
>
> i = prxmatch('/[A-Z]{2}/',_infile_);
>
> Both match 2 consecutive caps.
>
> To make sure we get the first fully capitalized name, we could do this:
>
> i = prxmatch('/\b[A-Z]+\b/',_infile_);
>
> That insists on starting at a 'word boundary', then one or more
> capitals, and not matching unless the word is all caps. The second
> \b means that the match has to include the 'word' ending too.
>
> This still fails as soon as one of the businesses has a string of caps
> in it as a single 'word', like the first word in 'SAS Institute'.
>
>
> I may not be a REGEXpert, but I am a REGEXspurt. :-)
>
> David
> --
> David L. Cassell
> mathematical statistician
> Design Pathways
> 3115 NW Norwood Pl.
> Corvallis OR 97330
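
For comparison, a minimal sketch that runs three of the patterns quoted above side by side. The variable names TEXT, STAR, PLUS and WORD are assumptions, and the second test string is made up to show the 'SAS Institute' case David mentions.

```sas
data _null_;
   /* TEXT, STAR, PLUS and WORD are illustrative names, not from the thread */
   length text $40;
   do text = 'Agricorp THOMSON TOM', 'SAS Institute SMITH JOHN';
      star = prxmatch('/[A-Z][A-Z]*/', text);  /* a single capital is enough */
      plus = prxmatch('/[A-Z][A-Z]+/', text);  /* needs two or more capitals */
      word = prxmatch('/\b[A-Z]+\b/', text);   /* needs a whole all-caps word */
      put text= star= plus= word=;
   end;
run;
```

On the first string the "*" pattern should match at position 1, while the other two should match at position 10, where THOMSON starts. On the second string all three should match at position 1, which is the failure David describes.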

