I finally got to your suggested REGEX, well almost, yesterday afternoon.
I had been using
regex = prxparse('s/(\b[A-Z][A-Z]+\b)/ $1/');
with PRXCHANGE. I had some truncation problems with this when
reading from cards and when omitting argument 4 from PRXCHANGE. I
finally got to the code below, but I don't really understand the
truncation problem. I ended up including argument 4 and then
assigning that variable to _INFILE_ after calling PRXCHANGE.
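filename ft15f001 temp lrecl=256 recfm=v;
data salaries;
   /* compile the regex once; it puts a blank in front of the first all-caps word */
   array prx[1] _temporary_;
   if _n_ eq 1 then prx[1] = prxparse('s/(\b[A-Z]+\b)/ $1/');
   /* work buffer for the changed record (argument 4 of PRXCHANGE) */
   array infile[1] $32767 _temporary_;
   infile ft15f001 stopover eof=eof;
   input @;                 /* load the record into _INFILE_ and hold it */
   call prxchange(prx[1],1,_infile_,infile[1]);
   _infile_ = infile[1];    /* copy the changed text back into the input buffer */
   input @1 agency &$50. lastnm:$20. firstnm:$20. jobtitle&$50. sal:dollar16.;
   list;
   return;
eof: call prxfree(prx[1]);
   stop;
parmcards4;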
Agricorp THOMSON TOM Director, Corporate Services $100,000.00
Alcohol Commission BEETHOVEN LOU Manager, Network Services $100,000.00
Smart Systems for Health Agency MATISSE HENRY Director, Risk Management $150,000.00
Social Benefits Tribunal BUCHWALD ART Counsel, Social Benefits Tribunal $2000.00
;;;;
run;
proc print;
run;
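To see what the substitution does to a single record, here is a minimal sketch that runs the same regex against one hard-coded line. The variable names line, result and rx are only for illustration, and result is given an explicit length so there is room for the extra blank.

```sas
data _null_;
   length line result $200;
   line = 'Agricorp THOMSON TOM Director, Corporate Services $100,000.00';
   /* same substitution as above: a blank goes in front of the first all-caps word */
   rx = prxparse('s/(\b[A-Z]+\b)/ $1/');
   call prxchange(rx, 1, line, result);
   put result=;
   call prxfree(rx);
run;
```

The changed line should have two blanks between Agricorp and THOMSON, which is what lets the & modifier on agency stop at the end of the agency name.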
On 4/3/07, David L Cassell <davidlcassell@msn.com> wrote:
> datanull@GMAIL.COM sagely replied:
> >
> >On 4/2/07, RolandRB <rolandberry@hotmail.com> wrote:
> >>Try /[A-Z][A-Z]*/
> >
> >Almost. "*" means zero or more occurrences, therefore returning 1 for
> >all records in the example data.
> >
> >But "+" means one or more.
> >
> >i = prxmatch('/[A-Z][A-Z]+/',_infile_);
> >
> >works for the example data.
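A quick way to check the difference between "*" and "+" on one of the example records is a sketch like this (the variable names line, star and plus are made up for the test):

```sas
data _null_;
   length line $80;
   line = 'Agricorp THOMSON TOM Director, Corporate Services $100,000.00';
   star = prxmatch('/[A-Z][A-Z]*/', line);   /* a lone capital is enough */
   plus = prxmatch('/[A-Z][A-Z]+/', line);   /* needs at least two capitals in a row */
   put star= plus=;
run;
```

I would expect star=1 (the A in Agricorp) and plus=10 (the T in THOMSON).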
>
> Well, for the example data, all you need is 2 consecutive capitals,
> so you could use:
>
> i = prxmatch('/[A-Z][A-Z]/',_infile_);
>
> or
>
> i = prxmatch('/[A-Z]{2}/',_infile_);
>
> Both match 2 consecutive caps.
>
> To make sure we get the first fully capitalized name, we could do this:
>
> i = prxmatch('/\b[A-Z]+\b/',_infile_);
>
> That insists on starting at a 'word boundary', then one or more
> capitals, and not matching unless the word is all caps. The second
> \b means that the match has to include the 'word' ending too.
>
> This still fails as soon as one of the businesses has a string of caps
> in it as a single 'word', like the first word in 'SAS Institute'.
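The failure case is easy to reproduce. This is only a sketch: the first record is cut down from the example data, and the 'SAS Institute' record is made up to show the problem.

```sas
data _null_;
   length line $80;
   rx = prxparse('/\b[A-Z]+\b/');
   do line = 'Alcohol Commission BEETHOVEN LOU Manager',
             'SAS Institute SMITH JOHN Analyst';   /* hypothetical record */
      pos = prxmatch(rx, line);   /* start of the first all-caps word */
      put pos= line=;
   end;
   call prxfree(rx);
run;
```

For the first record pos should point at BEETHOVEN, but for the second it should be 1, because SAS is itself an all-caps word.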
>
>
> I may not be a REGEXpert, but I am a REGEXspurt. :-)
>
> David
> --
> David L. Cassell
> mathematical statistician
> Design Pathways
> 3115 NW Norwood Pl.
> Corvallis OR 97330