Date: Fri, 26 Jan 2007 10:47:25 -0500
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: PRX compared with older functions
Alan,
Thanks for taking the time to come up with a working pure prx solution.
However, as is, it takes much longer than Toby's mixed solution.
To level the playing field, I eliminated your macro calls for year and
day, and increased the number of records to 6 million.
The results were:
Pure PRX: real time: 2:17.64, cpu time: 36.95 seconds
Mixed: real time: 13.78 seconds, cpu time: 13.78 seconds
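In other words, 2:17.64 is 137.64 seconds, so the pure PRX version took about 10 times the real time (137.64 / 13.78 is about 10.0) and about 2.7 times the CPU time (36.95 / 13.78 is about 2.7) of the mixed version.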
Art
---------
On Thu, 25 Jan 2007 22:16:05 -0700, Alan Churchill <SASL001@SAVIAN.NET> wrote:
>%macro parse (abbr,unit) ;
> re = prxparse("/(\d{1,2})(?:&abbr)/o") ;
> if prxmatch(re,var) then
> &unit = prxposn(re,1,var);
>%mend ;
>
>data test (drop=re);
> input var $11. ;
> %parse(y,year) ;
> %parse(d,day) ;
> %parse(h,hour) ;
> %parse(m,min) ;
> %parse(s,sec) ;
> datalines;
>w2y
>w34y4h
>w38h45m23s
>p5d23m
>p61h
>w56s
>;
>run;
>
>
>Alan Churchill
>Savian "Bridging SAS and Microsoft Technologies"
>www.savian.net
>
>
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur Tabachneck
>Sent: Thursday, January 25, 2007 8:56 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: PRX compared with older functions
>
>Last night Toby offered a partial prx solution to a scanning problem that had
>been raised, namely to find the number(s) that preceded the characters h, m
>and s in a given text string.
>
>data test;
> input var $11. ;
> datalines;
>w2y
>w34y4h
>w38h45m23s
>p5d23m
>p61h
>w56s
>;
>run;
>
>Data Need ( Keep = Var Sec Min Hour ) ;
> Set Test ;
> Retain Pattern ;
> If _N_ = 1 Then Pattern = PRXParse( "/\d+[smh]/" ) ;
> Start = 1 ;
> Stop = Length( Var ) ;
> Call PrxNext( Pattern , Start , Stop , Var , Match , Length ) ;
> Do I = 1 By 1 While( ( Match > 0 ) and ( I < Stop ) ) ;
> Temp = Substr( Var , Match , Length ) ;
> If Index( Temp , 's' ) Then
> Sec = Input( Compress( Temp , 's' ) , 8. ) ;
> Else If Index( Temp , 'm' ) Then
> Min = Input( Compress( Temp , 'm' ) , 8. ) ;
> Else If Index( Temp , 'h' ) Then
> Hour = Input( Compress( Temp , 'h' ) , 8. ) ;
> Call PrxNext( Pattern , Start , Stop , Var , Match , Length ) ;
> End ;
>Run ;
>
>A pure prx solution and a possible refinement were then offered, but one
>didn't work and the other took twice as long to complete as Toby's mixed
>solution.
>
>Can anyone offer a pure prx solution that runs faster than Toby's mixed
>offering?
>
>Art