LISTSERV at the University of Georgia
Date:         Wed, 9 Feb 2011 06:15:15 -0500
Reply-To:     Søren Lassen <s.lassen@POST.TELE.DK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Søren Lassen <s.lassen@POST.TELE.DK>
Subject:      Re: Strip Function And Leading and Trailing Tab Chars
Comments: To: Chang Chung <chang_y_chung@HOTMAIL.COM>
Content-Type: text/plain; charset=ISO-8859-1

Chang (and Toby)

I modified your test a little bit and got this result:

361  sasfile work.one.data open;
NOTE: The file WORK.ONE.DATA has been opened by the SASFILE statement.
362  data chang;
363    set one;
364    retain prxid;
365    if _N_=1 then prxid=prxparse('s/^\s+|\s+$//');
366    drop prxid;
367    call prxchange(prxid, -1, s);
368  run;

NOTE: There were 1000000 observations read from the data set WORK.ONE.
NOTE: The data set WORK.CHANG has 1000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           40.29 seconds
      cpu time            36.51 seconds

369  data soren;
370    set one;
371    retain prxid;
372    if _N_=1 then prxid=prxparse('s/\s*(\S[\s\S]*\S)\s*/$1/');
373    drop prxid;
374    call prxchange(prxid,1, s);
375  run;

NOTE: There were 1000000 observations read from the data set WORK.ONE.
NOTE: The data set WORK.SOREN has 1000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           11.53 seconds
      cpu time            2.81 seconds

376  data soren2;
377    set one;
378    retain prxid;
379    if _N_=1 then prxid=prxparse('s/\s/ /');
380    drop prxid;
381    call prxchange(prxid,-1, s);
382    s=strip(s);
383  run;

NOTE: There were 1000000 observations read from the data set WORK.ONE.
NOTE: The data set WORK.SOREN2 has 1000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           23.85 seconds
      cpu time            23.43 seconds

384  data toby;
385    set one;
386    retain prxid1 prxid2;
387    if _N_=1 then do;
388      prxid1=prxparse('s/^\s+//');
389      prxid2=prxparse('s/\s+$//');
390    end;
391    drop prx:;
392    call prxchange(prxid1 , 1, s);
393    call prxchange(prxid2 , 1, s);
394  run;

NOTE: There were 1000000 observations read from the data set WORK.ONE.
NOTE: The data set WORK.TOBY has 1000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           8.12 seconds
      cpu time            6.23 seconds

I changed the prxchange function call to a routine call, because that is always faster. If you want to optimize PRXCHANGE, that is the first thing to do.

Then I tried a Perl regular expression of my own: 's/\s*(\S[\s\S]*\S)\s*/$1/' (what a lot of s-es!). It works because [\s\S]*\S is greedy and runs all the way to the very last non-whitespace character (I use [\s\S] instead of . because . does not match newline characters). This expression can strip the whole shebang (and nothing but the shebang) in one fell swoop, and lo and behold, it was the fastest of all, at least in terms of CPU time. In elapsed time, Toby's solution was consistently faster than mine, by a varying amount.
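
As a rough illustration of how that single expression behaves on tab characters, here is a minimal sketch. The two test strings and the variable names (testcase, len_before, len_after) are made up for this example and are not part of the benchmark above. Note that the pattern as written needs at least two non-whitespace characters to match, so a value containing only one is left unchanged; the second test case is there to show that.

```sas
/* Sketch only: test strings and variable names are assumptions for illustration. */
data _null_;
  length s $40;
  /* Compile the one-pass trim pattern once */
  prxid = prxparse('s/\s*(\S[\s\S]*\S)\s*/$1/');

  do testcase = 1 to 2;
    /* '09'x is a tab character */
    if testcase = 1 then s = '09'x || '  ab' || '09'x || 'cd ' || '09'x;
    else s = '09'x || 'x' || '09'x;

    /* LENGTHN ignores trailing blanks but counts tabs */
    len_before = lengthn(s);
    call prxchange(prxid, 1, s);   /* modifies s in place */
    len_after = lengthn(s);

    put testcase= len_before= len_after=;
  end;
run;
```

If the pattern behaves as described, the first value should lose its outer tabs and blanks while keeping the interior tab, and the second should come back with the same length it started with.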

I also tried changing all whitespace to spaces and then using the STRIP function. That was a lot slower, as it has to do more than one substitution per string. What is interesting is that when I commented out the CALL PRXCHANGE line in the SOREN2 DATA step, it used only about 1.20 CPU seconds, but the elapsed time was still 20 seconds. That is probably because, while not CPU-intensive, it still takes time to move the string to the function parameter stack and back.

I am not quite sure what to make of the (marginal) differences between Toby's method and mine, but I think it is usually possible to accomplish what you want with a single PRXCHANGE call, if you are careful about how you write your expression.

Regards, Søren.

On Tue, 8 Feb 2011 16:48:02 -0500, Chang Chung <chang_y_chung@HOTMAIL.COM> wrote:

>On Tue, 8 Feb 2011 16:22:21 -0500, Chang Chung <chang_y_chung@HOTMAIL.COM>
>wrote:
>>Below is my
>>test code. In terms of real time, I get wildly different results each time
>>I run it, despite of the sasfile'ing. But I noticed that the cpu time is
>>always about twice as longer for data step toby than chang.
>...
>Hi, Toby,
>well... please forget about my previous posting. I've made a mistake of
>generating all blank test data. Once I fixed it, your two prxchange call
>solution ran much faster consistently... below is the fixed code.
>Cheers,
>Chang
>
>%let seed=12346789;
>
>data one;
>  length s $200;
>  do i = 1 to 1e6;
>    do j = 1 to 200;
>      if ranuni(&seed) < 0.15 then substr(s,j,1) = " ";
>      else substr(s,j,1) = "x";
>    end;
>    output;
>  end;
>  keep s;
>run;
>
>sasfile work.one.data open;
>  data chang;
>    set one;
>    s2 = prxchange('s/^\s+|\s+$//o', -1, s);
>    keep s s2;
>  run;
>  data toby;
>    set one;
>    s3 = prxchange('s/^\s+//o', 1, s);
>    s3 = prxchange('s/\s+$//o', 1, s3);
>    keep s s3;
>  run;
>
>sasfile work.one.data close;
>
>proc compare data=toby compare=chang(rename=(s2=s3));
>run;
>/* on log
>   NOTE: No unequal values were found. All values compared are exactly equal.
>*/

