Date: Wed, 6 Jul 2005 16:17:40 -0400
Reply-To: "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Subject: Re: delete duplicate string within a single variable
Generalissimo Sig,
In my tests, not quite... even after eliminating the extraneous
put(i,1.) and killing the I/O by switching to data _null_ (it is much
easier to measure hairs cut from an elephant's posterior in a lab than
right whence they came):
data _null_ ;
   input a: $char20. ;
   x = a ;
   do j = 1 to 1e6 ;
      x = '' ;
      do c = '0','1','2','3','4','5','6','7','8','9' ;
         if indexc (a, c) then x = trimn (x) || c ;
      end ;
   end ;
cards ;
000000000
000110102
420333000
9800019800001987
9a8a7a6vvvvvvv0000
;
run ;

NOTE: DATA statement used (Total process time):
      real time           18.23 seconds
      cpu time            18.15 seconds

data _null_ ;
   input a: $char20. ;
   do j = 1 to 1e6 ;
      x = compress ('0123456789', translate ('0123456789', '', a)) ;
   end ;
cards ;
000000000
000110102
420333000
9800019800001987
9a8a7a6vvvvvvv0000
;
run ;

NOTE: DATA statement used (Total process time):
      real time           5.89 seconds
      cpu time            5.89 seconds

That's on an XP desktop running SAS 9.1.3. Under AIX (same SAS), the ratio
was 23:5.
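
For anyone wondering why the one-liner works, here is a minimal sketch that
splits it into its two steps for a single value. The variable GONE is a name
made up for this illustration; it does not appear in any of the posted code.

```sas
data _null_ ;
   length a $ 20 gone x $ 10 ;
   a = '9a8a7a6vvvvvvv0000' ;
   /* Step 1: within the digit list, blank out every character found in A. */
   /* What survives in GONE are the digits that do NOT occur in A.         */
   gone = translate ('0123456789', ' ', a) ;
   /* Step 2: compress those absent digits out of the full digit list,     */
   /* leaving the digits that DO occur in A, once each and in order.       */
   x = compress ('0123456789', gone) ;
   put a= x= ;
run ;
```

X should come out as 06789, the same value as in Guido's log quoted below.
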
Kind regards
----------------
Paul M. Dorfman
Jacksonville, FL
----------------
On Wed, 6 Jul 2005 15:08:46 -0400, Sigurd Hermansen <HERMANS1@WESTAT.COM>
wrote:
>Not so fast, Marshals Dorfman and Schreier....
>
>A variant of Richard of Venice's solution combined with Toby of Texas'
>string list,
>
>data test1 (keep=a x);
> retain _digits '0123456789';
> input a: $char20. ;
> do i=1 to 1000000;
> x = compress(_digits,translate(_digits,' ',a));
> output;
> end;
>cards;
>000000000
>000110102
>420333000
>9800019800001987
>9a8a7a6vvvvvvv0000
>;
>run;
>data test2 (keep=a x);
> length x $ 10;
> input a: $char20. ;
> do j=1 to 1000000;
> do i = '0','1','2','3','4','5','6','7','8','9';
> if index(a,put(i,1.)) then x=trim(x)||i;
> end;
> output;
> x='';
> end;
>cards;
>000000000
>000110102
>420333000
>9800019800001987
>9a8a7a6vvvvvvv0000
>;
>run;
>
>
>competes in my tests on even terms with Guido's clever use of string
>functions. It also transports fairly easily to other programming
>environments as well as to other computing platforms.
>Sig
>
>
>-----Original Message-----
>From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
>On Behalf Of Howard Schreier <hs AT dc-sug DOT org>
>Sent: Wednesday, July 06, 2005 12:21 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: delete duplicate string within a single variable
>
>
>No loop, no wallpaper, works in V. 8.
>
>I think we have a winner here.
>
>The string of all digits does not have to be stored in a variable, so it
>can even be adapted for SQL, thusly:
>
> select compress('0123456789',translate('0123456789',' ',a)) as x
> from xx;
>
>On Wed, 6 Jul 2005 07:54:52 +0000, Guido T <cymraeg_erict@HOTMAIL.COM>
>wrote:
>
>>I was a bit too quick with my initial "solution". The middle compress
>>isn't needed; compressing out extra spaces isn't a problem. Also,
>>TRANSLATE only needs a single space to translate to. How many years
>>have I been using the TRANSLATE function? Sigh...
>>
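>>What it does is compress out of the _DIGITS string (containing the
>>valid characters, in the correct order) the digits that do not occur
>>in *A*. TRANSLATE blanks out every character of _DIGITS that is found
>>in *A*, so what it leaves behind is exactly the set of digits to remove.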
>>
>>++ Guido
>>
>>data test(drop=_digits);
>>   set xx;
>>   retain _digits '0123456789';
>>   x = compress(_digits,translate(_digits,' ',a));
>>   put a= x=;
>>run;
>>
>>a=000000000 x=0
>>a=000110102 x=012
>>a=420333000 x=0234
>>a=9800019800001987 x=01789
>>a=9a8a7a6vvvvvvv0000 x=06789
>>
>>NOTE: There were 5 observations read from the data set WORK.XX.
>>NOTE: The data set WORK.TEST has 5 observations and 3 variables.
>>NOTE: DATA statement used:
>> real time 0.01 seconds
>> cpu time 0.01 seconds