Date: Wed, 26 Sep 2001 10:02:59 -0400
Reply-To: "Diskin, Dennis" <Dennis.Diskin@PHARMA.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Diskin, Dennis" <Dennis.Diskin@PHARMA.COM>
Subject: Re: Speed of =: vs. SUBSTR()...
Content-Type: text/plain
Laurel,
This is a though one to give a set answer to because the SQL process
optimization keeps improving in each release.
I think the only way you could answer it is to run a test for both
approaches.
Off hand, I'd say this looks about as good as you're going to get since the
problem is pretty straightforward.
fwiw,
Dennis Diskin
> -----Original Message-----
> From: Laurel Copeland [SMTP:lacop@UMICH.EDU]
> Sent: Wednesday, September 26, 2001 9:49 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Speed of =: vs. SUBSTR()...
>
> What would be fastest for a set of substring values? E.g., still with the
> 5-character diagnostic codes, I want to pull off cases with any diagnosis
> beginning with, say, 290, 295, 297, or 298. Running through 10 fields on
> a tape containing a few million records, I could use (SAS 6.09, MVS):
>
> Proc SQL;
> CREATE table work.REG88a
> AS SELECT PM.SCRSSN,
> PM.DXLSF,PM.DXF2,PM.DXF3,PM.DXF4,PM.DXF5,
> PM.DXF6,PM.DXF7,PM.DXF8,PM.DXF9,PM.DXF10
> FROM pm88.pm88 AS pm
> WHERE substr(PM.DXLSF,1,3) IN('290','295','297','298')
> OR substr(PM.DXF2,1,3) IN('290','295','297','298')
> OR substr(PM.DXF3,1,3) IN('290','295','297','298')
> OR substr(PM.DXF4,1,3) IN('290','295','297','298')
> OR substr(PM.DXF5,1,3) IN('290','295','297','298')
> OR substr(PM.DXF6,1,3) IN('290','295','297','298')
> OR substr(PM.DXF7,1,3) IN('290','295','297','298')
> OR substr(PM.DXF8,1,3) IN('290','295','297','298')
> OR substr(PM.DXF9,1,3) IN('290','295','297','298')
> OR substr(PM.DXF10,1,3) IN('290','295','297','298');
>
> Is this my best bet?
> Laurel Copeland
>
> At 09:24 AM 9/26/01 -0400, kviel wrote:
>
>
> Wim,
>
> I apologize. I would like to modify my previous suggestion.
> If,
> indeed, we are
> interested in this effect:
>
> if substr(c(i),1,1)='8' then d=1;
>
> Then this should really interest us:
>
> 87 data _null_;
> 88 z01='80000';
> 89 do i=1 to 1e7;
> 90 if substr(z01,1,1)='8' then d=1;
> 91 end;
> 92 run;
>
> NOTE: DATA statement used:
> real time 3.98 seconds
> cpu time 3.98 seconds
>
>
> 93
> 94 data _null_;
> 95 z01='80000';
> 96 do i=1 to 1e7;
> 97 if z01=:'8' then d=1;
> 98 end;
> 99 run;
>
> NOTE: DATA statement used:
> real time 0.27 seconds
> cpu time 0.27 seconds
>
> In other words, on a Windows NT machine running v8.1, the substr()
> function
> was not as fast as the colon.
>
|