Date: Sun, 4 Feb 2007 22:24:25 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: sql question
In-Reply-To: <1115a2b00702011504w7e348943j73d784774ea6460d@mail.gmail.com>
Content-Type: text/plain; format=flowed
liuwensui@GMAIL.COM replied:
>
>Jun,
>If possible, use hash. It is much faster.
>Anyway, here is a piece of crap from my blog:
>
>January 29
>METHODS TO DO A TABLE LOOKUP AND THE SPEEDS COMPARISON
>
>/**********************************************************
>* METHODS TO DO A TABLE LOOKUP AND THE SPEEDS COMPARISON *
>**********************************************************/
>
>* SIMULATE LONG AND SHORT TABLES ;
>data long;
> do i = 1 to 1000000;
> x = rannor(1);
> output;
> end;
>run;
>
>data short;
> do i = 1 to 1000000 by 100000;
> y = rannor(1);
> output;
> end;
>run;
>
>* METHOD 1: PROC SORT AND MERGE ;
>* CPU TIME: 1.51 SECONDS ;
>proc sort data = long;
> by i;
>run;
>
>proc sort data = short;
> by i;
>run;
>
>data merge1;
> merge long short;
> by i;
>run;
>
>* METHOD 2: LEFT JOIN IN PROC SQL ;
>* CPU TIME: 1.42 SECONDS ;
>proc sql noprint;
>create table
> merge2 as
>select
> long.*,
> short.y
>from
> long as long left join short as short
>on
> long.i = short.i;
>quit;
>
>* METHOD 3: HASH ;
>* CPU TIME: 0.59 SECONDS ;
>data merge3;
> set long;
>
> if _N_ = 1 then do;
> declare hash h (dataset: 'short', hashexp: 8);
> h.definekey ('i');
> h.definedata ('y');
> h.definedone ();
> end;
>
> if h.find() = 0 then y = y;
>run;
>
>* METHOD 4: UPDATE IN PROC SQL ;
>* CPU TIME: 1.92 SECONDS ;
>proc sql noprint;
>create table
> merge4 as
>select
> *,
> . as y
>from
> long;
>
>update
> merge4 as long
>set
> y = (select y from short where i = long.i)
>where
> long.i in (select i from short);
>quit;
>
>/**********************************************************
>* END OF CODE *
>**********************************************************/
>
>
>On 2/1/07, Jun Wan <junwansas@gmail.com> wrote:
>>hi, all,
>>
>>I have two tables and that need to be join and updated.
>>
>>I am look at something like
>>
>>update a,b set a.column = b.column where a.key = b.key;
>>
>>Did SAS proc sql have something like this?
>>
>>Thank you!
>>
>>Jun
>>
>
>
>--
>WenSui Liu
>A lousy statistician who happens to know a little programming
>(http://spaces.msn.com/statcompute/blog)
Let me just point out that you left out a couple methods.
The format as a lookup table, the array as a lookup table, the
POINT= option, the KEY= option, ...
In particular, using an array to do the key-indexing should be
even faster than the hash. Try it and see. Here's some
***untested*** code to illustrate what I mean:
data merge5;
array yy{1:1000000} _temporary_ ;
do until(eof1);
set short end=eof1;
yy{i} = y;
end;
do until(eof2);
set long end=eof2;
y = yy{i};
end;
run;
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
_________________________________________________________________
Check out all that glitters with the MSN Entertainment Guide to the Academy
Awards® http://movies.msn.com/movies/oscars2007/?icid=ncoscartagline2
|