Date: Fri, 5 Sep 2003 11:13:54 -0400
Reply-To: rpresley <rpresley@GMCF.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: rpresley <rpresley@GMCF.ORG>
Subject: REPOST: Proc SQL and PROC SORT followed by data step merge prod
uce different data sets -- why?
Content-Type: text/plain; charset="iso-8859-1"
The last post got a bit scrambled.
> SAS-L,
>
> One can join to data sets or tables with SQL view or one can mege the two
> data sets in a data step. If one writes equivalent code then the
> resulting data sets should be identical. I think I have writtten
> equivalent code but the resulting data sets are not the same. Can somone
> provide insight into why the code below is not equivalent?
>
> Thanks for any help.
>
> Rodney
>
>
>
> 187 proc sort data=stroke_2.stroke_main nodupkey ;
> 188 by systemid;
> 189 %runn;
>
> NOTE: SAS sort was used.
> NOTE: 0 observations with duplicate key values were deleted.
> NOTE: There were 930 observations read from the data set
> STROKE_2.STROKE_MAIN.
> NOTE: The data set STROKE_2.STROKE_MAIN has 930 observations and 201
> variables.
> NOTE: Compressing data set STROKE_2.STROKE_MAIN decreased size by 78.52
> percent.
> Compressed is 29 pages; un-compressed would require 135 pages.
> NOTE: PROCEDURE SORT used:
> real time 7.49 seconds
> cpu time 0.16 seconds
>
>
> "This code was run on 09/05/03 at 11:02 by rpresley"
> 190 proc sort data=stroke_2.stroke_dual nodupkey ;
> 191 by systemid;
> 192 %runn;
>
> NOTE: SAS sort was used.
> NOTE: 0 observations with duplicate key values were deleted.
> NOTE: There were 85 observations read from the data set
> STROKE_2.STROKE_DUAL.
> NOTE: The data set STROKE_2.STROKE_DUAL has 85 observations and 341
> variables.
> NOTE: PROCEDURE SORT used:
> real time 1.30 seconds
> cpu time 0.09 seconds
>
>
> "This code was run on 09/05/03 at 11:02 by rpresley"
> 193 data both ;
> 194 merge stroke_2.stroke_main(keep=systemid in=m)
> 195 stroke_2.stroke_dual(keep=systemid fname lname bdate in=d);
> 196 main=m;
> 197 dual=d;
> 198 if m and d then output both;
> 199 %runn;
>
> INFO: The variable SYSTEMID on data set STROKE_2.STROKE_MAIN will be
> overwritten by data set
> STROKE_2.STROKE_DUAL.
> NOTE: There were 930 observations read from the data set
> STROKE_2.STROKE_MAIN.
> NOTE: There were 85 observations read from the data set
> STROKE_2.STROKE_DUAL.
> NOTE: The data set WORK.BOTH has 85 observations and 6 variables.
> NOTE: DATA statement used:
> real time 5.24 seconds
> cpu time 0.14 seconds
>
>
> "This code was run on 09/05/03 at 11:02 by rpresley"
> 200 proc sql ;
> 201 create table str_dual1
> 202 as select *
> 203 from stroke_2.stroke_dual
> 204 order by systemid
> 205 ;
> NOTE: Table WORK.STR_DUAL1 created, with 85 rows and 341 columns.
>
> 206 create table str_man1 /*only select records which were
> re-abstracted*/
>
> 207 as select *
> 208 from stroke_2.stroke_main
> 209 where systemid in (select distinct systemid from
> stroke_2.stroke_dual)
> 210 order by systemid;
> NOTE: Table WORK.STR_MAN1 created, with 70 rows and 201 columns.
>
> 211 quit;
> NOTE: PROCEDURE SQL used:
> real time 4.38 seconds
> cpu time 0.27 seconds
>
>
> 212 %runn;
> "This code was run on 09/05/03 at 11:05 by rpresley"
>
> Rodney J. Presley, PhD
> Director of Data Analysis
> Georgia Medical Care Foundation
> 1455 Lincoln Parkway
> suite 800
> Atlanta, GA 30346
>
> 678-527-3474
> 678-527-3574 fax
>
> rpresley@gmcf.org
>
|