LISTSERV at the University of Georgia
Date:         Fri, 10 Dec 2004 13:00:23 +0100
Reply-To:     Spousta Jan <JSpousta@CSAS.CZ>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Spousta Jan <JSpousta@CSAS.CZ>
Subject:      Re: SAS faster than SPSS?
Content-Type: text/plain; charset="iso-8859-2"

Dear SAS and SPSS fans,

After I posted my comparison of the two mighty packages on my humble computer, I received a number of suggestions about which procedures and settings should be tested further. I am sorry that I cannot test them all, but I tried at least two things.

3) Computing additional variables (Richard Ristow - I selected only the most efficient ways known to me in both packages) and cross-tabulating them (Art Kendall - I reduced the task so that I could run it here; both SAS and SPSS have problems if the output tables are too big).

In SPSS:

  get file = myfile.
  temporary.
  compute s1 = sqrt(v1**2 + v2**2) < 10.
  compute s2 = sqrt(v2**2 + v3**2) < 10.
  compute s3 = sqrt(v3**2 + v4**2) < 10.
  compute s4 = sqrt(v4**2 + v5**2) < 10.
  compute s5 = sqrt(v5**2 + v1**2) < 10.
  crosstabs /tables=s1 by s2 by s3 by s4 by s5.

In SAS:

  data lib.view_of_file / view=lib.view_of_file;
    set lib.myfile;
    s1 = sqrt(v1**2 + v2**2) < 10;
    s2 = sqrt(v2**2 + v3**2) < 10;
    s3 = sqrt(v3**2 + v4**2) < 10;
    s4 = sqrt(v4**2 + v5**2) < 10;
    s5 = sqrt(v5**2 + v1**2) < 10;
  run;

  proc freq data=lib.view_of_file;
    table s1*s2*s3*s4*s5;
  run;

The times are (minutes:seconds):

  SPSS compressed:                          11:12
  SPSS uncompressed:                        10:41
  SAS compressed with the binary method:     3:29
  SAS compressed with the default method:    6:53
  SAS uncompressed:                          4:11

So SAS is between about 1.5 and 3 times faster here, depending on the compression settings.
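
The speed ratios behind this comparison can be checked with a few lines of Python. This is only a sketch for redoing the arithmetic on the timings listed above; the names `to_seconds` and `speedups` are made up for this illustration and are not part of SPSS or SAS.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '11:12' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings from test 3 (compute five variables, then cross-tabulate them).
spss = {"compressed": "11:12", "uncompressed": "10:41"}
sas = {"binary": "3:29", "default": "6:53", "uncompressed": "4:11"}


def speedups(slow: dict, fast: dict) -> dict:
    """Return slow_time / fast_time for every pair of settings."""
    return {
        (s_name, f_name): to_seconds(s_time) / to_seconds(f_time)
        for s_name, s_time in slow.items()
        for f_name, f_time in fast.items()
    }


ratios = speedups(spss, sas)
for (spss_setting, sas_setting), ratio in sorted(ratios.items()):
    print(f"SPSS {spss_setting} vs SAS {sas_setting}: {ratio:.2f}x")
```

The ratio depends a lot on which SAS compression method is used, so it is worth looking at all six pairs rather than at one number.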

4) Sorting files (Simon). Because the theoretical time for this task is O(n ln(n)), if I remember correctly, I decided to use smaller files based on the real data I work with (data about bank clients). The files have 1,000,000 rows and 23 columns, both string and numeric.

Size of files:

  SPSS uncompressed:         273 MB
  SPSS compressed:           165 MB
  SAS uncompressed:          205 MB
  SAS default compression:   111 MB
  SAS binary compression:    119 MB

So compression works better in SAS here.

The syntax in SPSS:

  get file = myfile.
  SORT CASES BY var.

In SAS:

  proc sort data=lib.myfile out=lib.newfile;
    by var;
  run;

Results (minutes:seconds):

4a) Sorting by a numerical variable

  SPSS uncompressed:         0:25
  SPSS compressed:           0:25
  SAS uncompressed:          0:22
  SAS default compression:   0:20
  SAS binary compression:    0:39

4b) Sorting by a string variable

  SPSS uncompressed:         0:42
  SPSS compressed:           0:43
  SAS uncompressed:          0:30
  SAS default compression:   0:22
  SAS binary compression:    0:25

So SAS seems to be more efficient here, too. The differences on these tiny files seem negligible, but I think they could become very important if the files were really huge, say hundreds of millions of rows.
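
To get a rough feeling for how such a gap might grow, one can scale the measured times under the O(n ln(n)) assumption mentioned in point 4. This is a back-of-the-envelope sketch in Python, not a measurement: the function name `extrapolate_sort_time` is invented for this example, and the model ignores memory limits, disk I/O, and any change of sorting strategy on files that no longer fit in RAM.

```python
import math


def extrapolate_sort_time(seconds: float, rows: int, target_rows: int) -> float:
    """Scale a measured sort time to target_rows, assuming cost grows as n * ln(n)."""
    return seconds * (target_rows * math.log(target_rows)) / (rows * math.log(rows))


# Measured times in seconds for test 4b (string sort, uncompressed files).
measured = {"SPSS uncompressed": 42, "SAS uncompressed": 30}

for label, secs in measured.items():
    estimate = extrapolate_sort_time(secs, 1_000_000, 100_000_000)
    print(f"{label}: roughly {estimate / 60:.0f} minutes for 100 million rows")
```

Under this assumption, going from 1 million to 100 million rows multiplies every time by about 133, so the 12-second gap in test 4b would become a gap of roughly 27 minutes.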

It is true that the local SAS office bought me three _very good_ drinks at a party for SAS customers this week, but despite the lack of parties for SPSS users here, I am still trying to be objective. :-)

Wish you all a nice weekend.

Jan

