Date: Fri, 10 Dec 2004 13:00:23 +0100
Reply-To: Spousta Jan <JSpousta@CSAS.CZ>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Spousta Jan <JSpousta@CSAS.CZ>
Subject: Re: SAS faster than SPSS?
Content-Type: text/plain; charset="iso-8859-2"
Dear SAS and SPSS fans,
After I posted my comparison of the two mighty packages on my humble computer, I received a series of suggestions about which procedures and settings should be tested further. I am sorry that I cannot test them all, but I tried at least two things.
3) Computing additional variables (Richard Ristow - I selected only the most efficient ways known to me in both packages)
+ cross-tabulating them (Art Kendall - I reduced the task so that I could run it here; both SAS and SPSS have problems if the output tables are too big).
The syntax in SPSS:
get file = myfile.
compute s1 = sqrt(v1**2 + v2**2) < 10.
compute s2 = sqrt(v2**2 + v3**2) < 10.
compute s3 = sqrt(v3**2 + v4**2) < 10.
compute s4 = sqrt(v4**2 + v5**2) < 10.
compute s5 = sqrt(v5**2 + v1**2) < 10.
crosstabs /tables=s1 by s2 by s3 by s4 by s5.
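The syntax in SAS:
data lib.view_of_file /view=lib.view_of_file;
/* The SET and RUN statements are missing from the archived post; lib.myfile is assumed from the PROC SORT example below. */
set lib.myfile;
s1 = sqrt(v1**2 + v2**2) < 10;
s2 = sqrt(v2**2 + v3**2) < 10;
s3 = sqrt(v3**2 + v4**2) < 10;
s4 = sqrt(v4**2 + v5**2) < 10;
s5 = sqrt(v5**2 + v1**2) < 10;
run;
proc freq data=lib.view_of_file;
/* TABLES statement assumed, mirroring the SPSS crosstabs command above. */
tables s1*s2*s3*s4*s5;
run;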
The times are (minutes:seconds):
SPSS compressed: 11:12
SPSS uncompressed: 10:41
SAS compressed with the binary method: 3:29
SAS compressed with the default method: 6:53
SAS uncompressed: 4:11
So SAS is about 1.6 to 3 times faster here, depending on the compression method.
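As a quick check of that comparison, the reported times can be converted to seconds and divided. This is only a small Python sketch of the arithmetic; the helper names `to_seconds` and `speedup` are made up for illustration and belong to neither package.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '10:41' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def speedup(slow, fast):
    """How many times longer the 'slow' run took than the 'fast' run."""
    return to_seconds(slow) / to_seconds(fast)

# Times reported above for the compute + crosstabs task.
spss = {"compressed": "11:12", "uncompressed": "10:41"}
sas = {"binary": "3:29", "default": "6:53", "uncompressed": "4:11"}

ratios = {
    "uncompressed vs uncompressed": speedup(spss["uncompressed"], sas["uncompressed"]),
    "SPSS compressed vs SAS binary": speedup(spss["compressed"], sas["binary"]),
    "SPSS compressed vs SAS default": speedup(spss["compressed"], sas["default"]),
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x")
```

On these numbers the ratios come out at roughly 2.6, 3.2 and 1.6, so the advantage depends noticeably on which SAS compression method is used.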
4) Sorting files (Simon). Because the theoretical time for the task is O(n ln(n)), if I remember correctly, I decided to use smaller files based on the real data I work with (data about bank clients). The files have 1,000,000 rows and 23 columns, both string and numeric.
Size of files:
SPSS uncompressed: 273 MB
SPSS compressed: 165 MB
SAS uncompressed: 205 MB
SAS default compression: 111 MB
SAS binary compression: 119 MB
- so here the compression is better in SAS.
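The space saved by each compression method can be worked out from the sizes listed above. A minimal Python sketch, where the function name `saving_percent` is hypothetical and each compressed file is compared with the uncompressed file of the same package:

```python
def saving_percent(uncompressed_mb, compressed_mb):
    """Percentage of disk space saved relative to the uncompressed file."""
    return 100.0 * (1.0 - compressed_mb / uncompressed_mb)

# File sizes in MB as listed above.
spss_saving = saving_percent(273, 165)
sas_default_saving = saving_percent(205, 111)
sas_binary_saving = saving_percent(205, 119)

print(f"SPSS compressed:         {spss_saving:.1f}% saved")
print(f"SAS default compression: {sas_default_saving:.1f}% saved")
print(f"SAS binary compression:  {sas_binary_saving:.1f}% saved")
```

Both SAS methods save a larger share than the SPSS compression on this file, and the SAS files are also smaller in absolute terms.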
The syntax in SPSS:
get file = myfile.
SORT CASES BY var .
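The syntax in SAS:
proc sort data=lib.myfile out=lib.newfile;
by var; run; /* BY and RUN statements assumed, mirroring SORT CASES BY var above */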
4a) Sorting by a numerical variable
SPSS uncompressed: 0:25
SPSS compressed: 0:25
SAS uncompressed: 0:22
SAS default compression: 0:20
SAS binary compression: 0:39
4b) Sorting by a string variable
SPSS uncompressed: 0:42
SPSS compressed: 0:43
SAS uncompressed: 0:30
SAS default compression: 0:22
SAS binary compression: 0:25
So SAS seems to be more efficient here, too. The differences on these tiny files seem negligible, but they could become very important if the files were really huge, say hundreds of millions of rows, I think.
It is true that the local SAS office bought me three _very good_ drinks during a party for SAS customers this week, but despite the lack of parties for SPSS users here, I am still trying to be objective. :-)
I wish you all a nice weekend.
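To get a rough feeling for how such a gap might grow, the string-sort times can be extrapolated under the O(n ln(n)) assumption mentioned earlier. This is only a back-of-the-envelope Python sketch: the target size of 100,000,000 rows and the function name `scale_sort_time` are assumptions made for illustration, and the model ignores memory limits and disk I/O, which would probably dominate for files that large.

```python
import math

def scale_sort_time(seconds, n_from, n_to):
    """Extrapolate a sort time assuming cost proportional to n * ln(n)."""
    factor = (n_to * math.log(n_to)) / (n_from * math.log(n_from))
    return seconds * factor

n_small = 1_000_000    # rows in the files tested above
n_large = 100_000_000  # hypothetical "really huge" file

# Uncompressed string-sort times from 4b): SPSS 0:42, SAS 0:30.
spss_large = scale_sort_time(42, n_small, n_large)
sas_large = scale_sort_time(30, n_small, n_large)
gap_minutes = (spss_large - sas_large) / 60

print(f"SPSS: about {spss_large / 60:.0f} min, SAS: about {sas_large / 60:.0f} min")
print(f"Difference: about {gap_minutes:.0f} min")
```

Under this simple model the 12-second difference on a million rows grows to a bit under half an hour on a hundred million rows.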