Date: Fri, 10 Dec 2004 12:43:26 -0000
Reply-To: Allan Reese FM CEFAS <r.a.reese@cefas.co.uk>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Allan Reese FM CEFAS <r.a.reese@cefas.co.uk>
Subject: Re: SPSS vs SAS
Content-Type: text/plain; charset="iso-8859-1"
I've been interested in the comparisons but wish to make a very obvious point about the benchmark tests:
Extracts from Spousta Jan <JSpousta@CSAS.CZ> Thu, 9 Dec 2004 14:31:40 +0100
----------------------------------
Subject: Re: SAS faster than SPSS?
[sample job uses data files] all have 100.000.000 rows and 6 numerical columns
The size of the files:
SPSS uncompressed - 4.7 GB
SAS uncompressed - 4.8 GB
I did two things with them:
1. Frequency tables, times in minutes:seconds:
SPSS uncompressed - 3:41
SPSS compressed - 4:38
SAS uncompressed - 4:42
SAS compressed - 7:27
2. Merging files, times for the merges:
SPSS - 19:05
SAS - 15:34
Therefore [Jan's] first answer to Benny's question "what factor of performance increase should we expect when going from SPSS to SAS" is: Something between 1.3 and 0.6 (or even more or even less) depending on which work you do.
----------------------------------
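For anyone who wants to check the arithmetic behind that range, here is a rough sketch that turns the timings quoted above into ratios. It assumes the "factor" means SPSS time divided by SAS time for the same job (my reading, not necessarily Jan's definition); the function and variable names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '3:41' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the extract above, as (SPSS, SAS) pairs.
timings = {
    "frequencies, uncompressed": ("3:41", "4:42"),
    "frequencies, compressed": ("4:38", "7:27"),
    "merge": ("19:05", "15:34"),
}


def speedup_factors(timings):
    """Assumed definition: SPSS time / SAS time (above 1 means SAS was faster)."""
    return {
        job: to_seconds(spss) / to_seconds(sas)
        for job, (spss, sas) in timings.items()
    }


factors = speedup_factors(timings)
for job, factor in factors.items():
    print(f"{job}: SPSS time / SAS time = {factor:.2f}")
```

On these three jobs the ratio runs from about 0.62 to about 1.23, which fits the low end of the quoted range and comes close to the high end; the 1.3 may rest on figures not included in the extract.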
My take:
That's ONE HUNDRED MILLION cases, like one for every household in the US. Each data file would occupy about 4000 of the floppy disks we thought were neat last decade, a pile some 12 metres high. OK, it's now 10 CDs or one DVD, but let's keep in perspective that this is a big job.

Unless the data come from some automated stream of physical measurements, the data represent a large investment in data collection. Compared with that, you can get a household computer for a few hundred dollars to run these analyses in MINUTES. There's an old Zen joke about a westerner boasting to an oriental that because his new car was faster he could get to work in five minutes less - "So, what do you do in those minutes?"

My take on this whole business is that it doesn't matter which you choose, but learn to use it effectively. Do you seriously need to process 100 million cases? SPSS has the feature to select cases at random, OR to select the first N cases, so you can run quick analyses on the first cases in the huge file without the overhead of reading it all.

Efficiency is more about getting results you believe, and *I*'ve just wasted half the morning because data had been entered inconsistently into an Excel spreadsheet. Curse me, I had to go and cross-check the results! (Cell left blank when others coded "Missing".)
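On the point about working from the first N cases or a random selection: in SPSS syntax I believe the relevant commands are N OF CASES and SAMPLE. Outside SPSS the same idea can be sketched in a few lines of Python. This is only an illustration, assuming the data sit in a CSV file with a header row; the function names are hypothetical.

```python
import csv
import itertools
import random


def first_n_rows(path, n):
    """Return the header and the first n data rows, stopping the read early."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # islice stops after n rows, so the rest of the file is never read.
        return header, list(itertools.islice(reader, n))


def random_sample_rows(path, fraction, seed=None):
    """Return the header and roughly `fraction` of the data rows, chosen at random."""
    rng = random.Random(seed)
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Each row is kept independently with probability `fraction`.
        rows = [row for row in reader if rng.random() < fraction]
    return header, rows
```

Only the first-N version avoids the overhead of reading the whole file; the random selection still passes over every row, although it keeps just a fraction of them in memory. The first N cases are also only representative if the file is not sorted in some meaningful order.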
Allan