LISTSERV at the University of Georgia
Date:   Mon, 15 Jul 1996 15:21:16 GMT
Reply-To:   braner@walden.snr.uvm.edu
Sender:   "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From:   Moshe Braner <braner@EMBA.UVM.EDU>
Organization:   EMBA Computer Facility, The University of Vermont
Subject:   Re: Large memory use in SORT and a few other gripes

Adrian Barnett (adrianb@dove.mtx.net.au) wrote:
: The file is a plain ASCII raw data file of 2.4meg - not a compressed
: system file so it doesn't 'grow'.
: ...
: The file to be sorted contains 14,183 cases of 1,664 bytes each.
: 14,614,592 bytes of memory are available to the sort.
: 18,008 bytes is the minimum in which the sort will run.
: 26,730,568 bytes would suffice for an in-memory sort.

First of all, I'd say that sorting such a large file in 2 minutes on a PC is a feat that just a few years ago would have been unthinkable. So the glass is half full...

The file _did_ grow: 14,183 cases of 1,664 bytes each is 23,600,512 bytes. The amount of space that SPSS said "would suffice for an in-memory sort" is only slightly higher than that.
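
For anyone who wants to check the arithmetic, here is a quick sketch in Python (not SPSS). It uses only the figures quoted from the SPSS message; the variable names are mine.

```python
# Figures quoted from the SPSS sort message.
cases = 14_183
bytes_per_case = 1_664
in_memory_quoted = 26_730_568

# Size of the working file as SPSS holds it.
working_size = cases * bytes_per_case
print(working_size)  # 23600512

# How much more SPSS asks for beyond the data itself.
overhead = in_memory_quoted - working_size
print(overhead)  # 3130056
print(round(overhead / working_size * 100, 1))  # 13.3 (percent)
```

So the in-memory figure is about 13% above the size of the data alone.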

How did a 2.4-meg raw ASCII file turn into a 23-meg SPSS file? My guess is that the file has a lot of variables that are small integers, kept in 1 or 2 digits in the ASCII file. But inside SPSS, all numeric variables are stored as 8-byte floating point numbers. (In "compressed" disk files they are made somewhat smaller, but for an "in-memory" sort the whole 8 bytes are needed for each number.) In theory, SPSS could gain efficiency (in time and space) by having an integer data type distinct from floating point. As it is, one way to sort this file faster and in less space is to leave those many integer variables as one long string variable for the purpose of the sort. E.g., suppose that the raw file has lines like this:

KEY V1 V2 V3 ...
876 1 34 2 ...

and we want to sort by the KEY. If we read the ASCII file into SPSS parsing it into the variables KEY, V1, V2, V3, ... it becomes large. We can read it like this instead:

data list ... /KEY 1-3 REST 5-80 (A).

then sort by KEY, and _then_ parse the REST. You can parse it via a bunch of COMPUTE ... = SUBSTR(...) commands, or you can WRITE the sorted file into a new ASCII file (still relatively small) and then read that file in with the DATA LIST command originally intended.
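
The sort-first, parse-later idea is not specific to SPSS. Here is a minimal sketch of it in Python, purely as an illustration: the sample lines are invented, and the column positions simply follow the KEY 1-3 layout used above.

```python
# Hypothetical sample lines: KEY in columns 1-3, the other fields from column 5 on.
raw_lines = [
    "876 1 34 2",
    "123 7 5 19",
    "450 2 88 4",
]

# Step 1: split each line into a numeric key and one unparsed string (the REST).
records = [(int(line[0:3]), line[4:]) for line in raw_lines]

# Step 2: sort on the key only; REST travels along as a single short string.
records.sort(key=lambda rec: rec[0])

# Step 3: only after sorting, parse REST into separate numbers.
parsed = [(key, [int(field) for field in rest.split()]) for key, rest in records]

for key, values in parsed:
    print(key, values)
```

During the sort each record carries just a key plus a short string, rather than one 8-byte number per field, which is the saving the approach is after.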

-- Moshe Braner <Moshe.Braner@uvm.edu> 47 McGee Road, Essex Junction, VT 05452 USA

