LISTSERV at the University of Georgia
Date:         Thu, 13 May 2004 15:46:31 +1000
Reply-To:     Frank Milthorpe <>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Frank Milthorpe <>
Subject:      Still confused on what the CACHE command actually does and how it
Content-Type: text/plain; charset=US-ASCII

I am still confused as to how the CACHE command actually works. I am not sure that the Help description listed below by Richard Oliver is necessarily correct.

I generally get my data using a GET CAPTURE command to get data from Oracle using ODBC. I then do some transformations of the data. It is my understanding that there are separate temporary copies of the active file. It would seem that there would have to be at least two copies; the current file and the new version that is being created. I seem to remember that the total disk space required is actually 3n (where n is the size of the current file), so maybe there is a third copy.

A literal reading of the Help system information (reproduced below) suggests that SPSS always goes back and re-executes the SQL command. Clearly this is not the case once some transformations have been made to the file. So what is the CACHE command doing? Is the CACHE command creating a temporary copy in memory rather than writing it to disk in a temporary file? What happens once transformations are made? What happens if the file is too big to fit in memory?

I would welcome suggestions on how to make processing of large files more efficient. I, probably like many others, have a relatively well-specified machine, in my case with 1 GB of memory.


Frank Milthorpe

>>> "Oliver, Richard" <> 13/05/2004 1:26 am >>>
Oh, yes, absolutely. CACHE can definitely improve performance when working with data from a database source. From the help system:

Creating a Data Cache

Although the virtual active file can vastly reduce the amount of temporary disk space required, the absence of a temporary copy of the "active" file means that the original data source has to be reread for each procedure. For large data files read from an external source, creating a temporary copy of the data may improve performance. For example, for data tables read from a database source, the SQL query that reads the information from the database must be reexecuted for any command or procedure that needs to read the data. Since virtually all statistical analysis procedures and charting procedures need to read the data, the SQL query is reexecuted for each procedure you run, which can result in a significant increase in processing time if you run a large number of procedures.

If you have sufficient disk space on the computer performing the analysis (either your local computer or a remote server), you can eliminate multiple SQL queries and improve processing time by creating a data cache of the active file. The data cache is a temporary copy of the complete data.

Note: By default, the Database Wizard automatically creates a data cache, but if you use the GET DATA command in command syntax to read a database, a data cache is not automatically created.
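
For reference, the explicit caching that the note describes might look like the syntax below. This is only a minimal sketch: the DSN, login details, table name and variable name are hypothetical placeholders, and it assumes the data are read with GET DATA over ODBC as in the note. It shows where CACHE would sit in a job, not what SPSS does internally with its temporary files.

```spss
* Hypothetical DSN, table and variable names, for illustration only.
GET DATA
  /TYPE=ODBC
  /CONNECT='DSN=MyOracleDSN;UID=myuser;PWD=mypassword'
  /SQL='SELECT * FROM trips'.

* Request a data cache, then force a data pass so the copy is written.
CACHE.
EXECUTE.

* Per the help text, later procedures read the cached copy.
FREQUENCIES VARIABLES=travel_mode.
```

If the help text is accurate, the SQL query here should run once, for the EXECUTE, and not again for FREQUENCIES. Whether the cached copy lives on disk or in memory is the open question above and is not settled by this sketch.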

--------------------------------------------
Frank Milthorpe
Senior Manager, Transport Modelling
Transport and Population Data Centre (TPDC)
Department of Infrastructure, Planning and Natural Resources
GPO Box 3927, Sydney NSW 2001
Level 5, 20 Lee Street, Sydney

Direct: +61 2 9762 8488
Tel: +61 2 9762 8511
Fax: +61 2 9762 8514
Email:
