Date: Thu, 11 Feb 2010 17:44:59 -0000
Reply-To: Garry Gelade <garry@business-analytic.co.uk>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Garry Gelade <garry@business-analytic.co.uk>
Subject: Re: Speeding up aggregate
In-Reply-To: <71404CA09AF74D46B59EEB2A2752DB831B1A92@VS-SBS2003.ASPlease.local>
Derek
Your AGGREGATE command isn't using the PRESORTED option. I'd be inclined to
try sorting by the break variables first, then aggregating with PRESORTED to a
new dataset or to a temporary file. You could then merge the counts back into
your original file if you need the variable on the original data.
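A minimal sketch of this approach in SPSS syntax, reusing the break variables VAR1 to VAR4 from the original post. The dataset name counts is hypothetical, and whether this actually runs faster will depend on the data and the machine.

```spss
* Sort once by the break variables so AGGREGATE can use PRESORTED.
SORT CASES BY VAR1 VAR2 VAR3 VAR4.

* Write one row per break group to a separate dataset (hypothetical name, counts).
DATASET DECLARE counts.
AGGREGATE
  /OUTFILE='counts'
  /PRESORTED
  /BREAK=VAR1 VAR2 VAR3 VAR4
  /N_BREAK=N.

* Merge the group counts back onto the original cases, which are still sorted.
MATCH FILES
  /FILE=*
  /TABLE='counts'
  /BY VAR1 VAR2 VAR3 VAR4.
EXECUTE.
```

The small aggregated dataset is used as a lookup table (TABLE), so every original case keeps its own row and picks up the N_BREAK count of its group.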
Garry Gelade
Business Analytic Ltd.
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Derek Willemsen
Sent: 11 February 2010 16:18
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Speeding up aggregate
Hi Mike,
Thanks for your reply. I've tested it and it still gets slow after 9-10
million records. The first 9 million were processed in a few seconds, so I
was hopeful, but after a while it slowed down to a couple of hundred records
per second (instead of thousands or millions).
Greetings,
Derek
_____
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Mike Palij
Sent: Thursday, 11 February 2010 16:53
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Speeding up aggregate
Does it still take that long if you use a file that only has the four break
variables and perhaps a case ID?
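One way to run that test, as a rough sketch: save a slimmed-down copy holding only the break variables and a case ID, then rerun the same AGGREGATE on it and compare timings. The variable name case_id and the file name breaks_only.sav are hypothetical.

```spss
* Keep only the break variables plus a case ID (case_id is a hypothetical name).
COMPUTE case_id = $CASENUM.
SAVE OUTFILE='breaks_only.sav'
  /KEEP=case_id VAR1 VAR2 VAR3 VAR4.

* Rerun the same aggregate on the slimmed-down file and compare timings.
GET FILE='breaks_only.sav'.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=VAR1 VAR2 VAR3 VAR4
  /N_BREAK=N.
```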
-Mike Palij
New York University
mp26@nyu.edu
----- Original Message -----
From: Derek Willemsen <DerekWillemsen@invicta.nl>
To: SPSSX-L@LISTSERV.UGA.EDU
Sent: Thursday, February 11, 2010 10:31 AM
Subject: Speeding up aggregate
Dear all,
I have a dataset that contains 16 million records. I need to count how many
records there are for each combination of 4 break variables, so I use a simple
AGGREGATE in ADDVARIABLES mode.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=VAR1 VAR2 VAR3 VAR4
/N_BREAK=N.
The first couple of million records go fast, but after 11 million records the
aggregation gets really slow. It takes ages to finish the last 5
million records. Normally it takes about 2.5 hours to finish the operation.
Is there a way to speed this process up?
(I have 2 GB of internal memory)
Thanks in advance!
Derek Willemsen