LISTSERV at the University of Georgia
Date:         Thu, 11 Feb 2010 17:44:59 -0000
Reply-To:     Garry Gelade <garry@business-analytic.co.uk>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Garry Gelade <garry@business-analytic.co.uk>
Subject:      Re: Speeding up aggregate
Comments: To: Derek Willemsen <DerekWillemsen@invicta.nl>
In-Reply-To:  <71404CA09AF74D46B59EEB2A2752DB831B1A92@VS-SBS2003.ASPlease.local>

Derek

Your AGGREGATE command isn't using the PRESORTED subcommand. I'd be inclined to sort the file by the break variables first and then aggregate, with PRESORTED, to a new dataset or to a temporary file. You could then merge the counts back into your original file if you need the variable on the original data.
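
A minimal sketch of the sort, aggregate, and merge steps described above, assuming the break variables are VAR1 to VAR4 as in the original post. The file name 'counts.sav' is a placeholder, not something from the thread, and whether this is actually faster on a 16-million-record file is untested here.

```spss
* Sort once by the break variables so AGGREGATE can rely on the order.
SORT CASES BY VAR1 VAR2 VAR3 VAR4.

* Write one row per break group to a separate file (placeholder name).
AGGREGATE
  /OUTFILE='counts.sav'
  /PRESORTED
  /BREAK=VAR1 VAR2 VAR3 VAR4
  /N_BREAK=N.

* Merge the group counts back onto the still-sorted original cases.
MATCH FILES
  /FILE=*
  /TABLE='counts.sav'
  /BY VAR1 VAR2 VAR3 VAR4.
EXECUTE.
```

The /TABLE keyword treats the aggregated file as a lookup table, so every original case picks up the N_BREAK value of its group.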

Garry Gelade

Business Analytic Ltd.

From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Derek Willemsen
Sent: 11 February 2010 16:18
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Speeding up aggregate

Hi Mike,

Thanks for your reply. I've tested it and it still gets slow after 9-10 million records. The first 9 million were processed in a few seconds, so I was hopeful, but after a while it slowed down and was processing only a couple of hundred records a second (instead of thousands or millions).

Greetings, Derek

_____

From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Mike Palij
Sent: Thursday 11 February 2010 16:53
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Speeding up aggregate

Does it still take that long if you use a file that only has the four break variables and perhaps a case ID?
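
One way to set up that test, as a rough sketch: save a stripped-down copy of the file and rerun the same AGGREGATE on it. The variable name CASEID and the file name 'breaks_only.sav' are hypothetical placeholders; substitute whatever identifier the data actually contains.

```spss
* CASEID and the file name below are placeholders, not names from the thread.
SAVE OUTFILE='breaks_only.sav'
  /KEEP=CASEID VAR1 VAR2 VAR3 VAR4.

* Reopen the narrow file and repeat the original aggregation on it.
GET FILE='breaks_only.sav'.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=VAR1 VAR2 VAR3 VAR4
  /N_BREAK=N.
```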

-Mike Palij

New York University

mp26@nyu.edu

----- Original Message -----

From: Derek Willemsen <mailto:DerekWillemsen@invicta.nl>

To: SPSSX-L@LISTSERV.UGA.EDU

Sent: Thursday, February 11, 2010 10:31 AM

Subject: Speeding up aggregate

Dear all,

I have a dataset which contains 16 million records. I need to count how many records there are for each combination of four break variables, so I use a simple AGGREGATE in ADDVARIABLES mode.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=VAR1 VAR2 VAR3 VAR4
  /N_BREAK=N.

The first couple of million records go fast, but after 11 million records the aggregation becomes really slow. It takes ages to finish the last 5 million records. Normally it takes about 2.5 hours to finish the operation.

Is there a way to speed this process up?

(I have 2 GB of RAM.)

Thanks in advance!

Derek Willemsen
