LISTSERV at the University of Georgia
Date:         Wed, 12 May 2010 10:46:01 +0930
Reply-To:     "Barnett, Adrian (DECS)" <Adrian.Barnett2@SA.GOV.AU>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Barnett, Adrian (DECS)" <Adrian.Barnett2@SA.GOV.AU>
Subject:      Re: CPU Specifications used for SPSS
Comments: To: Jon Fry <JonathanFry@us.ibm.com>
In-Reply-To:  <OF51779C30.644DC334-ON86257720.00455786-86257720.004E0E5E@us.ibm.com>

Hi Jonathan,

My interest in how the separate cores are utilized is this: since the sorts are frequently the most time-consuming parts of what are lengthy programs, I'd like to see all the cores doing as much useful work as possible. Ideally they would all be running flat out on the sort, since that should get it done quicker. I'd also like as little disk swapping as possible, so I'd like the sort to use all the free memory it can.

At the moment it looks like the available memory and CPU capacity are not being used to the maximum.

Based on what the Task Manager was telling me during a run with WORKSPACE at its default, I could see that there was a gigabyte of memory free, so I re-ran with WORKSPACE at 1000000 (the setting is in kilobytes), which should have made about 1 GB available. The resource messages suggested that it was giving 512 MB to each thread. However, the results were a bit puzzling: the same sort went from 8 sorts to 2 (which was encouraging), but the reported Total CPU time went from 56.8 sec to 58.34 while Elapsed went from 49.76 to 46.33, so it was one step forward and one step back. There was also a fair amount of disk I/O.

I understand your point that if the CPU load is light, the separate CPUs will be doing different amounts (since one can look after it all), but sorting 900,000 records on several string variables of between 5 and 46 characters in length should be pretty demanding. So I am puzzled by what I am observing.

Your earlier description of how threads are allocated (dividing a task between cores when the file might be bigger than physical memory) sounds like what it should be doing routinely. The task should always complete quicker if split amongst more processors, especially if the file is smaller than memory. Superficially, it looks to me as if the case where there is LESS memory than the file needs is exactly the time not to bother allocating extra CPUs: there is nothing for them to do while waiting for the disk, and splitting might even generate more disk I/O because of the overhead.

Thanks for the tip about re-setting WORKSPACE.

Adrian Barnett
Project Officer
Educational Measurement and Analysis
Data and Educational Measurement
DECS
ph 82261080

________________________________
From: Jon Fry [mailto:JonathanFry@us.ibm.com]
Sent: Tuesday, 11 May 2010 11:43 PM
To: Barnett, Adrian (DECS)
Cc: SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: CPU Specifications used for SPSS

Adrian,

It would probably help to set WORKSPACE back to some modest value (20000?) after your sorts. It can hurt to keep it too high.

There is no need to be concerned about CPU usage imbalance. The OS dispatches threads on any available processor; it does not try to balance usage. It may even pick the lowest-numbered available one. So only a heavy CPU load will look balanced.

Most sort problems are not CPU-intensive. If you are seeing the memory budgets, you can also see CPU times and elapsed times. I am sure the elapsed times for all sort phases exceed the CPU times even with multiple threads. My suggestions for setting THREADS: if you think your dataset will fit in memory after compression, use one thread. Otherwise, usually use two threads. Only for CPU-intensive sorting problems will more than two threads pay off.
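The rule of thumb above can be written down as a small decision function. This is only a sketch of the advice as stated, not of anything SPSS does internally; the function name `suggest_threads` and its parameters are hypothetical, and using every core in the CPU-intensive case is an assumption (the advice only says that more than two threads may pay off there).

```python
def suggest_threads(fits_in_memory, cpu_intensive=False, cores=4):
    """Suggest a SET THREADS value from the rule of thumb in this message.

    fits_in_memory: your own estimate of whether the dataset fits in
                    memory after compression (hypothetical input).
    cpu_intensive:  True only for sorts known to be CPU-bound.
    cores:          number of cores on the machine (assumed, not detected).
    """
    if fits_in_memory:
        # Dataset fits in memory: one thread.
        return 1
    if cpu_intensive:
        # Assumption: use all cores; the message only says "more than two".
        return max(2, cores)
    # The usual case for a sort that does not fit in memory.
    return 2


print(suggest_threads(fits_in_memory=True))
print(suggest_threads(fits_in_memory=False))
print(suggest_threads(fits_in_memory=False, cpu_intensive=True, cores=4))
```

The returned number is what you would then pass to SET THREADS before running the sort.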

Jonathan

From: "Barnett, Adrian (DECS)" <Adrian.Barnett2@sa.gov.au>
To: Jon Fry/Chicago/IBM@IBMUS
Cc: "SPSSX-L@LISTSERV.UGA.EDU" <SPSSX-L@LISTSERV.UGA.EDU>
Date: 05/10/2010 08:53 PM
Subject: RE: CPU Specifications used for SPSS

________________________________

Hi Jonathan,

Thanks for the tip about WORKSPACE and THREADS.

I didn't realise it was possible to use it to improve memory usage because the manual says "don't do it unless SPSS complains".

Below is all that is said about the use of WORKSPACE:

WORKSPACE allocates more memory for some procedures when you receive a message indicating that the available memory has been used up or indicating that only a given number of variables can be processed. MXCELLS increases the maximum number of cells you can create for a new pivot table when you receive a warning that a pivot table cannot be created because it exceeds the maximum number of cells that are allowed.

* WORKSPACE allocates workspace memory in kilobytes for some procedures that allocate only one block of memory. The default is 6148.
* Do not increase the workspace memory allocation unless the program issues a message that there is not enough memory to complete a procedure.

The section on THREADS discourages you from altering the setting.

I will experiment with both of these and see if anything improves.

I must say I didn't observe an even allocation of work across cores (4 real and 4 virtual) when my sort was running. The overwhelming majority of work was being done by a single core. The others were doing stuff, but not much. There is a big sort running on my work computer as I write and one CPU is maxed out while the other is sitting at about 10-15%.

Can CPU utilization be made unbalanced if there is insufficient memory for second (and subsequent) cores to do anything useful?

Adrian Barnett
Project Officer
Educational Measurement and Analysis
Data and Educational Measurement
DECS
ph 82261080

________________________________
From: Jon Fry [mailto:JonathanFry@us.ibm.com]
Sent: Tuesday, 11 May 2010 10:52 AM
To: Barnett, Adrian (DECS)
Cc: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: CPU Specifications used for SPSS

Regarding SORT CASES:

SORT CASES can use more memory than it was using here. 32-bit versions use at least 128MB; 64-bit versions use at least 512MB. If WORKSPACE is set higher, it will use the WORKSPACE setting. If the available memory (the result of the preceding calculation) is enough to store the entire dataset, it will sort the data in memory.

If the file might be bigger than the available memory, SORT CASES divides the work among a set of threads so it can make use of multiple cores. It first divides the memory. On Adrian's four core processor, it divided the 512MB available into four 128MB areas (about 131,000 KB) and gave one area to each thread. The number of threads it uses is controlled by SET THREADS.
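The arithmetic in the two paragraphs above can be sketched as follows. This is an illustration of the description only, not SPSS code: the helper `sort_memory_plan`, its parameter names, and the 2,000,000 KB dataset size are hypothetical, and the real in-memory test may also account for compression, which is ignored here.

```python
# Minimum memory SORT CASES uses, per the description above (in KB).
MIN_SORT_KB = {"32-bit": 128 * 1024, "64-bit": 512 * 1024}


def sort_memory_plan(workspace_kb, build, threads, dataset_kb):
    """Sketch of the memory budget described in this message.

    workspace_kb: current SET WORKSPACE value, in kilobytes.
    build:        "32-bit" or "64-bit".
    threads:      current SET THREADS value.
    dataset_kb:   assumed size of the data to sort (hypothetical input).
    """
    # The sort uses the build minimum, or WORKSPACE if that is set higher.
    available_kb = max(MIN_SORT_KB[build], workspace_kb)
    if dataset_kb <= available_kb:
        # Enough room for the whole dataset: sorted in memory.
        return {"available_kb": available_kb, "in_memory": True,
                "per_thread_kb": None}
    # Otherwise the memory is divided evenly among the threads.
    return {"available_kb": available_kb, "in_memory": False,
            "per_thread_kb": available_kb // threads}


# Default WORKSPACE (6148 KB), 64-bit build, four threads.
plan = sort_memory_plan(workspace_kb=6148, build="64-bit",
                        threads=4, dataset_kb=2_000_000)
print(plan["per_thread_kb"])  # 131072 KB, i.e. 128 MB per thread
```

With the default WORKSPACE the 512 MB minimum applies, so four threads get 128 MB each, which matches the "about 131,000 KB" figure quoted above.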

Jonathan Fry

