Date: Wed, 8 Sep 2010 14:50:22 -0700
Reply-To: Justin Carroll <jrc.csus@GMAIL.COM>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Justin Carroll <jrc.csus@GMAIL.COM>
Subject: Re: Large Data Files
In-Reply-To: <97D6F0A82A6E894DAF44B9F575305CC90F05790C@HCAMAIL03.ochca.com>
I haven't looked into it much, but I would imagine that a RAID-0 setup
with higher-RPM hard drives (faster read/write) or 'short-stroked' drives
(faster seek times, I believe) would also speed things up.
Question: Does anyone know whether SET WORKSPACE improves performance?
(I know the help files say to use it only when SPSS reports that it is out
of memory, and that it only works for 'certain procedures'.)
*Also, I've read somewhere that the single-client version of SPSS can only
use a single core of a multi-core processor, meaning your home/work computer
is probably running SPSS at a fraction of the processing speed it is capable
of. For instance, my home DIY computer has 6 cores; SPSS can only use one of
them, and the other 5 are left to other applications. The SPSS Server
edition, by contrast, can use all cores of a processor, dramatically
increasing processing speed.* I am not sure about this, and I was unable to
find any "Google-derived" evidence to support the claim, but I am 100%
positive I read it just a few months back. Can anyone confirm this?
My files are not as large as the ones you guys are using (measured in GB),
but they run to hundreds of thousands of cases and sometimes 1000+ variables
(400-600 MB per file). I use both SPSS 15 and 17, in both client and server
editions, and I know that a procedure like Crosstabs takes about 5-10
minutes on the single-client version, whereas on the server edition it takes
about 30 seconds.
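With 1000+ variables in a file, the paring-down step Matt describes in point 2 of his message (quoted below) is probably the cheapest win. Here is a rough sketch of that idea in plain Python, working on an exported CSV rather than through SPSS itself. The helper names `pare_down` and `aggregate_mean` and the sample columns are made up for illustration; this is not SPSS syntax or the SPSS Python plug-in, just the general approach of dropping unneeded variables and cases first and aggregating afterwards.

```python
import csv
import io
from collections import defaultdict


def pare_down(rows, keep_vars, keep_case):
    """Yield only the wanted cases, reduced to the wanted variables."""
    for row in rows:
        if keep_case(row):
            yield {name: row[name] for name in keep_vars}


def aggregate_mean(rows, break_var, value_var):
    """Mean of value_var within each level of break_var, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[break_var]] += float(row[value_var])
        counts[row[break_var]] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Tiny made-up extract standing in for a large exported file.
SAMPLE = """id,region,age,income,unused1,unused2
1,north,34,52000,x,y
2,south,17,0,x,y
3,north,45,61000,x,y
4,south,29,48000,x,y
"""

# Rows are streamed one at a time, so the full file never sits in memory.
reader = csv.DictReader(io.StringIO(SAMPLE))
adults = pare_down(reader, ["region", "income"], lambda r: int(r["age"]) >= 18)
mean_income = aggregate_mean(adults, "region", "income")
print(mean_income)
```

For a real file you would pass an open file handle to `csv.DictReader` instead of the in-memory sample. Because both helpers work row by row, memory use should stay roughly flat regardless of file size, though I have not timed this against SPSS.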
//////
Some quick references:
Forum post on RAIDs:
http://forums.hexus.net/hexus-hardware/130603-how-much-speed-difference-there-raid-0-a.html
Article by SPSS on hardware recommendations (dated 2008):
http://www.spss.com/media/collateral/SSSWP-0608.pdf
Discussion a few months back on this Listserv:
http://spssx-discussion.1045642.n5.nabble.com/Quad-Core-Processors-td1092004.html
//////
HTH,
J. R. Carroll
Grad. Student in Pre-Doc Psychology at CSUS
Research Assistant for Just About Everyone.
Email: jrc.csus@gmail.com -or- jrcarroll@jrcresearch.net
Phone: (916) 628-4204
On Wed, Sep 8, 2010 at 2:14 PM, Pirritano, Matthew <MPirritano@ochca.com> wrote:
> I work with large files: over 3 GB and more than 4 million lines.
>
>
>
> 1. For big data processing jobs, use Python without the SPSS front end.
> It is much faster.
> 2. Start with the main file. Eliminate all unnecessary variables and
> cases for each analysis, or if possible use AGGREGATE to pare down the size
> of the file. The first step or two will take some time, but then the file
> gets smaller and things speed up.
> 3. I’ve not tried this one, but I have read on the list that a 64-bit
> processor with multiple CPUs and maximum RAM speeds things up considerably.
>
>
>
> Thanks
>
> Matt
>
>
>
> Matthew Pirritano, Ph.D.
>
> Research Analyst IV
>
> Medical Services Initiative (MSI)
>
> Orange County Health Care Agency
>
> (714) 568-5648
> ------------------------------
>
> *From:* SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] *On Behalf
> Of *Marcos Sanches
> *Sent:* Wednesday, September 08, 2010 2:04 PM
> *To:* SPSSX-L@LISTSERV.UGA.EDU
> *Subject:* Large Data Files
>
>
>
> Hi all,
>
>
>
> I wonder if anybody has any suggestions for working with large data files
> in SPSS. My data has around 10 million observations and 30 variables, and
> everything I do takes a looooooong time...
>
>
>
> Thanks a lot!
>
>
>
> Marcos