Date: Mon, 24 Jan 2005 11:15:52 -0800
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Proc CORR running out of memory
In-Reply-To: <FE56865C459CD6468D6465A15B908FC0052272CC@exchny54.ny.ssmb.com>
Content-type: text/plain; charset=US-ASCII
"Chaudhury, Jayati [IT]" <jayati.chaudhury@CITIGROUP.COM> wrote:
> I am getting an out of memory error when I try to run proc corr on a
> dataset with 784 observations and 6122 variables. The code is provided
> below.
>
> Is there a limitation on the size that Proc CORR can handle? I am
> using SAS 9.1 on Linux.
>
> This is urgent. So a prompt response will be greatly appreciated.
[1] This is a mailing list/newsgroup of SAS users who have jobs and
deadlines themselves. I'm sure you didn't mean to sound so rude as
to demand a 'prompt response' when we all have jobs just as
time-sensitive as yours.
[2] The PROC CORR documentation has specifics on the memory usage given
the number of variables, etc. It would have been much faster for you
to check that yourself. If you don't have a copy locally, you can read
the docs at sas.com.
[3] 6122 variables is ridiculous. How can you possibly work with a
6122x6122 matrix of correlations? That's 18,736,381 unique
correlations, just considering the upper triangular matrix and ignoring
the main diagonal. If you want to pick out 'large' correlations, you'll
get so many spuriously significant results that you won't be able to
separate the valuable from the random. With that many correlations to
wade through, under the null hypothesis (testing each at the 0.05
level, say) you would *expect* to find over 900,000 false positives.
[4] With only 784 records, you cannot support an analysis of 6122
variables. What do you expect to achieve with this sort of approach?
Perhaps you should re-think your business process here.
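
If you want to check the arithmetic in [3], here is a quick sketch. It is
in Python purely for illustration (it is not SAS code), the function name
is made up, and the 0.05 significance level is an assumption, since the
numbers above only imply it.

```python
from math import comb

def correlation_counts(n_vars, alpha=0.05):
    """Count unique off-diagonal correlations among n_vars variables and
    the number expected to look 'significant' by chance alone.

    alpha is an assumed per-test significance level (0.05 by default).
    """
    # Upper triangle without the main diagonal: n_vars choose 2.
    n_pairs = comb(n_vars, 2)
    # If every true correlation were zero, each test would still reject
    # with probability alpha, so the expected count is alpha * n_pairs.
    expected_false_pos = alpha * n_pairs
    return n_pairs, expected_false_pos

pairs, false_pos = correlation_counts(6122)
print(f"{pairs:,} unique correlations")
print(f"about {false_pos:,.0f} expected false positives at alpha = 0.05")
```

Plug in a smaller variable count to see how fast the number of pairs
grows: it goes up roughly with the square of the number of variables.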
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician