Date: Sun, 7 Jun 2009 08:43:03 +0800
Reply-To: Eins Bernardo <einsbernardo@yahoo.com.ph>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Eins Bernardo <einsbernardo@yahoo.com.ph>
Subject: Re: detecting linear combinations/high correlations in a data set
Hi Marc, Art, et al.,
What do you mean by "singularities"?
Thank you.
Eins
--- On Sat, 6/6/09, Art Kendall <Art@DrKendall.org> wrote:
From: Art Kendall <Art@DrKendall.org>
Subject: Re: detecting linear combinations/high correlations in a data set
To: SPSSX-L@LISTSERV.UGA.EDU
Date: Saturday, 6 June, 2009, 12:04 PM
When I pseudorandomly generate 150 cases with 550 variables, I of course get singularities.
Please describe the nature of your data. Then we may be able to make suggestions.
Are these some sort of repeated measures, e.g., items intended to form scales, prices over time, energy at different wavelengths, etc.?
RELIABILITY can be useful for tracking down singularities. Open a new instance of SPSS. Copy the syntax below to a syntax file. Click <run>. Click <all>.
Then go back to the syntax and put fewer items into the scale. Finally, try using just 150. You will see that the SMC (squared multiple correlation) column now has entries, but they are all 1.000. You can edit the RELIABILITY syntax to produce the whole correlation matrix, but in this instance that would be futile.
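* Generate 150 cases of 550 pseudorandom integer variables, x1 to x550.
new file.
input program.
vector x (550,f3).
loop id = 1 to 150.
loop #p = 1 to 550.
compute x(#p) = rnd(rv.normal(50,10)).
end loop.
end case.
end loop.
end file.
end input program.
* Request all item and scale summaries, including the SMC column.
reliability variables= x1 to x550
/scale (bigbunch) = x1 to x550
/SUMMARY =all.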
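A small-scale analogue of this demonstration can be sketched outside SPSS. The Python below is an illustration only and is not part of the syntax above: it uses only the standard library, shrinks the problem to 6 cases and 10 variables (sizes chosen here purely for speed), and the helper name `rank_of` is made up for this sketch. It shows that mean-centered data from n cases can have rank at most n - 1, so a correlation matrix built from more variables than that is necessarily singular.

```python
import random
from fractions import Fraction


def rank_of(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot at or below the current pivot row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


random.seed(20090606)
n_cases, n_vars = 6, 10  # assumed toy sizes; the post uses 150 and 550

# Integer-rounded normal deviates, similar in spirit to rnd(rv.normal(50,10)).
data = [[round(random.gauss(50, 10)) for _ in range(n_vars)]
        for _ in range(n_cases)]

# Center each column on its mean, as a correlation matrix implicitly does.
col_means = [Fraction(sum(row[j] for row in data), n_cases)
             for j in range(n_vars)]
centered = [[row[j] - col_means[j] for j in range(n_vars)] for row in data]

centered_rank = rank_of(centered)
print(centered_rank, "<=", n_cases - 1, "<", n_vars)
```

Because every centered column sums to zero, the rows of the centered matrix are linearly dependent, which is what caps the rank at n - 1. Exact fractions are used here only to avoid rounding questions in a toy example.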
Art Kendall
Social Research Consultants
M wrote:
Hi - I've got a large dataset (over 500 variables, 150K rows) and would like to detect
a) variables that are highly correlated with one another
b) linear combinations of variables likely to cause conditioning problems or non-positive-definite correlation matrices.
Whether I sample or not, the CORRELATIONS procedure won't take more than 100 variables and wouldn't help with b) anyway, so I'm working with FACTOR and /EXTRACTION PC.
Question:
---------
Before chiseling the wheel, does anyone have code handy that produces the coefficients of the linear combinations of input variables that lead to singularities? Thanks.
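One possible approach, sketched in Python rather than SPSS syntax: the coefficients of an exact linear dependency among variables form a null-space vector of the mean-centered data matrix. Everything in this sketch is an assumption made for illustration: the function name `null_space` is hypothetical, and the toy data are invented so that x3 = x1 + 2*x2 + 5. Exact rational arithmetic keeps the example unambiguous, but it only suits tiny inputs.

```python
from fractions import Fraction


def null_space(rows):
    """Basis of the null space of a matrix, via exact reduced row echelon form."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivots[i] is the pivot column of row i
    r = 0
    for c in range(n_cols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot: column c depends on earlier columns
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [v / pv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -m[row_idx][free]
        basis.append(vec)
    return basis


# Invented toy data: 5 cases, 4 variables, with x3 = x1 + 2*x2 + 5.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
x3 = [a + 2 * b + 5 for a, b in zip(x1, x2)]
x4 = [3, 1, 4, 1, 5]
data = [list(case) for case in zip(x1, x2, x3, x4)]

# Center the columns so that dependencies involving a constant are found too.
n = len(data)
means = [Fraction(sum(col), n) for col in zip(*data)]
centered = [[v - mu for v, mu in zip(row, means)] for row in data]

basis = null_space(centered)
for vec in basis:
    # Each vector lists coefficients for x1..x4 whose weighted sum is constant.
    print([str(v) for v in vec])  # prints ['-1', '-2', '1', '0']
```

Each returned vector says which weighted sum of the variables is constant across cases; here it recovers -x1 - 2*x2 + x3. Real data are rarely exactly dependent, so a floating-point version would instead look for eigenvectors of the correlation matrix whose eigenvalues are close to zero, using a tolerance.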
Marc.