LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2011)Back to main SPSSX-L pageJoin or leave SPSSX-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 7 Dec 2011 07:58:08 -0700
Reply-To:     Jon K Peck <peck@us.ibm.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Jon K Peck <peck@us.ibm.com>
Subject:      Re: A MATRIX mystery
In-Reply-To:  <1323269185741-5055860.post@n5.nabble.com>
Content-Type: multipart/alternative;

My guess is that the difference is due to using a numerically unstable way to compute the regression. The Regression procedure does not proceed by just calculating with the textbook expression. It uses numerical methods that are designed to minimize precision loss due to extreme magnitudes and high collinearity.

If you were to rewrite your regression code, say, to use a Cholesky factorization rather than a brute force matrix inversion, I'll bet that you could get the same results. It could also be that the INV function singularity criterion is too sensitive to scale differences. I haven't looked into that code.

Regards,

Jon Peck (no "h") aka Kim Senior Software Engineer, IBM peck@us.ibm.com new phone: 720-342-5621

From: Bruce Weaver <bruce.weaver@hotmail.com> To: SPSSX-L@listserv.uga.edu Date: 12/07/2011 07:50 AM Subject: [SPSSX-L] A MATRIX mystery Sent by: "SPSSX(r) Discussion" <SPSSX-L@listserv.uga.edu>

Here's a little MATRIX mystery someone might be able to help me solve.

*Background*: I ran a linear regression model of the form Y = b0 + b1*X + b2*X**2 + E via the REGRESSION procedure, and everything worked fine. Then I tried to run the same model via matrix equations, and it didn't run. The error message said "Source operand is singular for INV." I was using old MATRIX code that worked in the past, so I tried with a different explanatory variable, and it ran fine. Finally, I went back to my original X-variable, but divided it by 100 to make the numbers smaller. After that rescaling, it worked just fine.

This brings me to my question: Why does the model with X=WEIGHT run without problems using REGRESSION, but not using MATRIX? Apparently MATRIX runs into some limitation that REGRESSION doesn't. Is there something I can do to raise that limit?

My syntax is appended below for anyone who wants to try the various models. The data come from the Cars.sav file found in your "Samples" folder.

Cheers, Bruce

*--- Syntax for the various models --- .

* This demo uses data from the "Cars.sav" file that * comes with SPSS. It runs two regression models * of the form Y = b0 + b1*X + b2*X**2 + E. * Both models run when I use REGRESSION. * But only the model with HORSE runs using MATRIX. * The model using WEIGHT results in a singularity. * If I rescale WEIGHT to WEIGHT/100, the model runs. * I don't know why the model using WEIGHT runs via * REGRESSION, but not via MATRIX .

new file. dataset close all.

* Change path below as necessary. get file = "C:\SPSSdata\Cars.sav".

select if nmiss(mpg,weight,horse) EQ 0.

GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

* Delete the odd case where vehicle weight < 1000 lbs. select if weight GT 1000. GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

compute hpsq = horse**2. /* horsepower squared. compute wsq = weight**2. /* weight-squared variable . compute w100 = weight/100. compute w100sq = w100**2.

REGRESSION /STATISTICS COEFF R ANOVA CI(95) /DEPENDENT=MPG /METHOD=ENTER horse hpsq . REGRESSION /STATISTICS COEFF R ANOVA CI(95) /DEPENDENT=MPG /METHOD=ENTER weight wsq . REGRESSION /STATISTICS COEFF R ANOVA CI(95) /DEPENDENT=MPG /METHOD=ENTER w100 w100sq .

* ------------------------------------------ . * Now run the same models via MATRIX. * Run the MATRIX section 3 times, commenting * out two of the "GET x" commands each time * to choose the appropriate design matrix. * ------------------------------------------ .

compute c = 1. /* First column of design matrix. exe.

matrix.

GET y /file = * /var = MPG.

* Comment out one of the "GET x" commands below . GET x /file = * /var = c horse hpsq. *GET x /file = * /var = c weight wsq. *GET x /file = * /var = c w100 w100sq.

compute B = inv(t(x)*x)*t(x)*y. /* Vector of regression coefficients.

compute n = nrow(y). /* N = number of cases . compute ybar = msum(y)/n. /* Mean of Y values . compute dev.y = y-ybar. /* Matrix of (Y-Ybar) deviation scores. compute ss.y = t(dev.y)*dev.y. /* SS(Y) = dev'dev. compute yhat = x*b. /* Yhat = XB . compute e = y - yhat. /* E = Y-Yhat . compute ss.e = t(e)*e. /* SSE = E'E . compute ss.reg = ss.y - ss.e. /* SS_regression . compute df1 = ncol(x)-1. /* df for numerator of F-ratio. compute df2 = n-df1-1. /* df for denominator of F-ratio . compute ms.reg = ss.reg/df1. /* MS regression . compute ms.e = ss.e/df2. /* Variance of residuals . compute covB = inv(t(x)*x)*ms.e. /* (X'X)^-1*MS_error . compute var.b = diag(covB). /* Variances of B . compute se.b = sqrt(var.b). /* SE of B . compute f = ms.reg/ms.e. /* F-statistic . compute p = 1 - FCDF(f,df1,df2). /* p-value for F-ratio . compute Rsq = ss.reg / ss.y . /* R-squared value .

compute BSE = { B,se.b }. /* Regression coefficients & standard errors . compute Summary = MAKE(1,6,0). compute Summary(1,1) = Rsq. /* R-squared value . compute Summary(1,2) = SQRT(ms.e). /* Root mean square error . compute Summary(1,3) = f. /* F-ratio for regression model. compute Summary(1,4) = df1. /* df for numerator. compute Summary(1,5) = df2. /* df for denominator. compute Summary(1,6) = p. /* p-value for F-test .

print BSE / format = "f8.3" / title= "Regression coefficients & standard errors" / clabel = "B" "SE(B)" .

print Summary / format = "f8.3" / title= "Summary statistics for the regression model" / clabel = "R-sq" "RMSE" "F" "df1" "df2" "p" .

end matrix.

----- -- Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.

-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-tp5055860p5055860.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


[text/html]


Back to: Top of message | Previous page | Main SPSSX-L page