Date: Thu, 24 Jun 1999 14:24:28 -0700
Reply-To: "James C. Creech"
Sender: "SPSSX(r) Discussion"
From: "James C. Creech"
Organization: Virginia Sentencing Commission
Subject: Re: some questions
Content-Type: text/plain; charset=us-ascii

My responses are below.

Nathalie Holvoet wrote:
>
> Dear listmembers,
>
> I have a number of questions related to the statistics underlying SPSS
>
> 1. if conditions for Chi-square test and Fisher Exact are not satisfied
> (e.g. if there are empty cells or if the frequency of some cells is too
> low) what are then other possibilities to test the existence of
> statistically significant differences between groups on a qualitative
> variable?

Fisher's exact test is itself the usual alternative to chi-square when expected cell frequencies are too low; beyond those two, there is no further alternative that I know of. The only choices are (1) collapse categories on one or more variables if possible, (2) collect more data, or (3) use the statistic and caveat the problem.

> 2. if you have a linear dependent variable ranging between 0-100, is it
> possible to use linear regression or should you use another method, if
> yes, which one? What are the problems if you use linear regression?

Since linearity describes the relationship of two or more variables, I'll assume you meant to describe your dependent variable as continuous (or nearly continuous). Given that, yes, regression is an appropriate statistical approach to exploring the relationship between your dependent variable and one or more independent variables. The main problem with using regression is meeting the assumptions: when an assumption is violated, you must work out the consequences and decide what alternative approach to take. For example, collinearity among independent variables is problematic in more than one way.
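As a sketch of option (1) above, here is how the expected-frequency check and the collapsing of a sparse category look outside SPSS, using Python's scipy as a stand-in (the table values are made up for illustration):

```python
# Illustration (not from the original post): checking chi-square expected
# frequencies and collapsing a sparse category, with a hypothetical table.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 3x2 table whose third row is sparse.
table = np.array([[20, 30],
                  [25, 15],
                  [ 2,  1]])

chi2, p, dof, expected = chi2_contingency(table)
print("expected counts:\n", expected)
print("cells with expected < 5:", (expected < 5).sum())

# Choice (1) from the reply: collapse the sparse category into a neighbor.
collapsed = np.vstack([table[0], table[1] + table[2]])
chi2c, pc, dofc, expc = chi2_contingency(collapsed)
print("after collapsing, minimum expected count:", round(expc.min(), 2))

# For the resulting 2x2 table, Fisher's exact test is available directly.
odds, p_exact = fisher_exact(collapsed)
print("Fisher exact p =", round(p_exact, 4))
```

The sparse row produces expected counts well under 5; after collapsing, all expected counts are comfortably large and both tests apply.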
If there is perfect collinearity (an exact linear relationship between two or more independent variables), that condition violates an assumption of regression and no parameter estimates can be obtained (it becomes impossible to change the value of one independent variable while holding the collinear independent variable(s) constant). With near-collinear independent variables, parameter estimates can be obtained; however, the correlations among these independent variables are too large to allow for a precise estimate of their unique effects on the dependent variable.

Before jumping into the data analysis, I suggest that you get a good foundation in the desired statistical procedure. For regression, I'd suggest you start with Norman Draper and Harry Smith's Applied Regression Analysis for fairly comprehensive coverage of the topic; for a readable overview of regression assumptions and methods for detecting their violation, see William Berry's Understanding Regression Assumptions and John Fox's Regression Diagnostics.

> 3. it seems that the Durbin-Watson test for autocorrelation may only be
> used if you have observations which are strictly ranked (e.g.
> time-series). If observations are not strictly ranked what is the
> alternative test for autocorrelation?

In random samples, it generally can be assumed that autocorrelation will not be present. If observations are structured relative to one another, then autocorrelation may be of concern. The typical example is time-series data, where the observations are ordered by time, but there are other situations where the data have a spatially dependent organization. The Durbin-Watson test is the easiest to perform because it is concerned only with first-order autocorrelation (each error correlated with that of the next observation).
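The Durbin-Watson statistic just described is simple enough to compute by hand; a minimal sketch in Python (outside SPSS, with simulated residuals) shows how it behaves with and without first-order autocorrelation:

```python
# Illustration (not from the original post): the Durbin-Watson statistic
# computed from residuals with plain numpy. Values near 2 suggest no
# first-order autocorrelation; values near 0 (or 4) suggest positive
# (or negative) autocorrelation.
import numpy as np

def durbin_watson(resid):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(42)

# Independent residuals: d should fall close to 2.
white = rng.standard_normal(500)
print("white noise d =", round(durbin_watson(white), 2))

# Positively autocorrelated residuals (AR(1) with rho = 0.8): d well below 2.
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + rng.standard_normal()
print("AR(1) residuals d =", round(durbin_watson(ar), 2))
```

For an AR(1) error process with autocorrelation rho, d is approximately 2(1 - rho), which is why the second series lands far below 2.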
If you have the Trends module, you should also have access to ACF (the autocorrelation function), which allows you to test for autocorrelation with cases beyond just the next one (the lags indicate how many cases away are being compared). If I had spatially structured data and the ACF indicated that autocorrelation was present at several lags, I would suspect spatial autocorrelation. Although ACF will let you check for autocorrelation in any data set, I suspect that if you were to find some significant lags in unstructured data, it would be more by chance than a truly serious problem.

> 4. what is the distribution underlying the loglinear logit method and the
> logistic regression method? how can you test if this distribution may be
> applied on your data?

Generally speaking, logit and logistic regression models share the same goal, prediction of a dependent variable from independent variables, but differ in the assumptions made about the measurement scale of the independent variables. If both statistical procedures were used to estimate the same model with the same data, the same results would be obtained (except for estimation-method differences). Neither procedure has distributional requirements for the dependent variable, so there is no need to test at that point. Logistic regression has several assumptions that are similar to those of regression. Two regression assumptions, however, are not made by logistic regression: (1) normality of errors - in logistic regression the errors are assumed to follow a binomial distribution (when the sample is large, a binomial distribution approximates a normal distribution); and (2) homogeneity of variance - when the dependent variable is dichotomous there is a functional relationship between the standard deviation and the mean, which is the primary reason ordinary regression will not work with a dichotomous dependent variable and logistic regression was developed.
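The variance point in (2) can be seen numerically: for a dichotomous (0/1) variable with mean p, the variance is p(1 - p), so the variance is tied to the mean and homogeneity of variance cannot hold once the mean of Y varies with the predictors. A small simulation (not from the post; the proportions are made up):

```python
# Illustration (not from the original post): for a 0/1 variable the
# variance is a function of the mean, Var(Y) = p(1 - p) -- the
# heteroscedasticity that rules out ordinary regression for a
# dichotomous dependent variable.
import numpy as np

rng = np.random.default_rng(0)

results = {}
for p in (0.1, 0.5, 0.9):
    y = (rng.random(100_000) < p).astype(float)   # Bernoulli(p) sample
    results[p] = y.var()
    print(f"p={p}: sample var={y.var():.4f}, p(1-p)={p * (1 - p):.4f}")
```

The sample variances track p(1 - p) closely, peaking at p = 0.5 and shrinking toward the extremes, so the error variance changes wherever the predicted probability changes.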
For a foundation in logistic regression, I suggest you read David Hosmer and Stanley Lemeshow's Applied Logistic Regression, and for a very readable introduction, Scott Menard's Applied Logistic Regression Analysis.

> Thanks,
> Nathalie Holvoet
> Institute for Development Policy and Management
> RUCA
> Middelheimlaan 1
> 2020 Antwerp
