Date: Thu, 24 Jun 1999 14:24:28 -0700
Reply-To: "James C. Creech" <jcreech@VCSC.STATE.VA.US>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "James C. Creech" <jcreech@VCSC.STATE.VA.US>
Organization: Virginia Sentencing Commission
Subject: Re: some questions
Content-Type: text/plain; charset=us-ascii
My responses are below.
Nathalie Holvoet wrote:
>
> Dear listmembers,
>
> I have a number of questions related to the statistics underlying SPSS
>
> 1. if conditions for Chi-square test and Fisher Exact are not satisfied
> (e.g. if there are empty cells or if the frequency of some cells is too
> low) what are then other possibilities to test the existence of
> statistically significant differences between groups on a qualitative
> variable?
Chi-square is the alternative to Fisher's exact test. There is no alternative
to chi-square that I know of. The only choices are (1) collapse categories on
one or more variables if possible, (2) collect more data, or (3) report the
statistic and note the problem as a caveat.
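As a rough illustration of option (1), the sketch below computes the expected
cell counts for a crosstab and shows how merging a sparse category changes them.
It is plain Python rather than SPSS syntax; the table values, the function names,
and the cutoff of 5 (a common rule of thumb for expected counts) are all
assumptions made for the example, not anything taken from the original question.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_small_expected(table, threshold=5.0):
    """Fraction of cells whose expected count falls below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def collapse_columns(table, keep, merge):
    """Add column `merge` into column `keep`, then drop column `merge`."""
    out = []
    for row in table:
        new_row = list(row)
        new_row[keep] += new_row[merge]
        del new_row[merge]
        out.append(new_row)
    return out


# Made-up 2 x 3 table: two groups by three categories, the last one sparse.
table = [[12, 9, 1],
         [15, 11, 2]]
collapsed = collapse_columns(table, keep=1, merge=2)

print(share_small_expected(table))      # share of cells with expected count < 5
print(share_small_expected(collapsed))  # same share after merging the sparse category
```

Whether two categories can sensibly be merged is a substantive question about
the variable, so the code only shows the arithmetic, not the decision.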
> 2. if you have a linear dependent variable ranging between 0-100, is it
> possible to use linear regression or should you use another method, if yes,
> which one? What are the problems if you use linear regression?
Since linearity describes the relationship of two or more variables, I'll assume
you meant to describe your dependent variable as continuous (or nearly continuous).
Given that, yes, regression is an appropriate statistical approach to exploring
the relationship between your dependent variable and one or more independent
variables.
The main problem with using regression is meeting its assumptions and, if an
assumption is violated, knowing what the consequence is and what alternative approach
can be taken. For example, collinearity among independent variables is problematic
in more than one way. If there is perfect collinearity (an exact linear relationship
between two or more independent variables), that condition violates an assumption of
regression and no parameter estimates can be obtained (basically it becomes impossible
to change the value of one independent variable while holding the collinear independent
variable(s) constant). With near-collinear independent variables, parameter estimates
can be obtained; however, the correlations among these independent variables are too
large to allow for a precise estimate of their unique effect on the dependent variable.
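To make the collinearity point concrete, here is a minimal sketch for the
simplest case of two independent variables, where the variance inflation factor
(VIF) reduces to 1 / (1 - r^2), with r the correlation between the two. The data
and function names are invented for illustration; with more than two independent
variables, r^2 would instead come from regressing each one on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two independent variables."""
    r2 = pearson_r(x1, x2) ** 2
    if math.isclose(r2, 1.0):
        # Perfect collinearity: the unique effects cannot be separated at all.
        return math.inf
    return 1.0 / (1.0 - r2)


x1 = [1, 2, 3, 4, 5]
x2_exact = [2 * a + 1 for a in x1]        # exact linear function of x1
x2_near = [3.1, 4.9, 7.2, 8.8, 11.1]      # almost, but not quite, 2 * x1 + 1
x2_unrelated = [2, 5, 1, 4, 3]            # only weakly related to x1

for label, x2 in [("exact", x2_exact), ("near", x2_near), ("unrelated", x2_unrelated)]:
    print(label, vif_two_predictors(x1, x2))
```

The larger the VIF, the more the variance of the affected coefficient is
inflated, which is the loss of precision described above.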
Before jumping into the data analysis, I suggest that you get a good foundation in
the desired statistical procedure. With regression, I'd suggest that you start with
Norman Draper and Harry Smith's Applied Regression Analysis for a fairly comprehensive
coverage of the topic, and for a readable overview of regression assumptions and
methods for detecting violation of assumptions, see William Berry's Understanding
Regression Assumptions and John Fox's Regression Diagnostics.
> 3. it seems that the Durbin-Watson test for autocorrelation may only be
> used if you have observations which are strictly ranked (e.g. time-series).
> If observations are not strictly ranked what is the alternative test for
> autocorrelation?
In random samples, it generally can be assumed that autocorrelation will not be
present. If observations are structured relative to one another, then autocorrelation
may be of concern. The typical example is in time-series data, where the data is
organized by time, but there are other situations where the data has a spatially
dependent organization. The Durbin-Watson test is the easiest to perform because
it is only concerned with first-order autocorrelation (errors correlated with those of the next
observation). If you have the Trends module, you should also have access to ACF
(autocorrelation function) which allows you to test for autocorrelation with cases
beyond just the next one (the lags indicate how many cases away are being compared).
If I had spatially structured data, and the ACF indicated that autocorrelation were
present for several "lags", then I would suspect spatial autocorrelation.
Although ACF will allow you to check for autocorrelation in any data set, I
suspect that if you were to find some significant lags, it would be due more to chance
than to a truly serious problem.
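For readers who want to see what these two checks compute, the sketch below
works out the Durbin-Watson statistic and a lag-k autocorrelation from a list of
residuals. It is a hand-rolled illustration in plain Python, not the SPSS
procedure; the residual values are invented, and the residuals are assumed to be
in their natural (time or spatial) order.

```python
def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def autocorr(resid, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    den = sum(d * d for d in dev)
    return num / den


# Invented residual series: one drifting slowly, one flipping sign every case.
drifting = [-3, -2, -1, 1, 2, 3]
alternating = [1, -1, 1, -1, 1, -1]

for label, resid in [("drifting", drifting), ("alternating", alternating)]:
    print(label, durbin_watson(resid), autocorr(resid, 1), autocorr(resid, 2))
```

A Durbin-Watson value near 2 suggests no first-order autocorrelation, values
well below 2 suggest positive autocorrelation, and values well above 2 suggest
negative autocorrelation. Calling autocorr with lags greater than 1 mimics what
the ACF reports for cases further apart.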
> 4. what is the distribution underlying the loglinear logit method and the
> logistic regression method? how can you test if this distribution may be
> applied on your data?
Generally speaking, logit and logistic regression models share the same
goal: prediction of a dependent variable from independent variables, but
differ in the assumptions made about the measurement scale of the
independent variables. If both statistical procedures were used to
estimate the same model with the same data, the same results would be
obtained (except for estimation method differences). Neither procedure
has distributional requirements for the dependent variable, so there is
no need to test at that point. Logistic regression has several
assumptions that are similar to those of linear regression. Two regression assumptions,
however, are not made by logistic regression: (1) normality of errors -
in logistic regression the errors are assumed to follow a binomial
distribution (when the sample is large, a binomial distribution will approximate
a normal distribution) and (2) homogeneity of variance - there is a functional
relationship between the standard deviation and the mean when the dependent
variable is dichotomous, which is the primary reason why ordinary regression
will not work with a dichotomous dependent variable and logistic regression
was developed.
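The functional relationship in point (2) can be shown in a few lines. For a
dichotomous (0/1) variable with mean p, the standard deviation is
sqrt(p * (1 - p)), so it cannot stay constant as the mean changes. The function
name and the chosen values of p below are only for illustration.

```python
import math


def bernoulli_sd(p):
    """Standard deviation of a 0/1 variable whose mean (proportion of 1s) is p."""
    return math.sqrt(p * (1.0 - p))


# The spread is largest at p = 0.5 and shrinks toward 0 as p nears 0 or 1.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"mean = {p:.1f}   sd = {bernoulli_sd(p):.3f}")
```

Because the spread of the errors changes with the predicted proportion, the
constant-variance assumption of ordinary regression cannot hold for such a
dependent variable.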
For a foundation in logistic regression, I suggest you read David Hosmer
and Stanley Lemeshow's Applied Logistic Regression, and for a very readable
introduction Scott Menard's Applied Logistic Regression Analysis.
>
> Thanks,
> Nathalie Holvoet
> Institute for Development Policy and Management
> RUCA
> Middelheimlaan 1
> 2020 Antwerp