Date: Wed, 11 Jul 2007 12:50:48 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Creating a ratio that avoids the divide by zero error
In-Reply-To: <2C94D655CCF2A94A8F2521F5CCB675BE01D9D267@EX-VS-1.econ.usyd.edu.au>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 03:19 AM 7/11/2007, Gary Oliver wrote:
>I am working with a simple ratio (A divided by B). In the past when I
>have had a divide by zero error in creating a ratio I have converted
>them into missing values. In a current analysis there will be too many
>(almost 50%) for the non-missing values to be meaningful.
>
>Does anyone know of any commonly acceptable workarounds or more
>complicated formulas that might be used? The resulting ratios would be
>used for comparison with one another rather than as absolutely precise
>results.
There's no answering the question without a sense of what you think the
ratios with zero divisors *mean*. That's a question about your study,
not an SPSS or statistical question.
First, you can't do anything unless you can argue convincingly that the
0 denominators represent a non-zero value about which you don't have
information, not a value that really is 0, or not there at all.
Second, and this can be an exception to the above rule, if the
numerator is also 0 when the denominator is, it is fairly common, and
often correct, to take the 0/0 values as 0. That can be done with
simple tests in SPSS. *However*, include the numerator in the test, so
you won't get a completely wrong value if it *isn't* 0/0. DON'T DO THIS
WITHOUT CONSIDERING WHETHER THEY REALLY *ARE* 0 VALUES. And make the
argument in your paper, and prepare to defend it to your statistical
reviewer. (You'll see that advice again, below.)
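A minimal sketch of that test, written in Python rather than SPSS syntax so it can be run anywhere. The function name safe_ratio is hypothetical, and treating 0/0 as 0 is exactly the assumption you would have to defend; a non-zero numerator over a zero denominator is left missing (None).

```python
def safe_ratio(a, b):
    """Return a/b, with 0/0 taken as 0 and nonzero/0 left missing."""
    if b != 0:
        return a / b      # ordinary case: denominator is usable
    if a == 0:
        return 0.0        # 0/0 taken as 0 (an assumption, not a fact)
    return None           # nonzero/0: no defensible value, keep it missing


ratios = [safe_ratio(a, b) for a, b in [(6, 3), (0, 0), (5, 0)]]
```

Because the numerator is part of the test, a case that is not 0/0 can never be silently turned into 0.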
Third, your project may be dubious altogether. This isn't about 0/0.
50% missing in a crucial variable (and the denominator *is* crucial)
virtually guarantees your sample isn't representative of the
population. Check with whatever other data you have, and see what you
can learn about the pattern of missing values. Consider replacing the
ratio altogether with its missing/non-missing dichotomy. Missing-value
substitution is considered very unreliable with 50% missing data. If
you're going to ignore that and do it anyway, do the missing-value
substitution on the denominator, not the ratio. And prepare for a
statistical reviewer taking a very dim view.
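One way to look at the pattern of missing values is sketched below in Python: build the missing/non-missing dichotomy and compare some other variable across the two groups. The records and the covariate (age) are invented purely for illustration; substitute whatever other data you actually have.

```python
from statistics import mean

# Hypothetical records: (denominator, age). Age stands in for any covariate.
records = [(0, 30), (5, 50), (0, 34), (8, 46), (0, 32), (4, 54)]

# Dichotomy: 1 = ratio is missing (zero denominator), 0 = ratio is computable
missing_flag = [1 if b == 0 else 0 for b, _ in records]

# Compare the covariate between the two groups
age_missing = mean(age for b, age in records if b == 0)
age_present = mean(age for b, age in records if b != 0)
```

A large gap between the two group means would suggest the cases with usable ratios differ systematically from the rest.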
Fourth, consider whether a 0 denominator with a non-zero numerator is
consistent with what you know about your data source. If not, the
zero denominators are data errors (which, at 50%, renders your data
useless for analysis), or missing values (which may be addressable -
see below).
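A quick tabulation shows how many cases fall into each situation. This is a Python sketch with made-up rows; the function name zero_pattern and the category labels are hypothetical.

```python
from collections import Counter


def zero_pattern(a, b):
    """Classify one case by whether its numerator and denominator are zero."""
    if b != 0:
        return "valid"
    return "0/0" if a == 0 else "nonzero/0"


# Hypothetical (numerator, denominator) pairs
rows = [(4, 2), (0, 0), (3, 0), (0, 5), (7, 0)]
counts = Counter(zero_pattern(a, b) for a, b in rows)
```

The "nonzero/0" count is the one to check against what your data source can plausibly produce.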
Fifth, if the denominator has a very low coefficient of variation
(SD/Mean), it may be reasonable to substitute the mean for the missing
values. However, you'll have to make a strong argument that the
denominators in the half with missing data have a low CV as well, and an
underlying mean value that's quite close. And this whole argument is
not likely to apply to you, since if there is that low CV the proper
measure would probably be the numerator in the first place. And see
prior comments about your statistical reviewer.
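For reference, the coefficient of variation of the observed (non-zero) denominators can be computed as below. This Python sketch uses invented values and the sample standard deviation; it says nothing about the cases where the denominator is missing, which is the part you would have to argue.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical denominators; zeros are the cases treated as missing
denoms = [98, 102, 0, 100, 0, 101, 99]
observed = [d for d in denoms if d != 0]
cv = coefficient_of_variation(observed)
```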
Regarding the statistical review: Make the strongest argument you
possibly can, in your original paper. Make sure you believe it
yourself. If you put an apparently bad analysis in your manuscript, you
can depend on the reviewer putting little stock in any argument you
make after the fact.
And very good luck to you. I'm afraid you simply have a tough problem,
and there's no technique that can reliably get around it.
Richard Ristow