Date: Thu, 13 Jul 2006 11:31:32 -0500
Reply-To: Anthony Babinec <firstname.lastname@example.org>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Anthony Babinec <email@example.com>
Subject: Re: A Distinctly Non-Normal Distribution
Content-Type: text/plain; charset="US-ASCII"
Here are a couple of general comments.
While the normal distribution might be a useful
assumed distribution for errors in regression, there
is no reason to think that it is necessarily useful for
summarizing all phenomena out there in the world.
As you have described your data, they are counts.
In other words, the values are 1, 2, 3, etc., and not
real values on some continuous interval.
Are you looking at consumption in some fixed unit of time -
say week, month, year? Given some assumptions, there
are distributions such as the Poisson that might
be appropriate. It also could be the case that
what you are studying represents a mixture of types,
say usage types (low, medium, high), though that may or
may not be the case here.
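
One rough way to check whether a Poisson model is plausible is to
compare the sample variance with the sample mean, since the two are
equal for a Poisson distribution. A ratio well above 1 suggests
overdispersion, which is what a mixture of usage types would tend to
produce. The sketch below is in Python rather than SPSS and uses
made-up counts; the names sample, poisson_pmf, and dispersion_index are
illustrative assumptions, not anything from the data under discussion.
It also ignores the fact that counts starting at 1 may call for a
zero-truncated model.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

# Made-up counts for illustration only (not the data discussed in this thread).
sample = [1] * 8 + [2] * 5 + [3] * 3 + [4] * 2 + [10, 25, 40]

index = dispersion_index(sample)
lam = sum(sample) / len(sample)

# Expected share of cases at each count if the data were Poisson with this mean.
expected = {k: poisson_pmf(k, lam) for k in range(0, 6)}

print("variance/mean ratio:", round(index, 2))
for k, p in expected.items():
    print(k, round(p, 4))
```

If the ratio is far from 1, or the observed shares differ sharply from
the expected ones, a plain Poisson model is probably too simple.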
Pete Fader (Wharton) and Bruce Hardie (London Business School)
have a nice course on probability models in marketing that is
regularly given at AMA events.
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Sent: Thursday, July 13, 2006 10:12 AM
Subject: A Distinctly Non-Normal Distribution
I have stumbled upon an interesting phenomenon: I have discovered that
consumption of a valuable resource conforms to a very regular, reverse
J-shaped distribution. The modal case in our large sample (N = 16,000)
consumes one unit, the next most common case consumes two units, the
next most common three units, the next most common four units -- and
this is the median case, and so on. The average is at about 9.7 units,
which falls between the 72nd and 73rd percentile in the distribution --
clearly NOT an indicator of central tendency.
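
The gap between mean and median in a right-skewed sample can be seen
with a small sketch. The counts below are invented for illustration and
are not the data described above; percentile_rank is a hypothetical
helper that reports the percentage of cases at or below a given value.

```python
import statistics

def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100.0 * sum(v <= x for v in values) / len(values)

# Invented right-skewed counts (illustration only): most cases are small,
# a few are very large.
sample = [1] * 40 + [2] * 20 + [3] * 13 + [4] * 10 + [10] * 7 + [30] * 6 + [60] * 4

mean = statistics.mean(sample)
median = statistics.median(sample)
rank_of_mean = percentile_rank(sample, mean)

print("median:", median)
print("mean:", mean)
print("percentile rank of the mean:", rank_of_mean)
```

A handful of heavy consumers pulls the mean well above the median, so
the mean lands far out in the upper part of the distribution.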
I used SPSS Curve Estimation to examine five functional relationships
between units consumed and proportion of consumers in the sample,
testing proportion of consumers in the sample as linear, logarithmic,
inverse, quadratic, or cubic functions of number of units consumed. I
found that the reciprocal model, estimating proportion of cases as the
inverse of units consumed, was clearly the best solution, yielding a
remarkable and very reliable R2 = .966. All five models were reliable,
but the next best was the logarithmic solution, with R2 = .539; worst
was the linear model, with R2 = .102.
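
For readers who want to see the comparison outside SPSS, here is a
minimal sketch in Python. It assumes the inverse model has the form
proportion = b0 + b1 / units and fits it by ordinary least squares on
the transformed predictor 1/units, then does the same for the linear
model. The data are synthetic (an exact 0.25/units curve), and the
names fit_line and r_squared are illustrative, so the numbers will not
match the R2 values reported above.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(x, y, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data for illustration only: proportions that fall off as 1/units.
units = list(range(1, 21))
proportion = [0.25 / u for u in units]

# Inverse model: regress proportion on 1/units.
inverse_units = [1.0 / u for u in units]
b0_inv, b1_inv = fit_line(inverse_units, proportion)
r2_inverse = r_squared(inverse_units, proportion, b0_inv, b1_inv)

# Linear model: regress proportion on units directly.
b0_lin, b1_lin = fit_line(units, proportion)
r2_linear = r_squared(units, proportion, b0_lin, b1_lin)

print("inverse model R2:", round(r2_inverse, 3))
print("linear model R2:", round(r2_linear, 3))
```

On data shaped like this, the inverse model fits almost perfectly
while the straight line explains much less of the variance.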
This seems like a remarkably regular, quite predictable relationship.
I've spent my career so enamored with normal distributions that I'm not
sure what to make of this distribution. I have several questions:
Do any of you have experience with such functions? (I believe it would
be correct to call this a decay function.)
Where are such functions most likely to occur in nature, commerce,
epidemiology, genetics, healthcare, and so on?
What complications arise when attempting to form statistical inferences
where such population distributions are present? (We have other
measurements for subjects in this distribution, measurements which are
quite nicely normal in their distributions.)
Your curious colleague,
Stevan Lars Nielsen, Ph.D.
Brigham Young University