**Date:** Mon, 14 Feb 2005 11:51:18 -0500
**Reply-To:** Susie Li <Susie.Li@TVGUIDE.COM>
**Sender:** "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
**From:** Susie Li <Susie.Li@TVGUIDE.COM>
**Subject:** Re: Influence of the number of categories on chi-square score
**Content-Type:** text/plain; charset="iso-8859-1"
To even surmise which functional form of X to put into the initial linear
logistic model (x, x**2, x**3, sqrt(x), 1/x, etc.), I rely heavily on SAS
scatter plots (no need for SAS Graph). Right now, I break the continuous X
into decile groups, and then plot the log_odds of response by the 10 X_decile
groups. That's very efficient for discovering the relationship.
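
For readers following along outside SAS, the decile approach described above can be sketched roughly as follows. This is only an illustration in Python, not the code used for the analysis in this post: the function name decile_log_odds, the 0.5 continuity correction, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def decile_log_odds(x, y, n_groups=10):
    """Split x into n_groups equal-count groups; return (mean x, empirical log-odds) per group."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    out = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        events = sum(yi for _, yi in chunk)
        nonevents = len(chunk) - events
        # Assumption: add 0.5 to each count so the log-odds stays finite
        # when a group has no events or no non-events.
        log_odds = math.log((events + 0.5) / (nonevents + 0.5))
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        out.append((mean_x, log_odds))
    return out

# Synthetic data, made up for illustration: the true log-odds is linear in x.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(5000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-2 + 0.4 * xi))) else 0 for xi in x]

# One row per group: plot the second column against the first.
for mean_x, log_odds in decile_log_odds(x, y):
    print(f"{mean_x:6.2f}  {log_odds:7.3f}")
```

If the printed log-odds rise roughly in a straight line against mean x, a linear term is a sensible starting point; visible curvature suggests trying a transformed or polynomial term.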

The frequency table of the X_decile groups by Y_renewal would give me the
chi-square test to test for the X-Y association (hence my question: what's
the impact on my chi-square if I break X into 20 groups instead of 10
groups?)
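
One way to get a feel for the question is to compute the Pearson chi-square on the same data grouped both ways. The sketch below is an illustration only, with hypothetical helper names (pearson_chi_square, grouped_table) and synthetic data. It relies on two general facts: a k-by-2 table has k - 1 degrees of freedom, and splitting each group in two cannot lower the Pearson statistic, so the statistic and the degrees of freedom both grow and the p-value can move in either direction.

```python
import math
import random

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def grouped_table(x, y, n_groups):
    """Build an n_groups-by-2 table of [non-events, events] from equal-count groups of x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        events = sum(yi for _, yi in chunk)
        table.append([len(chunk) - events, events])
    return table

# Synthetic data, made up for illustration only.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(5000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-2 + 0.4 * xi))) else 0 for xi in x]

stat10, df10 = pearson_chi_square(grouped_table(x, y, 10))
stat20, df20 = pearson_chi_square(grouped_table(x, y, 20))
print(f"10 groups: chi-square = {stat10:.1f} on {df10} df")
print(f"20 groups: chi-square = {stat20:.1f} on {df20} df")
```

Here the 20 groups are exact halves of the 10 groups, which is what makes the two statistics directly comparable.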

Susie Li
TV Guide

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
Peter Flom
Sent: Monday, February 14, 2005 11:15 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Influence of the number of categories on chi-square score

In terms of finding a model, why not logistic for both independent
variables? You can add quadratic and cubic terms and see if they are
statistically significant (although I would rather base my decision on
substantive or graphic results - stat sig depends too much on sample
size). Is there a reason to suspect that there will be quad or cubic
relationships? Do the graphs (see below) reveal such a thing? See
below for more (better???) ideas.

For finding the shape of the relationship, looking at plots is always a
good idea. I don't know how to do this best in SAS, as I have no access
to SAS GRAPH. I do this sort of thing in R. If you have SAS GRAPH,
doubtless someone here will be able to advise.

You could plot a smoothed version of the DV against each IV.

One thing I also like to do is plot the predicted values for various
models against each other - if the differences are substantively large,
then the more complex model may be worthwhile; if not, then go with the
simpler model.

As a general strategy, the approach based on AIC seems to have much to
recommend it. For details, see Burnham and Anderson, Model Selection and
Multimodel Inference.

Briefly: Come up with some (5 or 10 or so) reasonable models; each
should be sensible based on SUBSTANTIVE grounds.

Go with the one with the lowest AIC.

(that's a 3 line description of a nearly 500 page book, so take it with
a ton of salt)
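
The selection step itself is simple arithmetic: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. A minimal sketch follows; the model names and log-likelihood values are placeholders invented for illustration, not results from any real fit.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Placeholder values, made up for illustration: (maximized log-likelihood, number of parameters).
candidates = {
    "linear": (-2510.4, 2),     # intercept + x
    "quadratic": (-2508.9, 3),  # intercept + x + x**2
    "cubic": (-2508.7, 4),      # intercept + x + x**2 + x**3
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:10s} AIC = {score:.1f}")
print("lowest AIC:", best)
```

In practice the log-likelihoods would come from the fitted models themselves (SAS reports -2 Log L in the logistic output), and AIC differences of only a point or two are usually too small to treat one model as clearly better.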

Another very good book on regression generally is Harrell, Regression
Modeling Strategies.

HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)

>>> Susie Li <Susie.Li@TVGUIDE.COM> 2/14/2005 11:00:37 AM >>>
A typical example of my logistic modeling:

My y dependent variable/binary - customer renewal (yes=renewed, no=not
renewed)

My X independent variables/continuous -
(1) current pricing structure ($0.25, $0.34,...)
(2) the tenure of the customer (how long the customer has been with us,
i.e., 1 year, 2 years,...)

I want to know 2 things: (1) the existence of the "association" between X
and Y (2) if an association exists, what is the functional form of the
association (linear, quadratic or cubic).

I've been using the chi-square test for (1), and a plot of log_odds versus X
for (2).

Susie Li
TV Guide
1211 Avenue of the Americas
New York, NY 10036
Tel 212.852.7453
Email susie.li@tvguide.com