```
Date: Mon, 14 Feb 2005 11:51:18 -0500
Reply-To: Susie Li
Sender: "SAS(r) Discussion"
From: Susie Li
Subject: Re: Influence of the number of categories on chi-square score
Content-Type: text/plain; charset="iso-8859-1"

To even surmise which functional form of X to put into the initial linear logistic model (x, x**2, x**3, sqrt(x), 1/x, etc.), I rely heavily on SAS scatter plots (no need for SAS Graph). Right now, I break the continuous X into decile groups, and then plot the log_odds of response by the 10 X_decile groups. That's very efficient for discovering the relationship.

The frequency table of the X_decile groups by Y_renewal would give me the chi-square test for the X-Y association (hence my question: what's the impact on my chi-square if I break X into 20 groups instead of 10 groups?)

Susie Li
TV Guide

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of Peter Flom
Sent: Monday, February 14, 2005 11:15 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Influence of the number of categories on chi-square score

In terms of finding a model, why not logistic regression with both independent variables? You can add quadratic and cubic terms and see if they are statistically significant (although I would rather base my decision on substantive or graphic results; stat sig depends too much on sample size). Is there a reason to suspect that there will be quad or cubic relationships? Do the graphs (see below) reveal such a thing?

See below for more (better???) ideas

For finding the shape of the relationship, looking at plots is always a good idea. I don't know how to do this best in SAS, as I have no access to SAS GRAPH. I do this sort of thing in R. If you have SAS GRAPH, doubtless someone here will be able to advise.
You could plot a smoothed version of the DV against each IV.

One thing I also like to do is plot the predicted values for various models against each other: if the differences are substantively large, then the more complex model may be worthwhile; if not, then go with the simpler model.

As a general strategy, the approach based on AIC seems to have much to recommend it. For details, see Burnham and Anderson, Model Selection and Multimodel Inference.

Briefly:
Come up with some (5 or 10 or so) reasonable models; each should be sensible based on SUBSTANTIVE grounds.
Go with the one with the lowest AIC.
(That's a 3 line description of a nearly 500 page book, so take it with a ton of salt.)

Another very good book on regression generally is Harrell, Regression Modeling Strategies.

HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)

>>> Susie Li 2/14/2005 11:00:37 AM >>>
A typical example of my logistic modeling:

My Y dependent variable (binary): customer renewal (yes=renewed, no=not renewed)
My X independent variables (continuous):
(1) current pricing structure ($0.25, $0.34, ...)
(2) the tenure of the customer (how long the customer has been with us, i.e., 1 year, 2 years, ...)

I want to know 2 things:
(1) the existence of the "association" between X and Y
(2) if an association exists, what is the functional form of the association (linear, quadratic, or cubic).

I've been using the chi-square test for (1), and a plot of log_odds versus X for (2).

Susie Li
TV Guide
1211 Avenue of the Americas
New York, NY 10036
Tel 212.852.7453
Email susie.li@tvguide.com
```
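On the 10-versus-20-groups question raised in the thread, here is a minimal sketch of the workflow described above: rank X into k equal-sized groups, tabulate the groups against the binary response, compute the empirical log odds per group, and compute the Pearson chi-square. It is written in Python with the standard library only, not SAS, and every function name, the simulated data, and the 0.5 continuity adjustment are assumptions made for illustration. The point it shows is that a k x 2 table has k - 1 degrees of freedom, so going from 10 to 20 groups moves the reference distribution from 9 to 19 df. Under no association the statistic is expected to be roughly equal to its df, so the raw chi-square values from the two groupings are not directly comparable.

```python
import math
import random


def quantile_bins(xs, k):
    """Assign each x to one of k roughly equal-sized groups (0..k-1) by rank.

    Simplification: tied values may land in different groups.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    bins = [0] * len(xs)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(xs)
    return bins


def group_table(bins, ys, k):
    """Build a k x 2 table of [non-responders, responders] per group (y is 0 or 1)."""
    table = [[0, 0] for _ in range(k)]
    for b, y in zip(bins, ys):
        table[b][y] += 1
    return table


def empirical_log_odds(table):
    """Log odds of response per group; 0.5 is added to each cell to avoid log(0)."""
    return [math.log((yes + 0.5) / (no + 0.5)) for no, yes in table]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a k x 2 table.

    Assumes every group and both outcomes are non-empty.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(2)]
    total = sum(row_tot)
    stat = 0.0
    for row, rt in zip(table, row_tot):
        for j in range(2):
            expected = rt * col_tot[j] / total
            stat += (row[j] - expected) ** 2 / expected
    return stat, (len(table) - 1) * (2 - 1)


def main():
    # Simulated data (hypothetical): log odds of renewal rise linearly in x.
    rng = random.Random(0)
    xs = [rng.uniform(0.0, 10.0) for _ in range(2000)]
    ys = [1 if rng.random() < 1 / (1 + math.exp(-(-1.0 + 0.2 * x))) else 0
          for x in xs]
    for k in (10, 20):
        table = group_table(quantile_bins(xs, k), ys, k)
        stat, df = pearson_chi_square(table)
        print(f"{k} groups: chi-square = {stat:.2f} on {df} df")
        print("  log odds by group:",
              [round(v, 2) for v in empirical_log_odds(table)])


if __name__ == "__main__":
    main()
```

Plotting the per-group log odds against the group index gives the shape check discussed in the thread. For the association test itself, compare p-values rather than raw statistics across groupings; when the trend is smooth, a 1-df test on X (for example the logistic regression coefficient) is usually more focused than either grouped test.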
