Date: Fri, 26 Jan 2001 13:47:41 -0800
Reply-To: "Dennis G. Fisher" <dfisher@CSULB.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dennis G. Fisher" <dfisher@CSULB.EDU>
Subject: Re: Discretizing continuous vars (was Proc GLM)
Content-Type: text/plain; charset=us-ascii
The point I was trying to make about birthweight is that an infant who weighs, for instance, 1501 grams is NOT at more risk than a child who weighs 2300 grams. That is exactly the point. With some variables, such as birthweight, an infant who is not a low birth weight infant is at the same risk as any other infant who is not low birth weight; that is, there is a ceiling (or floor, depending upon how you look at it) effect. The cutoff point may be in contention, as it was in this master's thesis, but low birth weight is a proxy for risk. Heavier infants are NOT at lower risk than normal weight infants. At least that is how it was explained to me.
I hope this clarifies what I meant. I do not, however, disagree with your main
point for most variables.
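
To make the two codings under discussion concrete, here is a minimal sketch in Python. It is an illustration only: the 1500 gram cutoff is just the value mentioned in this thread (the right cutoff is, as noted, in contention), and the function names are made up for the example. The first function is the dichotomous indicator; the second is the "weight below the cutoff" transformation Peter suggested, which is flat above the cutoff but still distinguishes 1100 grams from 1499 grams below it.

```python
# Illustrative cutoff only: 1500 g is the value mentioned in this thread,
# not a recommended clinical threshold.
CUTOFF_GRAMS = 1500


def low_birth_weight(weight_g, cutoff=CUTOFF_GRAMS):
    """Dichotomous coding: 1 if below the cutoff, else 0."""
    return 1 if weight_g < cutoff else 0


def grams_below_cutoff(weight_g, cutoff=CUTOFF_GRAMS):
    """Hinge coding: grams below the cutoff, and 0 at or above it.

    Above the cutoff every infant gets the same value (the ceiling/floor
    effect), while below it the predictor still varies with weight.
    """
    return max(0, cutoff - weight_g)


if __name__ == "__main__":
    for w in (1100, 1499, 1501, 2300):
        print(w, low_birth_weight(w), grams_below_cutoff(w))
```

Under either coding, 1501 grams and 2300 grams are treated identically. The codings differ only below the cutoff, which is where the disagreement in this thread lies.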
Dale McLerran wrote:
> Let me put in my $.02. Discretizing a continuous variable for
> use as a predictor variable is a very common artifice in the
> epidemiological literature. This is usually performed so that
> the epidemiologist can make some statement about relative risks
> for some outcome, and convey the RR in a simple manner to their
> colleagues (or at least an approximation to the RR). Now, it
> needs to be understood exactly what discretizing the continuous
> predictor variable actually is doing: it allows the user to fit
> a nonlinear curve to the data. Moreover, this nonlinear curve
> is discontinuous at the break points. This is an ugly model if
> I ever saw one. It says that the response is homogeneous within
> the (artificially) chosen intervals, and that from the end of
> one interval to the beginning of the next there is often a
> significant difference in the response. Now, I ask whether it
> is reasonable to believe that dietary habits (consumption of
> fruits and vegetables, percent energy from fat) change dramatically
> from age 34 to age 35, or from age 59 to age 60. I really suspect
> not, but these are commonly employed models. I would have to
> agree with Peter that risk for all kinds of poor outcomes related
> to low birth weight does not change dramatically from 1499 grams
> to 1500 grams. The risks are probably even greater if the infant
> weighs 1100 grams than if the infant weighs 1499 grams. And a
> child that weighs 1501 grams probably is at more risk for poor
> outcomes than a child who weighs 2300 grams.
> Now, I work with epidemiologists. I have fit many a regression
> model in which age has been discretized into 3 or 4 intervals.
> For simple presentation in epidemiological journals, these are
> the accepted standards. I will not chastise too loudly that
> this should not be done, although I have tried to suggest
> alternatives to my colleagues. I have absolutely no doubt that
> the models which use discretized continuous variables are biased.
> There are likely very few circumstances in which a noncontinuous
> response is reasonable. (I leave the door open for a few such
> outcomes. However, they do not regularly present themselves.)
> I have lately been working with an epidemiologist who has had
> something of an epiphany regarding these issues. When he came
> to me, he had collaborated with another statistician in the use
> of flexible regression functions. In particular, for that
> collaboration they had employed Generalized Additive Models (GAMs).
> I am not a great fan of GAMs. When you are done fitting the
> model, can you state the regression equation? I don't believe
> that GAMs do provide a simple expression. However, there are
> other tools which allow for flexible regression modelling which
> yield functions with simple expressions. I had long thought
> that restricted cubic splines could be a very useful tool for
> modelling nonlinear (or suspected nonlinear) functions of
> continuous variables. We are currently using spline methods.
> Unlike GAMs, with splines you can plug in a value for some
> continuous predictor and get directly an estimated response.
> However, even though you may be able to return an estimate
> directly, it may still be difficult to convey the shape of the
> response without resorting to graphical methods. This is the
> direction which I believe we ought to be headed with the
> modelling of the relationship between responses and continuous
> covariates: fit some sort of flexible regression and graphically
> display the fitted response.
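
The restricted cubic splines mentioned above can be sketched as follows. This is a minimal Python illustration of one common truncated-power formulation of the basis, not the macro described in this post. The knot locations, the coefficients, and the function names are all hypothetical values chosen for the example. The sketch shows the property claimed above: once the coefficients are estimated, plugging in a value of the predictor gives an estimated linear predictor directly.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x: [x, s_1(x), ..., s_{k-2}(x)].

    Uses the truncated-power form, which is linear below the first knot
    and above the last knot. Nonlinear terms are divided by
    (last knot - first knot)^2 to keep them on a scale comparable to x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cubic: u^3 when u > 0, else 0.
        return u ** 3 if u > 0 else 0.0

    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2
    basis = [float(x)]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / d
                + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        basis.append(term / scale)
    return basis


def predict(x, knots, intercept, coefs):
    """Linear predictor: intercept + sum(coef_i * basis_i(x))."""
    return intercept + sum(c * b for c, b in zip(coefs, rcs_basis(x, knots)))


if __name__ == "__main__":
    knots = [1000, 2000, 3000, 4000]          # hypothetical knots (grams)
    coefs = [-0.001, 0.002, -0.003]           # hypothetical estimates
    for w in (500, 1500, 2500, 3500, 4500):
        print(w, predict(w, knots, intercept=1.0, coefs=coefs))
```

With k knots the basis has k - 1 columns, so four knots cost three regression coefficients. Plotting `predict` over a grid of predictor values gives the graphical display of the fitted response.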
> For polytomous response models, I have developed a macro which
> will perform this work in (what I believe to be) a relatively
> easy to use package. I don't know that it is ready for prime
> time, but if there is interest in the use of the macro, I would
> be willing to share it.
> >Date: Thu, 25 Jan 2001 13:22:13 -0500
> >Reply-To: Peter Flom <peter.flom@NDRI.ORG>
> >From: Peter Flom <peter.flom@NDRI.ORG>
> >Subject: Re: Proc GLM
> >To: SAS-L@LISTSERV.UGA.EDU
> >>>> "Dennis G. Fisher" <dfisher@CSULB.EDU> 01/25/01 01:08PM >>>
> >>>>I have to weigh in on this one. Usually I would agree that ruining a perfectly good continuous variable by dichotomizing it is not a good thing to do and I once gave such advice to a grad student. It turned out that I was wrong. The variable was birthweight. This actually turned out to be a dichotomous variable, which is something I did not know at the time. Infants can be classified into low birth weight and non low birthweight. Low birth weight is a proxy (or perhaps an indicator) that there were problems with the pregnancy. So non-low birthweight infants mean that the indicators of lbw problems were not present. It does not mean that infants who are very heavy are somehow protected against these problems. In the case of this grad student, the infants should have been classified into low birth weight and non low birthweight. Weight should not have been treated as a continuous variable. You have to understand the meaning of the variable before giving an opinion about the analysis. So I guess I agree with Dr. Kruse.
> >Clearly, understanding the meaning of the variable before giving an opinion is vital, and I hesitate to argue with someone who knows so much more than I about statistics.
> >However, it seems to me that even low birth weight is not a Yes/No variable.
> >One classification I have seen is 1500 grams. But dichotomizing at this point implies that a baby of 1499 grams is markedly different from one weighing 1501 grams. It seems to me that babies who weigh 1,000 grams would be at much more risk than those who weigh 1,500 grams, although I don't know the literature on the subject. I would suspect that, if one graphed "proportion of problem pregnancies" vs. "birth weight", the curve would asymptote at some point. So, one useful transformation of weight might be "weight below" the number at which the asymptote occurs.
> >Does this make sense?
> >Peter L. Flom, Ph.D.
> >Principal Research Associate
> >National Development and Research Institutes, Inc.
> >2 World Trade Center
> >16th floor
> >New York, NY 10048
> >(212) 845-4485
> >(212) 845-4698 (fax)
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: firstname.lastname@example.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977
Dennis G. Fisher, Ph.D.
Center for Behavioral Research and Services
1090 Atlantic Avenue
Long Beach, CA 90813