Date: Wed, 7 Sep 2005 17:37:37 -0700
Reply-To: Daniel Nordlund <res90sx5@VERIZON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Daniel Nordlund <res90sx5@VERIZON.NET>
Subject: Re: Dependent Variable a Proportion (Batting Average),
or Otherwise Censored on Both Sides (Slugging Average)
Content-type: text/plain; charset=UTF-8
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur
> Sent: Wednesday, September 07, 2005 3:44 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Dependent Variable a Proportion (Batting Average), or Otherwise
> Censored on Both Sides (Slugging Average)
> I noticed that no one has responded yet, thus will at least contribute my
> initial thoughts.
> While I think you could go either way, I've had the best luck treating
> such proportions as continuous variables and using either proc reg or proc
> glm (although that obviously doesn't discount the possibility of proc gam).
> I've gone that route for two principal reasons: (1) having the ability to
> analyze the distribution to see if it had to be transformed (which, if the
> frequency of batting is similar to the frequency of automobile crashes,
> will follow a Poisson) and (2) having the ability to consider credibility
> (i.e., only including subjects who had at least X at bats, in order to
> have a rationale for excluding low confidence data).
> On Wed, 7 Sep 2005 11:25:15 -0400, Talbot Michael Katz <topkatz@MSN.COM> wrote:
> >Suppose you want to model proportions, such as baseball batting averages.
> >If you had the number of hits, H, and the number of at bats, B, for each
> >individual, you could use events / trials syntax in PROC GENMOD or PROC
> >LOGISTIC or PROC PROBIT, e.g.
> > MODEL H / B = height weight league ... ;
> >Would this be a good choice (would each of the three PROCs be equally
> >acceptable)? What if you only had the batting average, A, not H and B.
> >Then you couldn't use events / trials. Would you simply force A into the
> >dependent variable role in the same three PROCs? Would you create phony H
> >and B values such that H / B = A in order to use events / trials syntax?
> >What if the proportional quantity can take on values outside of [0,1]?
> >For example, slugging average varies between 0 and 4. If I know the
> >number of singles, S, the number of doubles, D, the number of triples, T,
> >and the number of home runs, HR, then I can compute a weighted sum
> > W = S + (2*D) + (3*T) + (4*HR);
> >and my slugging average G = W / B. Can I plug W / B sensibly into the
> >events / trials syntax of GENMOD, LOGISTIC, or PROBIT? And what if I only
> >have G?
> >Does it make more or less sense in any of these cases to use PROC QLIM?
> >There we could do
> > PROC QLIM;
> > MODEL A = height weight league ... ;
> > ENDOGENOUS A ~ censored(lb=0 ub=1) ;
> > PROC QLIM;
> > MODEL G = height weight league ... ;
> > ENDOGENOUS G ~ censored(lb=0 ub=4) ;
> >What do you think?
> >-- TMK --
> >"The Macro Klutz"
For those who know their linear modeling well (you know who you are :-), I have sometimes wondered whether, in situations like this where you have a proportion or probability for each subject, one might compute logits, ln[p/(1-p)], and use them as the dependent measure, with an identity link and binomial errors, in, say, GENMOD. Is this reasonable, or totally off the wall? Enquiring minds would like to know.
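
To make the question concrete, here is a minimal sketch of the two routes being compared. Everything in it is an assumption for illustration: the data set name batting, the variable elogit, and the data values are made up, and only H, B, height, weight, and league follow the thread. The first fit uses the events / trials syntax from the original post. The second computes an empirical logit (with 0.5 added to each count so that p = 0 or p = 1 does not give an infinite value) and fits it with an identity link. It uses normal errors, because whether binomial errors make sense for a logit-scale response is exactly the open question.

```sas
/* Hypothetical toy data: made-up values purely for illustration. */
data batting;
  input H B height weight league $;
  datalines;
28 100 72 190 AL
31 110 70 185 NL
45 150 74 210 AL
20  90 69 175 NL
60 200 73 205 AL
35 140 71 180 NL
52 160 75 220 AL
25 105 68 170 NL
;
run;

/* Route 1: events / trials syntax, binomial errors with a logit link. */
proc genmod data=batting;
  class league;
  model H / B = height weight league / dist=binomial link=logit;
run;

/* Route 2: compute an empirical logit for each player first. */
data batting_logit;
  set batting;
  if B > 0;                                  /* drop players with no at bats */
  elogit = log((H + 0.5) / (B - H + 0.5));   /* 0.5 keeps the logit finite   */
run;

/* Fit the logit as a continuous response: identity link, normal errors. */
proc genmod data=batting_logit;
  class league;
  model elogit = height weight league / dist=normal link=identity;
run;
```

Comparing the coefficient estimates from the two fits on real data would show how much the choice matters in practice. Route 2 as written ignores the fact that players with more at bats have more precisely estimated logits.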