Date: Thu, 9 Dec 2010 14:51:13 -0800
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Statistics question, negative binomial regression residuals
Content-Type: text/plain; charset=iso-8859-1
Yes, I thought of the beta distribution when the probability
density function was stated to be U-shaped. Bi-modal at the
ends of the distribution might well be described through a
beta distribution.
Bi-modality alone is not enough to suggest a beta distribution.
A U-shape does suggest a beta distribution. But if there are
two peaks with densities decaying on both sides of one or both
peaks, some sort of mixture distribution is indicated.
Note that the standard beta distribution has the properties
0 x<=0
P(X=x) = f(alpha,beta) 0<x<1
0 x>=1
That is, the density is only positive for x between 0 and 1,
exclusive of the 0 and 1 values. The parameters alpha and
beta are positive, and the expected value of X is
E(X) = alpha / (alpha + beta)
There is a four parameter beta distribution for which
0 x<=theta
P(X=x) = f(alpha,beta,theta,sigma) theta<x<theta+sigma
0 x>=theta+sigma
The GLIMMIX procedure allows regression estimates for the
two parameter beta distribution. The CAPABILITY procedure
allows estimates for the four parameter beta distribution.
Note, though, that the CAPABILITY procedure only allows a
univariate analysis. There is no capacity for producing
regression estimates when you use the CAPABILITY procedure.
As an alternative to either the GLIMMIX or CAPABILITY procedures,
the NLMIXED procedure can be employed to estimate a four
parameter beta distribution.
Beta distribution regression came up just recently on SAS-L.
I did respond to that post with some initial NLMIXED code and
a link to some information about fitting a beta regression model.
I also provided a link to a paper describing beta regression
estimation in SAS, SPSS, and SPlus/R. Some of what I suggested
in that previous post about estimating an upper bound does not
work. And the paper which I linked to as well did some hand
waving about what to do when the upper bound of the beta
distribution could not be assumed to be 1.
Given the interest by multiple parties in a beta distribution
as well as limitations of SAS procedures for fitting a beta
regression and the limitations of what has previously been
posted on this topic, the remainder of this post provides a
macro which allows estimation of a beta regression with 0, 1,
or 2 range parameters needing to be estimated.
%macro BetaDistReg(data=, y=, x=, LB=Est, UB=Est);
/***************************************************************/
/* Macro for fitting regression where the response follows a */
/* beta distribution with two shape parameters and optionally */
/* parameters that for the range of the beta distribution. */
/* The standard beta distribution is supported on a range from */
/* 0 to 1, exclusive. This macro allows estimation of a */
/* non-zero lower bound and an upper bound which is not 1. */
/* */
/* Author: Dale McLerran */
/* Date: 12/09/2010 */
/* */
/* Macro variables: */
/* DATA - specifies the name of the input data set */
/* Y - specifies the response variable */
/* X - names predictor variables */
/* LB - specifies a lower bound for the beta */
/* distribution. Specifying LB=EST indicates that */
/* the lower bound should be estimated by maximum */
/* likelihood. This is the default. You can also */
/* specify a numeric value for the lower bound. */
/* UB - specifies an upper bound for the beta */
/* distribution. Specifying UB=EST indicates that */
/* the upper bound should be estimated by maximum */
/* likelihood. This is the default. You can also */
/* specify a numeric value for the upper bound. */
/* */
/* This macro allows specification one or both of the */
/* range (boundary) values. You can also estimate both */
/* boundary values. */
/* */
/* The response and predictor variables must all exist on the */
/* input data set. None of the predictors can be constructed */
/* during execution. That means that if you want to employ */
/* an interaction term, the interaction variable must be */
/* constructed on the data set. */
/* */
/* Alternative methods for estimating parameters of a beta */
/* distribution are provided by the GLIMMIX procedure (which */
/* allows regression of a response variable following a beta */
/* but which assumes that the response range is from 0 to 1) */
/* and the CAPABILITY procedure (which does not fit a */
/* regression model but which does allow estimation of both */
/* lower and upper bounds for the response variable). */
/* */
/* Examples: */
/* Estimate the standard beta distribution parameters */
/* (assuming a range of 0 to 1) for a univariate response */
/* */
/* %BetaDistReg(data=mydata, y=y, LB=0, UB=1) */
/* */
/* */
/* Estimate a regression model with a response following the */
/* beta distribution with lower bound 0 and upper bound to */
/* be estimated */
/* */
/* %BetaDistReg(data=mydata, y=y, X=x1 x2, LB=0, UB=est) */
/* */
/* */
/* Estimate a regression model with a response following the */
/* beta distribution where both the lower and upper bounds */
/* of the beta distribution need to be estimated */
/* */
/* %BetaDistReg(data=mydata, y=y, X=x1 x2, LB=est, UB=est) */
/* */
/***************************************************************/
%put entered into macro;
%let LB=%upcase(%quote(&LB));
%let UB=%upcase(%quote(&UB));
%if %quote(&LB)=EST | %quote(&UB)=EST %then %do;
/* Determine min(Y) & max(Y) for initialization of LB and UB */
proc sort
data=&data(where=(missing(&y)=0
%if &x^=%str() %then %do;
%let i=1;
%let x_i=%scan(&x,&i,%str( ));
%do %while(&x_i^=%str());
& missing(&x_i)=0
%let i=%eval(&i+1);
%let x_i=%scan(&x,&i,%str( ));
%end;
%end;
))
out=_tmp_;
by &y;
run;
data _null_;
set _tmp_ end=lastrec;
if _n_=1 then call symput("Ymin", put(&y,best16.));
if lastrec then call symput("Ymax", put(&y,best16.));
run;
%end;
title "Beta Regression: %quote(&LB)<&Y<%quote(&UB)";
proc nlmixed data=&data tech=trureg;
parms
/* Initialize parms for expectation */
b0_mu
%if &x^=%str() %then %do;
%let i=1;
%let x_i=%scan(&x,&i,%str( ));
%do %while(&x_i^=%str());
b&i._mu
%let i=%eval(&i+1);
%let x_i=%scan(&x,&i,%str( ));
%end;
%end;
/* Initialize parms for phi = alpha + beta */
b0_phi
%if &x^=%str() %then %do;
%let i=1;
%let x_i=%scan(&x,&i,%str( ));
%do %while(&x_i^=%str());
b&i._phi
%let i=%eval(&i+1);
%let x_i=%scan(&x,&i,%str( ));
%end;
%end;
0
/* Initialize parms for lower and upper bounds */
%if %quote(&LB)=EST & %quote(&UB)=EST %then %do;
ltheta_offset lsigma_offset
%end;
%else %if %quote(&LB)^=EST & %quote(&UB)=EST %then %do;
lsigma_offset
%end;
%else %if %quote(&LB)=EST & %quote(&UB)^=EST %then %do;
ltheta_offset
%end;
%if %quote(&LB)=EST | %quote(&UB)=EST %then %do;
%sysfunc(log(%sysevalf((&Ymax-&Ymin)/500)))
%sysfunc(log(%sysevalf((&Ymax-&Ymin)/100)))
%sysfunc(log(%sysevalf((&Ymax-&Ymin)/50)))
%sysfunc(log(%sysevalf((&Ymax-&Ymin)/10)))
%end;
;
/* Compute mu and phi */
eta_mu = b0_mu
%if &x^=%str() %then %do;
%let i=1;
%let x_i=%scan(&x,&i,%str( ));
%do %while(&x_i^=%str());
+ b&i._mu*&x_i
%let i=%eval(&i+1);
%let x_i=%scan(&x,&i,%str( ));
%end;
%end;
;
mu = 1 / (1 + exp(eta_mu));
eta_phi = b0_phi
%if &x^=%str() %then %do;
%let i=1;
%let x_i=%scan(&x,&i,%str( ));
%do %while(&x_i^=%str());
+ b&i._phi*&x_i
%let i=%eval(&i+1);
%let x_i=%scan(&x,&i,%str( ));
%end;
%end;
;
phi = exp(eta_phi);
/* Compute alpha and beta */
alpha = mu * phi;
beta = phi*(1 - mu);
/* Specify lower and upper bounds */
%if %quote(&LB)^=EST & %quote(&UB)^=EST %then %do;
theta = &LB;
sigma = &UB - &LB;
%end;
%else %if %quote(&LB)=EST & %quote(&UB)=EST %then %do;
theta = &Ymin - exp(ltheta_offset);
sigma = &Ymax + exp(lsigma_offset) - theta;
%end;
%else %if %quote(&LB)^=EST & %quote(&UB)=EST %then %do;
theta = &LB;
sigma = &Ymax + exp(lsigma_offset) - theta;
%end;
%else %if %quote(&LB)=EST & %quote(&UB)^=EST %then %do;
theta = &Ymin - exp(ltheta_offset);
sigma = &UB - theta;
%end;
/* Construct log likelihood */
lnum1 = (alpha-1)*log(&Y-theta);
lnum2 = (beta - 1)*log(sigma + theta - &Y);
ldenom1 = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta);
ldenom2 = (alpha + beta - 1)*log(sigma);
ll = lnum1 + lnum2 - ldenom1 - ldenom2;
/* Maximize the likelihood */
model &Y ~ general(ll);
/* If there are no predictor vars, show alpha and beta */
%if &X=%str( ) %then %do;
estimate "alpha" alpha;
estimate "beta" beta;
%end;
/* If estimation of lower/upper bounds, show the estimates */
%if %quote(&LB)=EST & %quote(&UB)=EST %then %do;
estimate "Min(Y)" theta;
estimate "Max(Y)" theta + sigma;
%end;
%else %if %quote(&LB)^=EST & %quote(&UB)=EST %then %do;
estimate "Max(Y)" theta + sigma;
%end;
%else %if %quote(&LB)=EST & %quote(&UB)^=EST %then %do;
estimate "Min(Y)" theta;
%end;
run;
%mend;
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
--- On Thu, 12/9/10, Susan Durham <sdurham@BIOLOGY.USU.EDU> wrote:
> From: Susan Durham <sdurham@BIOLOGY.USU.EDU>
> Subject: Re: Statistics question, negative binomial regression residuals
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Thursday, December 9, 2010, 7:12 AM
> If there are a fixed number of
> sessions that could have been attended (e.g.,
> 12), then a beta distribution might accommodate the bimodal
> pattern in the
> proportion attended....
>
> Susan
>
>
> On Wed, 8 Dec 2010 16:23:33 -0500, Talbot Michael Katz
> <topkatz@MSN.COM>
> wrote:
>
> >Hi.
> >
> >I haven't done this myself, but it looks like GLIMMIX
> allows you to modify
> >the link / variance function(s). The DV
> distribution you describe sounds
> >kind of bimodel, not amenable to a negative binomial
> fit. Perhaps you
> >could fool around with your own functions to better
> approximate the
> >distribution of your data.
> >
> >Good luck!
> >
> >-- TMK --
> >"The Macro Klutz"
> >
> >
> >On Wed, 8 Dec 2010 15:27:19 -0500, Peter Flom
> ><peterflomconsulting@MINDSPRING.COM>
> wrote:
> >
> >>Hi Steve
> >>
> >>The DV is number of sessions attended, which can
> vary from 0 to 12.
> >About 20% are 0 and 20% are 12, it's basically U
> shaped. For some
> >analyses I may only be using 1 to 12.
> >>
> >>The residuals are very skew, in that, although the
> maximum sessions
> >attended was 12, the highest predicted value was a
> little over 4.
> >>
> >>There does seem to be a dearth of literature on
> these types of issues.
> >>
> >>Peter
> >>
> >>-----Original Message-----
> >>>From: Steve Denham <stevedrd@YAHOO.COM>
> >>>Sent: Dec 8, 2010 2:30 PM
> >>>To: SAS-L@LISTSERV.UGA.EDU
> >>>Subject: Re: Statistics question, negative
> binomial regression residuals
> >>>
> >>>Peter,
> >>>
> >>>How strange are the patterns? I would
> worry about zero inflation, upper
> >>>truncation and lower censoring pretty much in
> that order. If the
> >pattern is
> >>>something else, is there a latent variable that
> might be lurking
> >somewhere?
> >>>What about changing the link function?
> >>>
> >>>Refs and links would be welcome here as well.
> >>> Steve Denham
> >>>Associate Director, Biostatistics
> >>>MPI Research, Inc.
> >>>
> >>>
> >>>
> >>>----- Original Message ----
> >>>From: Peter Flom <peterflomconsulting@MINDSPRING.COM>
> >>>To: SAS-L@LISTSERV.UGA.EDU
> >>>Sent: Wed, December 8, 2010 8:49:50 AM
> >>>Subject: Statistics question, negative binomial
> regression residuals
> >>>
> >>>Hi all
> >>>
> >>>I'm doing some negative binomial regression in
> a mixed model
> >(longitudinal data)
> >>>and getting very strange patterns in the
> residuals.
> >>>
> >>>I'm looking for references on the assumptions
> of NEGBIN in GLIMMIX, and
> >how to
> >>>deal with violations, and the consequences.
> >>>
> >>>I've googled a bit, and looked in both Long's
> book "Regression models for
> >>>categorical and limited dependent variables"
> and Cameron and Trivedi,
> >but have
> >>>not found anything good.
> >>>
> >>>Any references or links appreciated
> >>>
> >>>thanks
> >>>
> >>>Peter
> >>>
> >>>Peter L. Flom, PhD
> >>>Statistical Consultant
> >>>Website: http://www DOT statisticalanalysisconsulting DOT com/
> >>>Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
> >>>Twitter: @peterflom
> >>
> >>
> >>Peter L. Flom, PhD
> >>Statistical Consultant
> >>Website: http://www DOT statisticalanalysisconsulting DOT com/
> >>Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
> >>Twitter: @peterflom
>
|