Date: Thu, 14 May 1998 12:48:21 +0200
Reply-To: Paul Dickman
Sender: "SAS(r) Discussion"
From: Paul Dickman
Subject: Re: Proc Genmod - Scale Parameter
Cc: martin trollope

Martin,

The scale parameter is related to overdispersion. It's impossible to explain the scale parameter without getting into some of the theory of GLMs, so I'll start with some theory and then try to explain the practical implications.

For a simple linear model (ordinary least squares) we have:

y_i = x_i'b + e_i

where e_i ~ N(0,sigma^2)

Var(y_i)=sigma^2

That is, we assume that the variance of the response is identical for all combinations of the covariates and equal to sigma^2, which is estimated from the data.

For a generalised linear model (GLM) we have:

g(u_i)=x_i'b, where u_i=E(y_i) and g is the link function,

and Var(y_i)=phi V(u_i)

That is, the variance of y_i is equal to some constant phi (the scale parameter) times a function of the expectation of y_i.
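To make Var(y_i)=phi V(u_i) concrete, here is a small Python sketch (Python rather than SAS, purely for illustration; the function name `variance` and the family labels are my own, not GENMOD syntax). It shows that only the variance function V changes between families, while phi multiplies it in every case.

```python
def variance(mu: float, phi: float, family: str) -> float:
    """Return Var(y) = phi * V(mu) for a few common GLM families (illustrative only)."""
    if family == "normal":
        v = 1.0               # V(mu) = 1: the same variance whatever the mean
    elif family == "poisson":
        v = mu                # V(mu) = mu: variance grows with the mean
    elif family == "binomial":
        v = mu * (1.0 - mu)   # V(mu) = mu(1 - mu) for a single 0/1 trial
    elif family == "gamma":
        v = mu * mu           # V(mu) = mu^2
    else:
        raise ValueError(f"unknown family: {family}")
    return phi * v


for mu in (1.0, 4.0, 9.0):
    # Normal with phi = sigma^2 = 2.5 stays constant; Poisson with phi = 1 tracks the mean.
    print(mu, variance(mu, 2.5, "normal"), variance(mu, 1.0, "poisson"))
```

The normal column stays at 2.5 for every mean, while the Poisson column equals the mean itself.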

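If we specify a GLM with a normal error structure and identity link (that is, a simple linear model), V(u_i) is set to one, so Var(y_i)=phi is the same for every observation and phi plays the role of sigma^2. In this case the deviance is the residual sum of squares, and the deviance divided by its degrees of freedom is our usual estimate of sigma^2.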
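For the normal/identity case, a short Python sketch shows that the deviance is simply the residual sum of squares, so dividing it by its degrees of freedom gives the familiar estimate of sigma^2. The data are made up and `fit_line` is a hypothetical helper for a single covariate.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]

# For the normal family the deviance is the residual sum of squares.
deviance = sum((yi - mi) ** 2 for yi, mi in zip(y, fitted))
df = len(y) - 2               # n minus the number of estimated coefficients
sigma2_hat = deviance / df    # usual estimate of sigma^2 (= phi here)
print(f"b0={b0:.3f} b1={b1:.3f} deviance={deviance:.4f} sigma^2 estimate={sigma2_hat:.4f}")
```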

If we are fitting a Poisson regression model (which I assume you are doing), the variance is equal to the mean, that is, Var(y_i)=E(y_i)=u_i. By default, GENMOD fits the model under the assumption that the data were generated by a Poisson process, that is, Var(y_i)=u_i and the scale parameter (phi) is fixed at one. The scale parameter is therefore reported as 1.000, and a note is written to the output stating that the scale parameter was held fixed.
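To show what "phi fixed at one" means for the standard errors, here is a minimal Python sketch of a Poisson regression with a log link and one continuous covariate. It uses iteratively reweighted least squares, which for the Poisson log link gives the same maximum likelihood estimates GENMOD reports, though this is my own illustration and not GENMOD's code. The function name `poisson_irls` and the data are hypothetical.

```python
import math


def poisson_irls(x, y, iterations=50):
    """Fit log(mu_i) = b0 + b1*x_i by iteratively reweighted least squares.

    The scale parameter phi is fixed at 1, mirroring the default Poisson fit.
    Returns the estimates, their model-based standard errors, and fitted means.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Log link: working weights w_i = mu_i, working response z_i = eta_i + (y_i - mu_i)/mu_i
        z = [b0 + b1 * xi + (yi - mi) / mi for xi, yi, mi in zip(x, y, mu)]
        s0 = sum(mu)
        s1 = sum(mi * xi for mi, xi in zip(mu, x))
        s2 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        t0 = sum(mi * zi for mi, zi in zip(mu, z))
        t1 = sum(mi * xi * zi for mi, xi, zi in zip(mu, x, z))
        det = s0 * s2 - s1 * s1
        # Solve the 2x2 weighted least-squares system (X'WX) b = X'Wz
        b0, b1 = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    s0 = sum(mu)
    s1 = sum(mi * xi for mi, xi in zip(mu, x))
    s2 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = s0 * s2 - s1 * s1
    # Covariance = phi * (X'WX)^-1 with phi = 1: no allowance for extra variation
    se0 = math.sqrt(s2 / det)
    se1 = math.sqrt(s0 / det)
    return (b0, b1), (se0, se1), mu


x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [2, 3, 4, 6, 9, 13, 19, 28]
(b0, b1), (se0, se1), mu = poisson_irls(x, y)
print(f"b0={b0:.4f} (se {se0:.4f})  b1={b1:.4f} (se {se1:.4f})")
# On the original scale the model is multiplicative: mu = exp(b0) * exp(b1)^x
print(f"baseline exp(b0)={math.exp(b0):.3f}, rate ratio per unit x exp(b1)={math.exp(b1):.3f}")
```

The exponentiated coefficients are the multiplicative "relative factors" you describe; the SCALE line is not one of them, it only describes the assumed variance.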

To reiterate: in the simple linear model, the y_i's are assumed to have constant variance, which is estimated from the data and can take any value greater than zero. In the Poisson model, the y_i's do not have constant variance; the variance of y_i is assumed to be equal to the expectation of y_i, and the expectation of y_i is estimated from the data.

For the Poisson model, the covariance matrix of the parameter estimates, and hence their standard errors, are estimated under the assumption that the Poisson model is appropriate. Occasionally we observe more variation in the response than the Poisson assumption implies. This is called overdispersion, and it means that the estimated standard errors of the parameters will be incorrect (typically too small). Overdispersion typically occurs when the observations are correlated. Underdispersion (less variation than expected) is also possible, although less common.
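A quick simulation makes "more variation than the Poisson assumption implies" tangible. In this Python sketch, one set of counts is pure Poisson, and in the other the rate itself varies between units (a gamma-Poisson mixture). That mixture is just one common mechanism for overdispersion, chosen here for illustration. Python's standard library has no Poisson generator, so a small hand-written sampler is used; the seed, sample size and mean are arbitrary assumptions.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def dispersion(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    n = len(counts)
    m = sum(counts) / n
    var = sum((c - m) ** 2 for c in counts) / (n - 1)
    return var / m


rng = random.Random(1998)
n, mean = 5000, 5.0
# Every unit shares the same rate: pure Poisson variation.
poisson_counts = [poisson_draw(mean, rng) for _ in range(n)]
# Rate varies between units (gamma with shape 1 and scale 5, so mean 5): extra-Poisson variation.
mixed_counts = [poisson_draw(rng.gammavariate(1.0, mean), rng) for _ in range(n)]
print(f"Poisson:       variance/mean = {dispersion(poisson_counts):.2f}")
print(f"Gamma-Poisson: variance/mean = {dispersion(mixed_counts):.2f}")
```

Both sets of counts have mean about 5, but for the mixture the theoretical variance is 5 + 5^2/1 = 30, so its variance/mean ratio sits far above one.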

You can identify possible overdispersion by dividing the deviance by its degrees of freedom; this ratio is an estimate of the dispersion parameter phi, and GENMOD prints it in the Value/DF column of the goodness-of-fit table. If the deviance is approximately equal to its df (a ratio close to one), there is no evidence of overdispersion. Note that a ratio well above one does not necessarily mean overdispersion. It can also indicate other problems, such as an incorrectly specified model or outliers in your data. An incorrectly specified model can be due to an incorrectly specified functional form (for example, an additive rather than a multiplicative model may be appropriate) or, more likely, to important explanatory variables (or interactions) missing from your model.
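Here is a Python sketch of the deviance/df check, including how leaving out an explanatory variable inflates it. The counts are made up and the two-group setup is hypothetical. With a single categorical covariate, the Poisson maximum likelihood fitted values are just the group means (and the overall mean for an intercept-only model), so no iterative fitting is needed.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


group_a = [2, 3, 1, 2, 4, 3, 2, 1]
group_b = [10, 12, 9, 11, 8, 13, 10, 11]
y = group_a + group_b

# Model 1: intercept only (the group variable is missing); the fitted value is the overall mean.
overall = sum(y) / len(y)
dev_null = poisson_deviance(y, [overall] * len(y))
ratio_null = dev_null / (len(y) - 1)

# Model 2: intercept + group effect; the fitted values are the group means.
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
mu_group = [mean_a] * len(group_a) + [mean_b] * len(group_b)
dev_group = poisson_deviance(y, mu_group)
ratio_group = dev_group / (len(y) - 2)

print(f"intercept only: deviance={dev_null:.2f} on {len(y) - 1} df, ratio={ratio_null:.2f}")
print(f"with group:     deviance={dev_group:.2f} on {len(y) - 2} df, ratio={ratio_group:.2f}")
```

The intercept-only ratio is well above one purely because the group variable was left out; once it is in the model the ratio drops below one.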

In most cases, lack of fit (identified by a deviance substantially greater than its df) is due to explanatory variables (or interactions) missing from the model.

If you believe you have a correctly specified model and the deviance is still greater than the df, then you can conclude that your data are overdispersed, and you should be able to identify a reason why. If you don't correct for the overdispersion, the standard errors will be underestimated, so confidence intervals will be too narrow and p-values too small.

There are a variety of ways of correcting for overdispersion, one of the simplest being to scale the covariance matrix by a constant. That is, instead of Var(y_i)=u_i, we assume Var(y_i)=phi * u_i, where phi is greater than one for an overdispersed model. phi can be estimated by the deviance divided by its df, and the standard errors are then multiplied by the square root of this ratio. This can be done in GENMOD by specifying DSCALE as an option to the MODEL statement; the SCALE value it then reports is the square root of the deviance divided by the df.
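The arithmetic behind this adjustment is simple enough to sketch in Python. The estimates, standard errors, deviance and df below are hypothetical numbers, and `dscale_adjust` is my own helper: it mimics the kind of rescaling DSCALE performs, it does not reproduce SAS output.

```python
import math


def dscale_adjust(estimates, std_errors, deviance, df):
    """Rescale model-based standard errors by sqrt(deviance/df).

    Point estimates are unchanged; only their standard errors (and Wald z values) move.
    """
    phi_hat = deviance / df        # estimate of the dispersion parameter phi
    scale = math.sqrt(phi_hat)     # square root of deviance/df
    rows = []
    for b, se in zip(estimates, std_errors):
        se_adj = se * scale        # covariance * phi  =>  standard error * sqrt(phi)
        rows.append((b, se, se_adj, b / se, b / se_adj))
    return phi_hat, scale, rows


# Hypothetical fit: deviance 42.0 on 20 df, two coefficients.
phi_hat, scale, rows = dscale_adjust([0.40, -0.25], [0.10, 0.12], deviance=42.0, df=20)
print(f"phi_hat={phi_hat:.3f} scale={scale:.3f}")
for b, se, se_adj, z, z_adj in rows:
    print(f"b={b:+.2f} se={se:.3f}->{se_adj:.3f} z={z:+.2f}->{z_adj:+.2f}")
```

Because every standard error is inflated by the same factor, the adjustment widens intervals uniformly; it does not fix a misspecified model, which is why checking for missing variables comes first.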

Any good book on GLMs will include a discussion of overdispersion and how to identify and adjust for it. See <http://www.maths.uq.edu.au/~gks/research/glm/books.html> for a list of references. David Collett gives a very good general (non-mathematical) overview of overdispersion and methods of adjusting for it for the case of binomial outcomes in his book 'Modelling Binary Data' (Chapman and Hall 1993).

Paul Dickman
---
Paul Dickman, Paul.Dickman@onkpat.ki.se
Cancer Epidemiology Unit, Radiumhemmet, Karolinska Hospital, 171 76 Stockholm, Sweden
Ph: +46 8 5177 5375 Fax: +46 8 326 113

At 10.29 1998-05-14 +0200, you wrote:
>Hi SAS-Lers,
>
>We are using this proc to build a multiplicative model based on a set of variables <parameters> (e.g. gender, marital status etc.).
>In the output of PROC GENMOD there is an intercept and a set of relative factors for each value of each parameter. The intercept represents the observed frequency and each of the relative factors is used to adjust this frequency according to the specific combination of parameter values in a particular observation.
>i.e. Frequency = Intercept * Relative Factor for Gender * Relative Factor for Marital Status * ......
>
>In the output, however, there is a 'parameter' called SCALE which is automatically output (in the same way as the intercept is output automatically).
>
>What does this scale value represent and how should we allow for it in our multiplicative model? Should we allow for it at all?
>
>Any help would be greatly appreciated
>
>Martin
>
>Martint@hollard.co.za
