LISTSERV at the University of Georgia
Date:   Tue, 8 Jan 2008 17:08:15 -0800
Reply-To:   Dale McLerran <stringplayer_2@YAHOO.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Dale McLerran <stringplayer_2@YAHOO.COM>
Subject:   Re: how to test that a variable is distributed as Poisson
In-Reply-To:   <558515.11775.qm@web34309.mail.mud.yahoo.com>
Content-Type:   text/plain; charset=iso-8859-1

--- wei yi <wy78712@YAHOO.COM> wrote:

> I have a data set, for each policy, I have # of claims
> (C), the premium of policy (P), now I define the
> frequency is X=C*1000/P; Could you tell me how I get
> the distribution of X in SAS? how I can test this
> variable is distributed as poisson?
>
> Thanks in advance.
>
> Vivian

Vivian,

I'm sorry, but I don't see how the variable X - which is defined as the ratio C/P (times 1000) - could possibly be distributed as a Poisson: a Poisson variable takes only non-negative integer values, and X generally will not. I believe that what you want to examine is whether the number of claims is distributed Poisson, conditional on the policy premium and assuming that each premium dollar produces an equal increase in expected claim frequency. Is that correct?

For such a model, you need to use C as the response and use log(P) (or log(P/1000)) as an offset term. The code below demonstrates how one would fit the appropriate model and test whether the response is distributed Poisson. I generate some data which are Poisson distributed, but with an expectation that depends on the premium amount.
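To spell out what the offset does, here is a sketch of the model in notation that is assumed for illustration (the subscript i indexes policies, and eta is the intercept, matching the name used in the simulation code). The offset enters the linear predictor with its coefficient fixed at 1, so only eta is estimated:

```latex
\begin{align*}
C_i \mid P_i &\sim \mathrm{Poisson}(\mu_i), \\
\log \mu_i &= \eta + \log(P_i/1000), \\
\mu_i &= \lambda \, \frac{P_i}{1000}, \qquad \lambda = e^{\eta}.
\end{align*}
```

With this parameterization, lambda is the expected number of claims per 1000 premium dollars.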

/* Generate data ~ Poisson(lambda*P/1000). Lambda */
/* is the expectation per 1000 premium dollars.   */
data test;
  do i=1 to 2500;
    premium = 1250 + 250*rannor(1234579);
    log_premium = log(premium/1000);
    eta = 1.5;
    lambda = exp(eta);
    mu = lambda*(premium/1000);
    y = ranpoi(1234579,mu);
    output;
  end;
  keep premium y log_premium;
run;

/* Marginal distribution of number of claims */
proc freq data=test;
  tables y;
run;

/* Distribution of premium amounts */
proc univariate data=test plot;
  var premium;
run;

/* Fit Poisson model with expectation lambda*P/1000.  */
/* Must use option OFFSET=log_premium, and the        */
/* variable log_premium must exist on the data set    */
/* (see above).                                       */
proc genmod data=test;
  model y = / dist=poisson offset=log_premium;
  output out=predicted pred=phat;
run;

/* Now compute the expected frequencies of        */
/* 0, 1, 2, ..., 10+ claims.  Whether one uses    */
/* these particular claim count categories        */
/* really depends on the distribution of the      */
/* number of claims.  What is desired is          */
/* approximately 10 good sized categories of      */
/* claim counts.  Since the distribution of Y     */
/* shown above has counts of 0, 1, 2, ..., 10+,   */
/* those are the levels which I sum over below.   */
data _null_;
  set predicted end=lastrec;
  array expected {11} expect0-expect10;   /* 11 categories here */
  do j=1 to 10;
    i = j-1;
    /* cumulative sum of P(Y=i) over records, i=0,1,...,9 */
    expected{j} + pdf('Poisson', i, phat);
  end;
  /* When we get to the last record, the sum of the expected */
  /* probabilities in each of the first 10 levels is the     */
  /* expected count for the first 10 levels.  Now we need to */
  /* compute the expected count for the last level.  We also */
  /* output the expected counts to some macro variables that */
  /* can be referenced by the FREQ procedure as the expected */
  /* frequencies.  We can obtain a chi-square test of whether */
  /* the observed counts match the expected counts.          */
  if lastrec then do;
    do i=1 to 10;
      sum_expected + expected{i};
      call symput(compress("expected"||put(i-1,1.)),
                  put(expected{i},10.2));
    end;
    /* On the last record _n_ equals the number of observations */
    expected{11} = _n_ - sum_expected;
    call symput("expected10", put(expected{11},10.2));
  end;
run;

/* Use a format so that our tabled Y values take on */
/* values of 0, 1, 2, ..., 9, 10+.  We do not want  */
/* more observed categories than there are expected */
/* values from our previous data step.              */
proc format;
  value trunc_y 10-high = '10+';
run;

/* Obtain the chi-square test examining whether the   */
/* observed counts follow the Poisson expected counts. */
proc freq data=test;
  tables y / testf=(&expected0, &expected1, &expected2,
                    &expected3, &expected4, &expected5,
                    &expected6, &expected7, &expected8,
                    &expected9, &expected10);
  format y trunc_y.;
run;
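For reference, the quantities computed by the data step and compared by the chi-square test can be written as follows. The notation is assumed for illustration: n is the number of policies, \hat\mu_i is the fitted mean for policy i (the variable phat), and O_k and E_k are the observed and expected counts in claim category k:

```latex
\begin{align*}
E_k &= \sum_{i=1}^{n} \frac{e^{-\hat\mu_i}\,\hat\mu_i^{\,k}}{k!},
      \qquad k = 0, 1, \dots, 9, \\
E_{10+} &= n - \sum_{k=0}^{9} E_k, \\
X^2 &= \sum_{k \in \{0, 1, \dots, 9, 10+\}} \frac{(O_k - E_k)^2}{E_k}.
\end{align*}
```

The expected counts are sums of per-policy Poisson probabilities rather than n times a single Poisson probability, because each policy has its own mean through the premium offset.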

HTH,

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph:  (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
