 wei yi <wy78712@YAHOO.COM> wrote:
> I have a data set, for each policy, I have # of claims
> (C), the premium of policy (P), now I define the
> frequency is X=C*1000/P; Could you tell me how I get
> the distribution of X in SAS? how I can test this
> variable is distributed as poisson?
>
> Thanks in advance.
>
> Vivian
>
Vivian,

I'm sorry, but I don't see how the variable X, which is defined as
the ratio C/P (times 1000), could possibly be distributed as a
Poisson. A Poisson variable takes only non-negative integer values,
and a ratio such as X generally does not. I believe that what you
want to examine is whether the number of claims is distributed
Poisson, conditional on the policy premium and assuming that each
premium dollar produces an equal increase in claim frequency. Is
that correct?

For such a model, you need to use C as the response and use log(P)
(or log(P/1000)) as an offset term. The code below demonstrates
how one would fit the appropriate model and test whether the response
is distributed Poisson. I generate some data which are Poisson
distributed, but with an expectation that depends on the premium amount.
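
In symbols, the model just described can be sketched as follows. The
notation is my own shorthand rather than anything SAS requires: i
indexes policies, C and P are the claim count and premium from your
question, and lambda is the expected number of claims per 1000
premium dollars.

```latex
\begin{aligned}
C_i \mid P_i &\sim \mathrm{Poisson}(\mu_i), \\
\mu_i &= \lambda \, \frac{P_i}{1000}, \qquad \lambda = e^{\eta}, \\
\log \mu_i &= \eta + \log\!\left(\frac{P_i}{1000}\right).
\end{aligned}
```

The log(P/1000) term is the offset: it enters the linear predictor
with its coefficient fixed at 1, so an intercept-only fit estimates
eta alone.
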
/* Generate data ~ Poisson((P/1000)*lambda). Lambda */
/* is the expectation per 1000 premium dollars.     */
data test;
  do i=1 to 2500;
    premium = 1250 + 250*rannor(1234579);
    log_premium = log(premium/1000);
    eta = 1.5;
    lambda = exp(eta);
    mu = lambda*(premium/1000);
    y = ranpoi(1234579,mu);
    output;
  end;
  keep premium y log_premium;
run;
/* Marginal distribution of number of claims */
proc freq data=test;
  tables y;
run;
/* Distribution of premium amounts */
proc univariate data=test plot;
  var premium;
run;
/* Fit Poisson model with expectation (P/1000)*lambda.  */
/* Must use option OFFSET=log_premium, and log_premium  */
/* must exist on the data set (see above).              */
proc genmod data=test;
  model y = / dist=poisson offset=log_premium;
  output out=predicted pred=phat;
run;
/* Now compute the expected frequencies of */
/* 0, 1, 2, ..., 10+ claims. Whether one uses */
/* these particular claim count categories */
/* really depends on the distribution of the */
/* number of claims. What is desired is */
/* approximately 10 good sized categories of */
/* claim counts. Since the distribution of Y */
/* shown above has counts of 0, 1, 2, ..., 10+, */
/* those are the levels which I sum over below. */
data _null_;
  set predicted end=lastrec;
  array expected {11} expect0-expect10; /* 11 categories here */
  do j=1 to 10;
    i = j-1; /* cumulative sum P(Y=i) */
    expected{j} + pdf('Poisson', i, phat); /* i=0,1,...,9 */
  end;
/* When we get to the last record, the sum of the expected */
/* probabilities in each of the first 10 levels is the */
/* expected count for the first 10 levels. Now we need to */
/* compute the expected count for the last level. We also */
/* output the expected counts to some macro variables that */
/* can be referenced by the FREQ procedure as the expected */
/* frequencies. We can obtain a chi-square test of whether */
/* the observed count matches the expected counts. */
  if lastrec then do;
    do i=1 to 10;
      sum_expected + expected{i};
      call symput(compress("expected" || put(i-1,1.)),
                  put(expected{i},10.2));
    end;
    /* _n_ is the number of records, so the remainder */
    /* is the expected count for the 10+ category.    */
    expected{11} = _n_ - sum_expected;
    call symput("expected10", put(expected{11},10.2));
  end;
run;
/* Use a format so that our tabled Y values take on */
/* values of 0, 1, 2, ..., 9, 10+. We do not want */
/* more observed categories than there are expected */
/* values from our previous data step. */
proc format;
  value trunc_y
    10-high = '10+';
run;
/* Obtain the chi-square test examining whether the    */
/* observed counts follow the Poisson expected counts. */
proc freq data=test;
  tables y / testf=(&expected0, &expected1, &expected2, &expected3,
                    &expected4, &expected5, &expected6, &expected7,
                    &expected8, &expected9, &expected10);
  format y trunc_y.;
run;
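
Applied to your own data, the model-fitting step might look roughly
like the sketch below. It assumes a data set named POLICIES (a
hypothetical name) that holds your variables C and P; the output
data set name POLICIES_OFFSET is also just a placeholder.

```sas
/* Sketch only: POLICIES and POLICIES_OFFSET are hypothetical */
/* data set names. C = number of claims, P = premium.         */
data policies_offset;
  set policies;
  /* Offset on the log scale; left missing when P <= 0 */
  if P > 0 then log_premium = log(P/1000);
run;

/* Intercept-only Poisson model with log(P/1000) as the offset */
proc genmod data=policies_offset;
  model C = / dist=poisson offset=log_premium;
  output out=predicted pred=phat;
run;
```

The remaining steps above would then be run with C in place of y and
your data set in place of TEST, with the claim count categories
chosen to suit the observed distribution of C.
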
HTH,
Dale

Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977

