Date: Mon, 21 Jul 2003 10:48:51 -0700
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Odd Results with Proc Summary Missing Value assignment
In-Reply-To: <zpUSa.21333$zwL.1740@news04.bloor.is.net.cable.rogers.com>
Content-Type: text/plain; charset=us-ascii
Art,
I believe that you are muddled on the purpose of the weight
variable. You appear to be trying to follow a mathematical
statement/argument, but the truth of the matter is that the
weight variable is a statistical device. When you assign a
weight value of zero, this is understood to mean that there is
no informational content to the response which you are analyzing.
If a person has weight value zero for every response, then
there is no information present for computing a mean value.
If there is no information present, then the mean should be
missing. This is precisely what SAS returns, and every statistician
would be up in arms if SAS did otherwise.
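
To make the point concrete, here is a minimal sketch. The dataset
zero_wt and its variables x and w are made up for illustration and are
not from your example. A weighted mean is sum(w*x)/sum(w), so when
every weight is zero the denominator is zero and there is nothing to
report.

```sas
/* Hypothetical data: every observation carries weight zero. */
data zero_wt;
  input x w;
  cards;
10 0
20 0
30 0
;
run;

/* Weighted mean = sum(w*x) / sum(w). With sum(w) = 0 the ratio   */
/* is undefined, so the mean is expected to print as missing (.). */
proc means data=zero_wt n sumwgt mean;
  var x;
  weight w;
run;
```

The SUMWGT statistic is requested only so that the zero sum of weights
appears next to the mean in the output.
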
It is entirely another matter to compute the mean of a product
of two variables, which is the argument that you are pursuing
below. The product term must be computed in advance of
invocation of PROC MEANS/SUMMARY. In fact, if you compute
your own product term, your program will be much more compact
and will execute much faster. This bonus comes on top of
computation of the correct statistic. I demonstrate below
how you might code your problem to produce the desired result.
Since you indicate an insurance type problem, I have taken the
liberty to rename the variables in your original presentation
to indicate severity and frequency of claims. The total cost is
(according to your presentation), frequency times severity.
data one;
input id severity1 severity2 severity3 freq1 freq2 freq3;
cards ;
1 1 2 3 2 2 2
1 1 2 3 1 1 1
1 1 2 3 3 3 3
2 0 0 3 0 4 4
2 0 0 3 0 2 2
2 0 0 3 0 1 1
3 2 4 8 1 1 1
3 0 4 8 0 1 1
3 2 4 8 5 1 1
;
run;
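/* A view computes the frequency*severity products on the fly, */
/* so no intermediate dataset is written to disk.              */
data two / view=two;
  set one;
  tot1 = freq1*severity1;
  tot2 = freq2*severity2;
  tot3 = freq3*severity3;
  keep id tot1-tot3;
run;
/* Unweighted mean of each product within id.            */
/* BY processing assumes the data are sorted by id,      */
/* which is true of dataset one as entered above.        */
proc summary data=two;
  by id;
  var tot1-tot3;
  output out=three (DROP=_TYPE_ _FREQ_) mean=m1-m3;
run;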
proc print data=three;
run;
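
As a cross-check, the same means could also be computed in a single
PROC SQL step. This is only a sketch of an alternative, not a
replacement for the code above, and the output table name three_sql is
my own invention. Working the products by hand from the data in
dataset one gives the values shown in the comment.

```sas
/* Alternative sketch: mean of frequency*severity per id, */
/* computed directly from dataset one.                    */
proc sql;
  create table three_sql as
  select id,
         avg(freq1*severity1) as m1,
         avg(freq2*severity2) as m2,
         avg(freq3*severity3) as m3
  from one
  group by id;
quit;

/* Values worked by hand from the data lines above:  */
/*   id   m1   m2   m3                               */
/*    1    2    4    6                               */
/*    2    0    0    7                               */
/*    3    4    4    8                               */
```

Note that for id 2 the first two means come out as 0, not missing,
because a product of zero is a perfectly good observation.
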
--- Arthur Tabachneck <atabachneck@ROGERS.COM> wrote:
> John,
>
> While I'm sure that Tim's logic closely resembles the decision rule that
> went into Proc Summary's design, I definitely don't agree with the default
> settings.
>
> The two most commonly used measures in the field of insurance are frequency
> (i.e., how often a claim occurs) and severity (i.e., the average cost of a
> claim). Everyone's contributing share to that pot is the product of
> frequency times severity (e.g., if 100 out of a thousand have claims which
> average $1,000 per claim, then we each have to put $100 in the pot to cover
> the total cost of the anticipated claims).
>
> Where no claims occur, the average cost of a claim is 0 and the sum of the
> losses is 0, definitely not 'missing.'
>
> Similarly, in medicine, I would be extremely interested in a treatment that
> never has any fatalities. In fact, I can already see the lawsuits coming if
> analysts were to discount such results because SAS said the values were
> missing.
>
> Art
=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------