Date: Tue, 15 Aug 2000 17:12:07 -0400
Reply-To: Chris Smith <cpsmith@AGFINANCE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Chris Smith <cpsmith@AGFINANCE.COM>
Subject: Re: Chi-square question

On Fri, 11 Aug 2000 10:49:16 +0200, Victor Bos <vic@TIK.NU> wrote:
>Hi group,
>
>At last: after 9+ years as a SAS developer, I am running into a statistical
>question.
>I am comparing two sets of data, containing customer information, and I
>want to see if there is a significant difference between certain elements
>in the two datasets.
>I remember from my college years that a chi-square test can be used for
>that, so I started up proc freq with the CHISQ option to do the job. Now
>my problem:
>
>How do I interpret the results from proc freq?
>
>The documentation on proc freq is very limited on this. Could one of you
>statisticians please explain how I achieve my goal with proc freq/chisq?
>Or would I be better off using another procedure or statistical test? For
>completeness, I have attached my proc freq output, in which I would like
>to decide whether there is a significant difference between my two test
>sets (TC=0 and TC=1) for each value of BC.
>
>Can anyone help me out here?
>Thanks in advance,
>
>Victor Bos
>Talkline Nederland BV,
>the Netherlands.
>
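> TABLE OF BC BY TC
>
> BC        TC
>
> Frequency|
> Col Pct  |       0|       1|  Total
> ---------+--------+--------+
>          |      1 |      0 |      1
>          |   0.00 |   0.00 |
> ---------+--------+--------+
> 1        |   7934 |    237 |   8171
>          |  10.08 |   9.13 |
> ---------+--------+--------+
> 2        |   8558 |    243 |   8801
>          |  10.87 |   9.36 |
> ---------+--------+--------+
> 3        |   8124 |    202 |   8326
>          |  10.32 |   7.78 |
> ---------+--------+--------+
> 4        |  27383 |    850 |  28233
>          |  34.80 |  32.73 |
> ---------+--------+--------+
> 5        |   8089 |    206 |   8295
>          |  10.28 |   7.93 |
> ---------+--------+--------+
> 6        |   8890 |    411 |   9301
>          |  11.30 |  15.83 |
> ---------+--------+--------+
> 7        |   9716 |    448 |  10164
>          |  12.35 |  17.25 |
> ---------+--------+--------+
> Total       78695     2597    81292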
>
>
> STATISTICS FOR TABLE OF BC BY TC
>
> Statistic                     DF     Value      Prob
> ------------------------------------------------------
> Chi-Square                     7   133.665     0.001
> Likelihood Ratio Chi-Square    7   126.890     0.001
> Mantel-Haenszel Chi-Square     1    71.981     0.001
> Phi Coefficient                      0.041
> Contingency Coefficient              0.041
> Cramer's V                           0.041
>
> Sample Size = 81292
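First, you need to declare the missing values: the unlabeled first row of
your table holds a single observation and is being counted as an eighth BC
level (hence the 7 DF). To interpret the crosstabulation you should also
request the expected cell counts (the EXPECTED option on the TABLES
statement). This should help you see where the differences are. Remember,
though, that the test is an overall one: it tells you whether the whole
pattern of observed versus expected counts is significant, not which
individual BC level differs.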
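
To make the observed-versus-expected comparison concrete, here is a minimal
sketch that recomputes the expected counts and each BC level's contribution
to the Pearson chi-square from the counts in the posted table. It is written
in Python purely for illustration; the names `observed` and
`expected_and_contributions` are my own, and "(blank)" stands in for the
unlabeled first row. In PROC FREQ itself the EXPECTED and CELLCHI2 options on
the TABLES statement print the same quantities.

```python
# Observed counts copied from the posted table: BC level -> (TC=0, TC=1).
# "(blank)" is a placeholder for the unlabeled first row.
observed = {
    "(blank)": (1, 0),
    "1": (7934, 237),
    "2": (8558, 243),
    "3": (8124, 202),
    "4": (27383, 850),
    "5": (8089, 206),
    "6": (8890, 411),
    "7": (9716, 448),
}


def expected_and_contributions(table):
    """Return {label: (expected counts, chi-square contribution)} per row."""
    col_totals = [sum(row[j] for row in table.values()) for j in range(2)]
    n = sum(col_totals)
    result = {}
    for label, row in table.items():
        row_total = sum(row)
        # Expected count under independence: row total * column total / N
        exp = tuple(row_total * c / n for c in col_totals)
        contrib = sum((o - e) ** 2 / e for o, e in zip(row, exp))
        result[label] = (exp, contrib)
    return result


cells = expected_and_contributions(observed)
chi_square = sum(contrib for _, contrib in cells.values())

for label, (exp, contrib) in cells.items():
    print(f"{label:>8}  exp0={exp[0]:10.2f}  exp1={exp[1]:8.2f}  contrib={contrib:7.3f}")
print(f"Pearson chi-square = {chi_square:.3f}")
```

The rows with the largest contributions are the ones where the observed
counts depart most from what independence would predict, which is where to
look first when describing the difference between the two test sets.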
Also, you might consider taking a sample of this data. With a sample this
large the chi-square test has so much power that even trivially small
differences are almost always highly significant. A sample of a thousand or
two should be sufficient. Or consider a measure that is not affected by
large samples, such as the Cramer's V already in your output.
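
The sample-size effect can be sketched numerically. If the cell proportions
are held exactly fixed, the Pearson statistic is proportional to N, while
Cramer's V does not change. The Python sketch below rescales the posted
statistic to smaller sample sizes under that assumption; the function names
are illustrative, 14.067 is the standard tabled 5% critical value for 7 DF,
and a real random subsample would add sampling noise on top of these
idealized figures.

```python
import math

chi_square_full = 133.665  # Chi-Square value from the posted output
n_full = 81292             # Sample Size from the posted output
critical_05_df7 = 14.067   # tabled chi-square critical value, 7 DF, alpha = 0.05


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: chi-square scaled by N and the smaller table dimension."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


def rescaled_chi_square(chi_square, n_from, n_to):
    """Statistic for identical cell proportions at a different sample size."""
    return chi_square * n_to / n_from


v = cramers_v(chi_square_full, n_full, n_rows=8, n_cols=2)
print(f"Cramer's V = {v:.3f} (the same at every N below)")

for n in (1000, 2000, n_full):
    chi2_n = rescaled_chi_square(chi_square_full, n_full, n)
    verdict = "exceeds" if chi2_n > critical_05_df7 else "is below"
    print(f"N={n:6d}  chi-square={chi2_n:8.3f}  {verdict} {critical_05_df7}")
```

Under these assumptions the same pattern of proportions that is
overwhelmingly significant at N = 81292 would not reach the 5% level at
N = 2000, even though the strength of association is identical.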
Hope this helps you to blind them with science.
