LISTSERV at the University of Georgia
Date:         Tue, 15 Aug 2000 17:12:07 -0400
Reply-To:     Chris Smith <cpsmith@AGFINANCE.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Chris Smith <cpsmith@AGFINANCE.COM>
Subject:      Re: Chi-square question
Comments: To: vic@TIK.NU

On Fri, 11 Aug 2000 10:49:16 +0200, Victor Bos <vic@TIK.NU> wrote:

>Hi group,
>
>At last: after 9+ years as a SAS developer, I am running into a statistical
>question.
>I am comparing two sets of data, containing customer information, and I want
>to see if there is a significant difference between certain elements in the
>two datasets.
>I remember from my college-years, that to do that a Chi-square test can be
>used, so I startup proc freq with the CHISQ option to do the job. Now my
>problem:
>
>How do I interpret the results from proc freq?
>
>The documentation on proc freq is very limited on this. Could someone of you
>statisticians please explain how I achieve my goal with proc freq/chisq?? Or
>can I better use another procedure or statistical test? For completeness, I
>have attached my proc freq output, in which I would like to decide whether
>there is a significant difference in my two testsets (TC=0 and TC=1) for
>each value of BC.
>
>can anyone help me out here??
>thanks in advance,
>
>Victor Bos
>Talkline Nederland BV,
>the Netherlands.
>
>                    TABLE OF BC BY TC
>
>           BC        TC
>
>           Frequency|
>           Col Pct  |       0|       1|  Total
>           ---------+--------+--------+
>           -        |      1 |      0 |      1
>                    |   0.00 |   0.00 |
>           ---------+--------+--------+
>           1        |   7934 |    237 |   8171
>                    |  10.08 |   9.13 |
>           ---------+--------+--------+
>           2        |   8558 |    243 |   8801
>                    |  10.87 |   9.36 |
>           ---------+--------+--------+
>           3        |   8124 |    202 |   8326
>                    |  10.32 |   7.78 |
>           ---------+--------+--------+
>           4        |  27383 |    850 |  28233
>                    |  34.80 |  32.73 |
>           ---------+--------+--------+
>           5        |   8089 |    206 |   8295
>                    |  10.28 |   7.93 |
>           ---------+--------+--------+
>           6        |   8890 |    411 |   9301
>                    |  11.30 |  15.83 |
>           ---------+--------+--------+
>           7        |   9716 |    448 |  10164
>                    |  12.35 |  17.25 |
>           ---------+--------+--------+
>           Total       78695     2597    81292
>
>
>              STATISTICS FOR TABLE OF BC BY TC
>
>  Statistic                     DF     Value        Prob
>  ------------------------------------------------------
>  Chi-Square                     7   133.665       0.001
>  Likelihood Ratio Chi-Square    7   126.890       0.001
>  Mantel-Haenszel Chi-Square     1    71.981       0.001
>  Phi Coefficient                      0.041
>  Contingency Coefficient              0.041
>  Cramer's V                           0.041
>
>  Sample Size = 81292

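First, you need to declare the missing values: the "-" row, with its single observation, looks like a missing value that is being counted as an eighth BC category. To interpret the crosstabulation you should also request the expected cell counts, which will help you see where the differences are. Remember, though, that chi-square is an overall test: it is the whole pattern of observed versus expected counts that is or is not significant, not any single cell.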
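To make the observed-versus-expected idea concrete, here is a rough sketch in plain Python (not SAS) that works the arithmetic by hand from the counts you posted. It assumes the "-" row really is a missing value and leaves it out; the function and variable names are made up for the illustration. As far as I know, the EXPECTED and CELLCHI2 options on the TABLES statement put the same two quantities into the PROC FREQ listing.

```python
# Observed counts from the posted table: BC value -> (TC=0, TC=1).
# Assumption: the "-" row (one observation) is a missing value, so it is omitted.
observed = {
    "1": (7934, 237),
    "2": (8558, 243),
    "3": (8124, 202),
    "4": (27383, 850),
    "5": (8089, 206),
    "6": (8890, 411),
    "7": (9716, 448),
}


def expected_and_contributions(table):
    """Return per-cell (observed, expected, cell chi-square), the total, and the df."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    n = sum(col_totals)
    cells = {}
    chi2 = 0.0
    for label, row in table.items():
        row_total = sum(row)
        cells[label] = []
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n.
            exp = row_total * col_totals[j] / n
            contrib = (obs - exp) ** 2 / exp
            cells[label].append((obs, exp, contrib))
            chi2 += contrib
    df = (len(table) - 1) * (n_cols - 1)
    return cells, chi2, df


cells, chi2, df = expected_and_contributions(observed)
for label, row in cells.items():
    for tc, (obs, exp, contrib) in enumerate(row):
        print(f"BC={label} TC={tc} observed={obs:6d} "
              f"expected={exp:9.1f} cell chi2={contrib:6.2f}")
print(f"Pearson chi-square = {chi2:.3f} on {df} df")
```

Cells with a large cell chi-square are the ones pulling the overall statistic up. In this table those should be the TC=1 cells for BC 6 and 7, where the observed counts sit well above the expected ones.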

Also, you might consider taking a sample of this data. With samples this large, chi-square has so much power that it is almost always highly significant, even when the differences are trivially small (note that your Cramer's V is only 0.041). A sample of a thousand or two should be sufficient. Or consider a different test procedure that is not affected by large samples.
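A back-of-the-envelope way to see the sample-size effect, again as a plain Python sketch with made-up helper names: for a fixed set of cell proportions, the Pearson chi-square grows in direct proportion to n, while Cramer's V does not change. The scaling below assumes the proportions in your table would hold exactly in a smaller sample, which ignores sampling noise, so treat it as an illustration only.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V: chi-square rescaled by sample size and table dimensions."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


def scaled_chi2(chi2, n, new_n):
    """Chi-square the same cell proportions would give at a different sample size."""
    return chi2 * new_n / n


# Values taken from the posted PROC FREQ output (8 rows by 2 columns).
chi2, n = 133.665, 81292
print(f"Cramer's V = {cramers_v(chi2, n, 8, 2):.3f}")

# 14.067 is the usual 5% critical value for chi-square on 7 df.
critical_7df = 14.067
for new_n in (1000, 2000, n):
    value = scaled_chi2(chi2, n, new_n)
    verdict = "above" if value > critical_7df else "below"
    print(f"n={new_n:6d} chi-square ~ {value:8.3f} ({verdict} {critical_7df})")
```

The association is just as weak at n = 81292 as it would be at n = 2000. Only the sample size pushes the statistic past the critical value.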

Hope this helps you to blind them with science.

