LISTSERV at the University of Georgia
Date:         Sat, 29 Mar 2008 22:13:14 -0700
Reply-To:     Phil Holman <piholmanc@YOURSERVICE.UGA.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Phil Holman <piholmanc@YOURSERVICE.UGA.EDU>
Subject:      Re: help with thoughts about the chi square test of independence
Comments: To: sas-l@uga.edu

"jenmoocat" <sollje2002@yahoo.com> wrote in message
news:6ec0cceb-9ffb-4487-aea5-6850694dfcda@h11g2000prf.googlegroups.com...
> Hello all.
> I hope some of you can shed some light on a problem I am having.
> I do have a stats degree, but I got it over 15 years ago.
> I've used the web to try to research this idea, but I don't really see
> it addressed....
>
> I was tasked with creating an audit function.
> Part of a process flow is to randomly assign customers to one of two
> groups.
> We want to make sure that the customers in group 1 look like the
> customers in group 2.
> I thought that a chi-square test of independence could be a way to do
> this.
>
> I chose a couple of factors that define our customers: age, tenure,
> risk-score (for example).
> I then perform the chi-square test of independence on each factor
> separately.
> In each case, I am essentially posing the null hypothesis that the
> factor is independent of group membership:
> age is unrelated to group membership, tenure is independent of group
> membership, etc...
> In my thinking, if the null hypothesis is true along all of the
> factors of importance, then the two groups have truly been populated
> randomly.
>
> In the actual mechanics of the test, I have tens of thousands (if not
> hundreds of thousands) of observations.
> I then bin the factor --- break age down into 9 groups for example:
> under 18
> 18 to 25
> 25 to 35
> etc....
>
> In that way I then get two distributions: the distribution of group 1
> by age and the distribution of group 2 by age.
> I have read in the statistics literature that, because the chi-square
> test by nature is sensitive to sample size, the significance level of
> such a test should be something like 0.01, rather than the more common
> 0.05.
>
> So I perform my test on the independence of age and group membership.
> I graph the two histograms together, so I can get a visual aid. And I
> also calculate the chi-square statistic...
>
> And I have found that even small differences will cause the null
> hypothesis to be rejected.
>
> In the data below, if you graph the two histograms together, they line
> up very closely.
> The data, eyeballed, looks as if age is independent from group
> membership.
> However, the calculated chi-square stat is 46, compared to the
> critical value of 21 for 9 degrees of freedom and a significance level
> of 0.01. The p-value is minuscule. I interpret this to mean that the
> probability of the calculated chi-square stat (or of seeing these two
> histograms), if the null hypothesis of independence were true, is very
> tiny.
>
> age    group 1    group 2
>   1         86         77
>   2        415        440
>   3      1,559      1,577
>   4      5,810      5,751
>   5     22,450     22,000
>   6     26,182     26,182
>   7     16,947     16,947
>   8      5,336      6,000
>   9        184        168
>  10          8         11
>
> My bosses think that the test is not good at these high numbers and
> are thinking about scrapping it.

What gives with age group 8? If it wasn't for that one age group, your X^2 value would be ~10.

Phil H
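
One way to check which age group drives the statistic is to split the Pearson chi-square into per-row contributions. The following is only a minimal sketch in Python, not code from the thread: the counts are copied from the quoted table, and the names `chi_square_contributions`, `contrib`, `chi2`, and `worst_bin` are made up for illustration. It assumes the usual expected counts (row total times column total, divided by the grand total).

```python
# Counts copied from the table in the quoted message (age bins 1-10).
group1 = [86, 415, 1559, 5810, 22450, 26182, 16947, 5336, 184, 8]
group2 = [77, 440, 1577, 5751, 22000, 26182, 16947, 6000, 168, 11]


def chi_square_contributions(a, b):
    """Return each row's contribution to the Pearson chi-square of a k x 2 table."""
    n_a, n_b = sum(a), sum(b)
    n = n_a + n_b
    contributions = []
    for obs_a, obs_b in zip(a, b):
        row_total = obs_a + obs_b
        # Expected counts under independence of age bin and group.
        exp_a = row_total * n_a / n
        exp_b = row_total * n_b / n
        contributions.append(
            (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
        )
    return contributions


contrib = chi_square_contributions(group1, group2)
chi2 = sum(contrib)  # overall statistic, 9 degrees of freedom for 10 bins
worst_bin = max(range(len(contrib)), key=contrib.__getitem__) + 1

for age_bin, c in enumerate(contrib, start=1):
    print(f"age bin {age_bin:2d}: contribution {c:8.3f}")
print(f"total chi-square: {chi2:.2f}")
print(f"largest contribution comes from age bin {worst_bin}")
```

Printing the contributions row by row makes it easy to see whether a single bin accounts for most of the total, which is a useful first look before deciding whether the test itself is at fault.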

