LISTSERV at the University of Georgia
Date:         Tue, 17 Feb 2009 12:15:36 -0800
Reply-To:     Steve Denham <stevedrd@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Steve Denham <stevedrd@YAHOO.COM>
Subject:      Re: How can caculate the chi-Square in SAS
Comments: To: Joe Matise <snoopy369@GMAIL.COM>
Content-Type: text/plain; charset=iso-8859-1

Is the cross-tab that comes from the following code at all what you are looking for?

proc freq data=testdata;
  where _type_=3;
  table grouptype1*grouptype2/all;
  weight _freq_;
run;

You can break this down into specific comparisons by reducing the dataset appropriately.

If you really want comparisons within each level of grouptype1, then the following should work:

data testdata_red;
  set testdata;
  if _type_=3;
run;

proc sort data=testdata_red;
  by grouptype1;
run;

proc freq data=testdata_red;
  by grouptype1;
  tables grouptype2/all;
  weight _freq_;
run;

The output gives a 3 df chi-square test of equality of proportions within grouptype2 for each level of grouptype1. If the overall test is significant, you could proceed to test specified comparisons within a level, applying some adjustment (Bonferroni, for example) for multiple comparisons.
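
One way the follow-up comparisons might look is sketched below. It assumes testdata_red from the code above (already sorted by grouptype1) and compares only grouptype2 levels 0 and 1; the dataset names pair01 and pair01_adj, the choice of pair, and the count of six comparisons (all pairs among four levels) are illustrative assumptions, not part of the original suggestion.

```sas
/* Sketch: one pairwise comparison (grouptype2 = 0 vs 1) within each grouptype1 */
proc freq data=testdata_red;
  by grouptype1;
  where grouptype2 in (0, 1);      /* reduce to the two levels being compared */
  tables grouptype2 / chisq;       /* 1 df test of equal proportions */
  weight _freq_;
  output out=pair01 chisq;         /* P_PCHI holds the chi-square p-value */
run;

data pair01_adj;
  set pair01;
  n_comparisons = 6;                         /* assumed: 4 levels -> 4*3/2 pairs */
  p_bonf = min(1, p_pchi * n_comparisons);   /* Bonferroni-adjusted p-value */
run;
```

Multiplying the p-value by the number of comparisons is equivalent to testing each raw p-value against alpha divided by that number.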

Steve Denham
Associate Director, Biostatistics
MPI Research, Inc.

----- Original Message ----
From: Joe Matise <snoopy369@GMAIL.COM>
To: SAS-L@LISTSERV.UGA.EDU
Sent: Tuesday, February 17, 2009 2:15:34 PM
Subject: Re: How can caculate the chi-Square in SAS

I looked at the formula again, and realized that I do indeed use probchi in that manner at the end of the test... got lost with all of the rest of the calculations :) It looks like probchi is just doing the table lookup, though; while that is of course useful (not recreating the entire p-value table definitely simplifies things...) it doesn't really solve my problem, which I suspect I didn't explain very well (forgetting the 'always include sample data'... too long a day already apparently).

The dataset that I'm operating from would be:

data testset;
  do _n_ = 1 to 5000;
    grouptype1 = int(ranuni(7)*5);
    grouptype2 = int(ranuni(7)*4);
    q1 = int(ranuni(7)*2);
    q2 = int(ranuni(7)*2);
    weight = ranuni(7)*2;
    output;
  end;
run;

proc means data=testset noprint;
  weight weight;
  class grouptype1 grouptype2;
  types () grouptype1 grouptype2 grouptype1*grouptype2;
  var q1 q2 weight;
  output out=testdata
    mean(q1)=q1score mean(q2)=q2score
    sum(weight)=sumsqwt sumwgt(q1)=sumwt;
run;

data testdata_fb;
  set testdata;
  eff_base = (sumwt)**2/sumsqwt;
run;

testset is my respondent-level dataset, which as I understand it I could do a chi square test from directly; but I'd rather not (it's quite large, and I'm doing the rest of this work anyway). Testdata and testdata_fb are post-proc means, and are what I have to work from preferably. I use the effective base size (eff_base) instead of the n due to my weights. (There's actually an eff_base for every single question, since there are missing values for unanswered questions, but this seems easier to show with the example data.)
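
For reference, the eff_base calculation appears to match what is commonly called the Kish effective sample size. Writing w_i for the respondent weights (the symbols here are introduced only for illustration):

```latex
n_{\mathrm{eff}} \;=\; \frac{\left(\sum_{i} w_i\right)^{2}}{\sum_{i} w_i^{2}}
\qquad\text{with}\qquad
\texttt{sumwt} = \sum_{i} w_i ,
\quad
\texttt{sumsqwt} = \sum_{i} w_i \cdot w_i
```

Because the WEIGHT statement is in effect, sum(weight) returns the weighted sum of the weight variable, which is the sum of squared weights, while sumwgt(q1) returns the plain sum of weights over nonmissing q1. When all weights are equal, the expression reduces to the ordinary n.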

I want to calculate, at every level below _type_=1, whether there is a significant difference between (level) and _type_=1, except that I need to exclude the effect of (level) from _type_=1's effect [so, really test whether, say, grouptype1=0 is different from grouptype1=(1,2,3)]. I could either work with directly comparing _type_=1 to each level, or comparing each _type_ separately (so, compare grouptype1=0 to grouptype1=(1,2,3) directly).

Perhaps it makes the most sense to keep doing this in a data step and just calculate the math myself (ie, [o-e]**2/e for the 4 cell values of o and e)... I'm just curious if there is a more direct way using (presumably) proc freq or similar. Every time I try to implement something similar to what .null and Mary suggested earlier, I cannot convince it to give me a separate value comparing (national) to (each value of each class variable) without doing way more manipulation than just doing the formula directly.
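
As a rough sketch of the by-hand route, the data step below computes the four-cell chi-square for one level against the rest of the sample. The input values come from the sample rows quoted further down in the thread (overall .80 on a base of 150, level A .70 on a base of 50). All variable names are hypothetical, and removing the level from the overall by simple subtraction is an assumption: effective bases are not strictly additive.

```sas
data chisq_by_hand;
  /* Illustrative inputs: score and effective base for one level and overall */
  lvl_p = 0.70;  lvl_n = 50;
  all_p = 0.80;  all_n = 150;

  /* Assumption: exclude the level from the overall by subtraction */
  rest_n = all_n - lvl_n;
  rest_p = (all_p*all_n - lvl_p*lvl_n) / rest_n;

  array o[4] o1-o4;   /* observed cell counts */
  array e[4] e1-e4;   /* expected cell counts under a common proportion */

  o[1] = lvl_p*lvl_n;    o[2] = (1 - lvl_p)*lvl_n;
  o[3] = rest_p*rest_n;  o[4] = (1 - rest_p)*rest_n;

  /* Pooled proportion of level + rest equals the overall score */
  e[1] = all_p*lvl_n;    e[2] = (1 - all_p)*lvl_n;
  e[3] = all_p*rest_n;   e[4] = (1 - all_p)*rest_n;

  chisq = 0;
  do i = 1 to 4;
    chisq = chisq + (o[i] - e[i])**2 / e[i];
  end;

  p_value = 1 - probchi(chisq, 1);   /* 2x2 table, so 1 degree of freedom */
  put chisq= p_value=;
  drop i;
run;
```

With these inputs the observed cells are 35, 15, 85, 15 and the expected cells are 40, 10, 80, 20, so chisq works out to about 4.69.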

Thanks!

-Joe

On Tue, Feb 17, 2009 at 11:41 AM, Nordlund, Dan (DSHS/RDA) <NordlDJ@dshs.wa.gov> wrote:

> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Joe Matise
> > Sent: Tuesday, February 17, 2009 8:18 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: How can caculate the chi-Square in SAS
> >
> > I'll jump into this discussion on the 'asking' side ... :)
> >
> > I am using Chi-Square to test reporting level scores against national scores (Chi-Square was the direction of the client, so no, I can't use T-test or others). I have already generated the summary score dataset (output from PROC MEANS), and appended national scores to the dataset. Currently, I process this via direct math - i.e., I wrote out a macro to do, by hand, the chi square test (using effective base size and percentage as inputs for each of the four cells). I imagine the code would be much easier to read if I can do it not by hand (i.e., using PROBCHI or a PROC on the full dataset).
> >
> > I am limited in this largely by my lack of statistical background - I understand basic statistics to some extent, but college was a long time ago, and I'm really more of a programmer than a statistician :) PROBCHI takes as input the score, the DF (which I can derive from the effective base size, I believe, something like (FB-2); or does it involve both effective base sizes?) and the 'non-centrality' parameter. Am I correct in guessing that the NC parameter is equivalent to the 'benchmark' score that I'm comparing it to, or is that not relevant? Also, if I do it that way, it sounds like the effective base size for the overall group does not matter - that feels wrong to me given the formula I use, but perhaps it doesn't actually matter?
> >
> > I've also looked at the PROC FREQ options for chi square tests, but those seem to be roughly the same, and require nonsummarized data to compare to, which I'd prefer not to do (summarizing this takes hours, and there are a lot of levels, which PROC FREQ doesn't deal well with, as opposed to using CLASS)...
> >
> > I guess my ultimate question is, is it best to just use the directly written formula still, or is there a superior way using a built in formula?
> >
> > Thanks!
> >
> > -Joe
> >
> > My data, by the way, roughly looks like this:
> >
> > level1 , level2 , level3 , score1, effbase1, score2 , effbase2
> > ,,,.80,150,.70,100
> > ,,A,.70,50,.75,40
> > ,,B,.90,50,.77,30
> > ,,C,.80,50,.60,30
> > ,1,A,.75,20,.80,15
> > ... etcetera
> >
> > which I then appended the first row (the overall numbers) scores and effective base sizes to every row below it, to get the comparison values.
> >
>
> Joe,
>
> I will jump into this on the answering side (sort of). :-) I don't understand yet what the levels, score1, score2, effbase1 and effbase2 represent. If you want to show your formula, I can provide further comment on your calculation of chisq.
>
> However, given a chisq value you can use probchi to get a p-value. There is no need to specify the non-centrality parameter. You mention 4 cells, so it sounds like you are calculating a chisq for a 2 by 2 table, and therefore degrees of freedom would be 1.
>
> p = 1 - probchi(your_chi,1);
>
> Hope this is helpful,
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
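
On the question of whether PROC FREQ needs unsummarized data: it can also read one row per cell if the cell count is supplied through the WEIGHT statement. A minimal sketch follows, using hypothetical names (cells, grp, response, cnt). The counts are derived from the sample rows above: level A has .70 x 50 = 35 "yes", and the remainder of the overall row has .80 x 150 - 35 = 85 "yes" out of 150 - 50 = 100. Treating effective bases as subtractable is an assumption.

```sas
/* One row per cell of the 2x2 table; cnt is the (possibly non-integer) cell count */
data cells;
  input grp $ response $ cnt;
  datalines;
A    yes 35
A    no  15
rest yes 85
rest no  15
;
run;

proc freq data=cells;
  tables grp*response / chisq;   /* Pearson chi-square for the 2x2 table */
  weight cnt;                    /* cnt supplies the cell frequencies */
run;
```

For these four cells the Pearson chi-square has 1 degree of freedom and a value of about 4.69, which is what the (o-e)**2/e formula gives for the same counts.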

