```
Date:         Tue, 17 Feb 2009 12:15:36 -0800
Reply-To:     Steve Denham
Sender:       "SAS(r) Discussion"
From:         Steve Denham
Subject:      Re: How can caculate the chi-Square in SAS
Comments:     To: Joe Matise
Content-Type: text/plain; charset=iso-8859-1

Is the cross-tab that comes from the following code at all what you are
looking for?

proc freq data=testdata;
  where _type_=3;
  table grouptype1*grouptype2/all;
  weight _freq_;
run;

You can break this down into specific comparisons by reducing the dataset
appropriately. If you really want comparisons within each level of
grouptype1, then the following should work:

data testdata_red;
  set testdata;
  if _type_=3;
run;

proc sort data=testdata_red;
  by grouptype1;
run;

proc freq data=testdata_red;
  by grouptype1;
  tables grouptype2/all;
  weight _freq_;
run;

The output gives a 3 df chi squared test of equality of proportions within
grouptype2 for each level of grouptype1. If there is an overall
significance, you could proceed to test specified comparisons within a
level, applying some adjustment (Bonferroni, for example) for multiple
comparisons.

Steve Denham
Associate Director, Biostatistics
MPI Research, Inc.

----- Original Message ----
From: Joe Matise
To: SAS-L@LISTSERV.UGA.EDU
Sent: Tuesday, February 17, 2009 2:15:34 PM
Subject: Re: How can caculate the chi-Square in SAS

I looked at the formula again, and realized that I do indeed use probchi in
that manner at the end of the test... got lost with all of the rest of the
calculations :) It looks like probchi is just doing the table lookup,
though; while that is of course useful (not recreating the entire p-value
table definitely simplifies things...) it doesn't really solve my problem,
which I suspect I didn't explain very well (forgetting the 'always include
sample data'... too long a day already apparently).
The dataset that I'm operating from would be:

data testset;
  do _n_ = 1 to 5000;
    grouptype1 = int(ranuni(7)*5);
    grouptype2 = int(ranuni(7)*4);
    q1 = int(ranuni(7)*2);
    q2 = int(ranuni(7)*2);
    weight = ranuni(7)*2;
    output;
  end;
run;

proc means data=testset noprint;
  weight weight;
  class grouptype1 grouptype2;
  types () grouptype1 grouptype2 grouptype1*grouptype2;
  var q1 q2 weight;
  output out=testdata mean(q1)=q1score mean(q2)=q2score
         sum(weight)=sumsqwt sumwgt(q1)=sumwt;
run;

data testdata_fb;
  set testdata;
  eff_base = (sumwt)**2/sumsqwt;
run;

testset is my respondent-level dataset, which as I understand it I could do
a chi square test from directly; but I'd rather not (it's quite large, and
I'm doing the rest of this work anyway). Testdata and testdata_fb are
post-proc means, and are what I have to work from preferably. I use the
effective base size (eff_base) instead of the n due to my weights. (There's
actually an eff_base for every single question, since there are missing
values for unanswered questions, but this seems easier to show with the
example data.)

I want to calculate, at every level below _type_=1, whether there is a
significant difference between (level) and _type_=1, except that I need to
exclude the effect of (level) from _type_=1's effect [so, really test
whether, say, grouptype1=0 is different from grouptype1=(1,2,3)]. I could
either work with directly comparing _type_=1 to each level, or comparing
each _type_ separately (so, compare grouptype1=0 to grouptype=(1,2,3)
directly).

Perhaps it makes the most sense to keep doing this in a data step and just
calculate the math myself (i.e., [o-e]**2/e for the 4 cell values of o and
e)... I'm just curious if there is a more direct way using (presumably)
proc freq or similar.
Every time I try to implement something similar to what .null and Mary
suggested earlier, I cannot convince it to give me a separate value
comparing (national) to (each value of each class variable) without doing
way more manipulation than just doing the formula directly.

Thanks!

-Joe

On Tue, Feb 17, 2009 at 11:41 AM, Nordlund, Dan (DSHS/RDA)
<NordlDJ@dshs.wa.gov> wrote:

> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > Joe Matise
> > Sent: Tuesday, February 17, 2009 8:18 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: How can caculate the chi-Square in SAS
> >
> > I'll jump into this discussion on the 'asking' side ... :)
> >
> > I am using Chi-Square to test reporting level scores against national
> > scores (Chi-Square was the direction of the client, so no, I can't use
> > T-test or others). I have already generated the summary score dataset
> > (output from PROC MEANS), and appended national scores to the dataset.
> > Currently, I process this via direct math - ie, I wrote out a macro to
> > do, by hand, the chi square test (using effective base size and
> > percentage as inputs for each of the four cells). I imagine the code
> > would be much easier to read if I can do it not by hand (ie, using
> > PROBCHI or a PROC on the full dataset).
> >
> > I am limited in this largely by my lack of statistical background - I
> > understand basic statistics to some extent, but college was a long time
> > ago, and I'm really more of a programmer than a statistician :)
> > PROBCHI takes as input the score, the DF (which I can derive from the
> > effective base size, I believe, something like (FB-2); or does it
> > involve both effective base sizes?) and the 'non-centrality' parameter.
> > Am I correct in guessing that the NC parameter is equivalent to the
> > 'benchmark' score that I'm comparing it to, or is that not relevant?
> > Also, if I do it that way, it sounds like the effective base size for
> > the overall group does not matter - that feels wrong to me given the
> > formula I use, but perhaps it doesn't actually matter?
> >
> > I've also looked at the PROC FREQ options for chi square tests, but
> > those seem to be roughly the same, and require nonsummarized data to
> > compare to, which I'd prefer not to do (summarizing this takes hours,
> > and there are a lot of levels, which PROC FREQ doesn't deal well with,
> > as opposed to using CLASS)...
> >
> > I guess my ultimate question is, is it best to just use the directly
> > written formula still, or is there a superior way using a built in
> > formula?
> >
> > Thanks!
> >
> > -Joe
> >
> > My data, by the way, roughly looks like this:
> >
> > level1 , level2 , level3 , score1, effbase1, score2 , effbase2
> > ,,,.80,150,.70,100
> > ,,A,.70,50,.75,40
> > ,,B,.90,50,.77,30
> > ,,C,.80,50,.60,30
> > ,1,A,.75,20,.80,15
> > ... etcetera
> >
> > which I then appended the first row (the overall numbers) scores and
> > effective base sizes to every row below it, to get the comparison
> > values.
> >
>
> Joe,
>
> I will jump into this on the answering side (sort of). :-) I don't
> understand yet what the levels, score1, score2, effbase1 and effbase2
> represent. If you want to show your formula, I can provide further
> comment on your calculation of chisq.
>
> However, given a chisq value you can use probchi to get a p-value. There
> is no need to specify the non-centrality parameter. You mention 4 cells,
> so it sounds like you are calculating a chisq for a 2 by 2 table, and
> therefore degrees of freedom would be 1.
>
> p = 1 - probchi(your_chi,1);
>
> Hope this is helpful,
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
>
```
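
For readers following the thread, here is a minimal sketch of the "level versus everyone else" comparison Joe describes, done in a data step from summary rows and finished with Dan's `1 - probchi(chisq, 1)`. It is not code from any of the posters. The dataset name `level_vs_rest` and the variables `rest_base`, `rest_score`, `pooled`, `chisq` and `p_value` are made up for illustration, and the input rows are the `score1`/`effbase1` sample values from the quoted message. The sketch assumes effective base sizes can simply be subtracted to get the base for "everyone else", which is only an approximation when the weights vary.

```sas
/* Hypothetical summary rows taken from the sample in the quoted message:
   score1/effbase1 for levels A, B and C. The overall row is .80 on 150. */
data level_vs_rest;
  input level3 $ score effbase;
  retain overall_score 0.80 overall_base 150;

  /* Take the level out of the overall figures to get "everyone else".
     Assumption: effective bases are treated as additive here. */
  rest_base  = overall_base - effbase;
  rest_score = (overall_score*overall_base - score*effbase) / rest_base;

  /* Pooled proportion across the level and the rest */
  pooled = (score*effbase + rest_score*rest_base) / (effbase + rest_base);

  /* Pearson chi-square for the 2x2 table (no continuity correction),
     written in its equivalent two-proportion form */
  chisq = (score - rest_score)**2
          / (pooled*(1 - pooled)*(1/effbase + 1/rest_base));

  /* 2x2 table, so 1 degree of freedom */
  p_value = 1 - probchi(chisq, 1);
  datalines;
A 0.70 50
B 0.90 50
C 0.80 50
;
run;

proc print data=level_vs_rest noobs;
  var level3 score rest_score chisq p_value;
run;
```

With real data the overall score and base would come from the appended overall row rather than a `retain` statement, and a pooled proportion of exactly 0 or 1 would need a guard against division by zero.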
