Date: Tue, 17 Feb 2009 12:15:36 -0800
Reply-To: Steve Denham <stevedrd@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Steve Denham <stevedrd@YAHOO.COM>
Subject: Re: How can caculate the chi-Square in SAS
Content-Type: text/plain; charset=iso-8859-1
Is the cross-tab that comes from the following code at all what you are looking for?
proc freq data=testdata;
   where _type_=3;    /* keep only the grouptype1*grouptype2 cells */
   table grouptype1*grouptype2/all;
   weight _freq_;     /* cell counts written by PROC MEANS */
run;
You can break this down into specific comparisons by reducing the dataset appropriately.
If you really want comparisons within each level of grouptype1, then the following should work:
data testdata_red;
   set testdata;
   if _type_=3;    /* keep only the grouptype1*grouptype2 cells */
run;
proc sort data=testdata_red;
   by grouptype1;
run;
proc freq data=testdata_red;
   by grouptype1;  /* one table per level of grouptype1 */
   tables grouptype2/all;
   weight _freq_;
run;
The output gives a 3-df chi-square test of equality of proportions across the levels of grouptype2 for each level of grouptype1. If a test is significant overall, you could proceed to test specific comparisons within that level, applying some adjustment (Bonferroni, for example) for multiple comparisons.
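As a rough sketch of one such follow-up comparison, assuming testdata_red has been created and sorted as above: restricting to two levels of grouptype2 gives a 1-df test of equal proportions for that pair within each level of grouptype1. The pair (0, 1) and the 0.05 starting alpha are only illustrative choices.

```sas
/* Sketch: compare two chosen levels of grouptype2 within each grouptype1. */
/* The pair (0, 1) is an arbitrary example, not a recommendation.          */
proc freq data=testdata_red;
   where grouptype2 in (0, 1);
   by grouptype1;
   tables grouptype2 / chisq;   /* 1-df test of equal proportions */
   weight _freq_;
run;

/* Bonferroni: if all 6 pairs of the 4 levels are tested, compare each */
/* p-value to 0.05/6 rather than 0.05 (0.05 is an assumed alpha).      */
```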
Steve Denham
Associate Director, Biostatistics
MPI Research, Inc.
----- Original Message ----
From: Joe Matise <snoopy369@GMAIL.COM>
To: SAS-L@LISTSERV.UGA.EDU
Sent: Tuesday, February 17, 2009 2:15:34 PM
Subject: Re: How can caculate the chi-Square in SAS
I looked at the formula again, and realized that I do indeed use probchi in
that manner at the end of the test... got lost with all of the rest of the
calculations :) It looks like probchi is just doing the table lookup,
though; while that is of course useful (not recreating the entire p-value
table definitely simplifies things...) it doesn't really solve my problem,
which I suspect I didn't explain very well (forgetting the 'always include
sample data'... too long a day already apparently).
The dataset that I'm operating from would be:
data testset;
   do _n_ = 1 to 5000;
      grouptype1 = int(ranuni(7)*5);
      grouptype2 = int(ranuni(7)*4);
      q1 = int(ranuni(7)*2);
      q2 = int(ranuni(7)*2);
      weight = ranuni(7)*2;
      output;
   end;
run;
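proc means data=testset noprint;
   weight weight;
   class grouptype1 grouptype2;
   types
      ()
      grouptype1
      grouptype2
      grouptype1*grouptype2
   ;
   var q1 q2 weight;
   /* With WEIGHT in effect, sum(weight) is the sum of squared weights */
   /* and sumwgt(q1) is the sum of weights over nonmissing q1.         */
   output out=testdata
      mean(q1)=q1score mean(q2)=q2score
      sum(weight)=sumsqwt sumwgt(q1)=sumwt;
run;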
data testdata_fb;
   set testdata;
   /* effective base = (sum of weights)**2 / (sum of squared weights) */
   eff_base = (sumwt)**2/sumsqwt;
run;
testset is my respondent-level dataset, which as I understand it I could do
a chi square test from directly; but I'd rather not (it's quite large, and
I'm doing the rest of this work anyway). Testdata and testdata_fb are
post-proc means, and are what I have to work from preferably. I use the
effective base size (eff_base) instead of the n due to my weights. (There's
actually an eff_base for every single question, since there are missing
values for unanswered questions, but this seems easier to show with the
example data.)
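
For reference, the effective base used here is (sum of weights)**2 / (sum of squared weights). A minimal sketch that recomputes it directly for the whole of testset, as a cross-check on the overall (_type_=0) row of testdata_fb; the accumulator names tot_wt and tot_sqwt are only illustrative, and it assumes no missing q1 or weight values, as in the example data:

```sas
/* Sketch: effective base for the full respondent-level dataset. */
data _null_;
   set testset end=last;
   tot_wt   + weight;       /* running sum of weights          */
   tot_sqwt + weight**2;    /* running sum of squared weights  */
   if last then do;
      eff_base = tot_wt**2 / tot_sqwt;
      put eff_base=;        /* written to the log */
   end;
run;
```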
I want to calculate, for every level below the overall row (_type_=0), whether
there is a significant difference between (level) and the overall row, except
that I need to exclude the effect of (level) from the overall row's effect [so,
really test whether, say, grouptype1=0 is different from
grouptype1=(1,2,3,4)]. I could either work with directly comparing the overall
row to each level, or comparing each _type_ separately (so, compare
grouptype1=0 to grouptype1=(1,2,3,4) directly).
Perhaps it makes the most sense to keep doing this in a data step and just
calculate the math myself (ie, [o-e]**2/e for the 4 cell values of o and
e)... I'm just curious if there is a more direct way using (presumably) proc
freq or similar. Every time I try to implement something similar to what
.null and Mary suggested earlier, I cannot convince it to give me a separate
value comparing (national) to (each value of each class variable) without
doing way more manipulation than just doing the formula directly.
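
One possible way to get PROC FREQ to do this from summary numbers alone is to rebuild the four cell counts of the 2 by 2 table (level vs. everyone else, yes vs. no) and pass them through a WEIGHT statement. This is only a sketch: the inputs are the overall row (.80, 150) and the level-A row (.70, 50) from the sample table quoted further down the thread, the dataset and variable names are made up, and it assumes the effective bases can be treated as additive so that "everyone else" is overall minus level.

```sas
/* Sketch: 2x2 chi-square from summary scores and effective bases. */
data cells;
   p_all = 0.80; n_all = 150;   /* overall score and effective base */
   p_lvl = 0.70; n_lvl = 50;    /* one level's score and base       */

   /* Remainder = overall minus the level (additivity is assumed) */
   n_rest   = n_all - n_lvl;
   yes_rest = p_all*n_all - p_lvl*n_lvl;

   length group $5 answer $3;
   group = 'level'; answer = 'yes'; count = p_lvl*n_lvl;         output;
   group = 'level'; answer = 'no';  count = n_lvl - p_lvl*n_lvl; output;
   group = 'rest';  answer = 'yes'; count = yes_rest;            output;
   group = 'rest';  answer = 'no';  count = n_rest - yes_rest;   output;
   keep group answer count;
run;

proc freq data=cells;
   weight count;                  /* noninteger counts are accepted */
   tables group*answer / chisq;   /* Pearson chi-square, 1 df       */
run;
```

The Pearson chi-square in that output is the same sum of [o-e]**2/e over the four cells, so it should agree with the hand calculation. Doing this for many levels at once would need a BY variable identifying each comparison.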
Thanks!
-Joe
On Tue, Feb 17, 2009 at 11:41 AM, Nordlund, Dan (DSHS/RDA) <NordlDJ@dshs.wa.gov> wrote:
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Joe Matise
> > Sent: Tuesday, February 17, 2009 8:18 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: How can caculate the chi-Square in SAS
> >
> > I'll jump into this discussion on the 'asking' side ... :)
> >
> > I am using Chi-Square to test reporting level scores against national scores
> > (Chi-Square was the direction of the client, so no, I can't use T-test or
> > others). I have already generated the summary score dataset (output from
> > PROC MEANS), and appended national scores to the dataset. Currently, I
> > process this via direct math - ie, I wrote out a macro to do, by hand, the
> > chi square test (using effective base size and percentage as inputs for each
> > of the four cells). I imagine the code would be much easier to read if I
> > can do it not by hand (ie, using PROBCHI or a PROC on the full dataset).
> >
> > I am limited in this largely by my lack of statistical background - I
> > understand basic statistics to some extent, but college was a long time ago,
> > and I'm really more of a programmer than a statistician :) PROBCHI takes as
> > input the score, the DF (which I can derive from the effective base size, I
> > believe, something like (FB-2); or does it involve both effective base
> > sizes?) and the 'non-centrality' parameter. Am I correct in guessing that
> > the NC parameter is equivalent to the 'benchmark' score that I'm comparing
> > it to, or is that not relevant? Also, if I do it that way, it sounds like
> > the effective base size for the overall group does not matter - that feels
> > wrong to me given the formula I use, but perhaps it doesn't actually matter?
> >
> > I've also looked at the PROC FREQ options for chi square tests, but those
> > seem to be roughly the same, and require nonsummarized data to compare to,
> > which I'd prefer not to do (summarizing this takes hours, and there are a
> > lot of levels, which PROC FREQ doesn't deal well with, as opposed to using
> > CLASS)...
> >
> > I guess my ultimate question is, is it best to just use the directly written
> > formula still, or is there a superior way using a built in formula?
> >
> > Thanks!
> >
> > -Joe
> >
> > My data, by the way, roughly looks like this:
> >
> > level1 , level2 , level3 , score1, effbase1, score2 , effbase2
> > ,,,.80,150,.70,100
> > ,,A,.70,50,.75,40
> > ,,B,.90,50,.77,30
> > ,,C,.80,50,.60,30
> > ,1,A,.75,20,.80,15
> > ... etcetera
> > which I then appended the first row (the overall numbers) scores and
> > effective base sizes to every row below it, to get the comparison values.
> >
>
> Joe,
>
> I will jump into this on the answering side (sort of). :-) I don't
> understand yet what the levels, score1, score2, effbase1 and effbase2
> represent. If you want to show your formula, I can provide further
> comment on your calculation of chisq.
>
> However, given a chisq value you can use probchi to get a p-value. There
> is no need to specify the non-centrality parameter. You mention 4 cells, so
> it sounds like you are calculating a chisq for a 2 by 2 table, and therefore
> degrees of freedom would be 1.
>
> p = 1 - probchi(your_chi,1);
>
> Hope this is helpful,
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
>