Date: Fri, 16 Apr 2004 14:30:21 -0400
Reply-To: Jonas Bilenas <Jonas.Bilenas@CHASE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jonas Bilenas <Jonas.Bilenas@CHASE.COM>
Subject: Re: t-test proportions
You can also test for differences in proportions between 2 groups in PROC
LOGISTIC and GENMOD. Here is test data and output using LOGISTIC:
CODE:
data test;
input record event quantity;
datalines;
1 100 1000
2 129 1005
;;;
/* proc logistic */
PROC logistic data=test descending;
CLASS record/param=glm;
MODEL event/quantity=record ;
contrast '2 prop' record 1 -1;
run; quit;
OUTPUT:
The LOGISTIC Procedure
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 3.9942 1 0.0457
Score 3.9844 1 0.0459
Wald 3.9658 1 0.0464
Type III Analysis of Effects
Wald
Effect DF Chi-Square Pr > ChiSq
record 1 3.9658 0.0464
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.9155 0.0943 412.5856 <.0001
record 1 1 -0.2817 0.1414 3.9658 0.0464
record 2 0 0 . . .
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
record 1 vs 2 0.755 0.572 0.996
Association of Predicted Probabilities and Observed Responses
Percent Concordant 28.5 Somers' D 0.070
Percent Discordant 21.5 Gamma 0.140
Percent Tied 49.9 Tau-a 0.014
Pairs 406704 c 0.535
The LOGISTIC Procedure
Contrast Test Results
Wald
Contrast DF Chi-Square Pr > ChiSq
2 prop 1 3.9658 0.0464
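For completeness, here is a minimal sketch of the same comparison in PROC GENMOD, which I mentioned above but did not run. It assumes the TEST data set from the code above and uses the same events/trials syntax. The TYPE3 option requests a likelihood-ratio test for RECORD, so its chi-square should agree with the Likelihood Ratio line in the LOGISTIC output (3.9942). I have not pasted GENMOD output here, so treat this as an untested illustration.

```sas
/* Sketch only: same two-group comparison in PROC GENMOD.            */
/* Assumes the TEST data set (record, event, quantity) created above. */
proc genmod data=test;
   class record;
   /* events/trials response with a binomial logit model;             */
   /* TYPE3 requests a likelihood-ratio test for the RECORD effect    */
   model event/quantity = record / dist=binomial link=logit type3;
run;
```

GENMOD uses GLM-style coding for CLASS variables by default, so the parameter estimate for record 1 should be on the same scale as the LOGISTIC estimate with param=glm.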
Similar results are obtained with PROC FREQ (its Chi-Square and Likelihood
Ratio Chi-Square match the Score and Likelihood Ratio tests above), but that
requires an additional DATA step:
CODE:
data test;
input record event quantity;
datalines;
1 100 1000
2 129 1005
;;;
/* to use proc freq */
data pf;
set test;
hit=1;
n=event;
output;
hit=0;
n=quantity-event;
output;
run;
proc freq data=pf;
tables record*hit/chisq;
weight n;
run;
CHI-SQ OUTPUT:
The FREQ Procedure
Statistics for Table of record by hit
Statistic DF Value Prob
Chi-Square 1 3.9844 0.0459
Likelihood Ratio Chi-Square 1 3.9942 0.0457
Continuity Adj. Chi-Square 1 3.7090 0.0541
Mantel-Haenszel Chi-Square 1 3.9824 0.0460
Phi Coefficient 0.0446
Contingency Coefficient 0.0445
Cramer's V 0.0446
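To tie this back to the pooled z formula in Jim's original post (quoted below), here is a small sketch that applies that formula to the same TEST data set in a DATA step. The data set name ZTEST and the variable names p1, n1, p2, n2, pooled, z, chisq and p_value are only illustrative. The square of the pooled z statistic is algebraically the Pearson chi-square for a 2x2 table, so CHISQ should reproduce the 3.9844 that PROC FREQ reports, which is why the PROC route gives the same test without hand-coded formulas.

```sas
/* Sketch only: pooled two-sample z-test for proportions.             */
/* Assumes the TEST data set (record, event, quantity) created above. */
data ztest;
   set test end=last;
   retain p1 n1 p2 n2;
   /* pick up the proportion and group size for each of the 2 groups */
   if record=1 then do; p1=event/quantity; n1=quantity; end;
   else if record=2 then do; p2=event/quantity; n2=quantity; end;
   if last then do;
      pooled  = (p1*n1 + p2*n2)/(n1 + n2);
      z       = (p1 - p2)/sqrt(pooled*(1 - pooled)*(1/n1 + 1/n2));
      chisq   = z**2;                        /* equals Pearson chi-square */
      p_value = 2*(1 - probnorm(abs(z)));    /* two-sided p-value         */
      output;
   end;
   keep p1 p2 pooled z chisq p_value;
run;

proc print data=ztest noobs;
run;
```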
On Thu, 15 Apr 2004 13:02:01 -0500, Paul R Swank <Paul.R.Swank@UTH.TMC.EDU>
wrote:
>Proc freq will do tests of homogeneity of proportions as chi-squares and
>will also do the McNemar test of correlated proportions.
>
>
>Paul R. Swank, Ph.D.
>Professor, Developmental Pediatrics
>Medical School
>UT Health Science Center at Houston
>
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>Michael Whitcomb
>Sent: Thursday, April 15, 2004 11:20 AM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: t-test proportions
>
>
>Jim,
>
>As far as I know, no SAS PROC will directly test for differences in
>proportions. However, in v9 in the "Analyst" one can do this. If you are
>like me, however, using the analyst is cumbersome and really isn't an
>option (plus, isn't it just a crappy SPSS rip-off, anyway?). Therefore we
>have depended on writing code for these tests. First I take the means of
>the binary data to get the proportion, store the PROC MEANS output into a
>SAS data file (either via ODS or out=) and then apply the formulas to the
>output dataset. It's too bad SAS won't test for proportions directly, but
>once you code the formulas, this solution isn't too bad...
>
>-Michael
>
>
>~~
>Michael Whitcomb
>Assistant Director of Institutional Research
>Wesleyan University
>
>
>
>At 11:16 AM 4/15/2004, Groeneveld, Jim wrote:
>>Thank you Matthew,
>>
>>I do not want to test observed against expected cell frequencies, but
>>rather something similar to a paired t-test, but for dichotomous
>>data. For independent samples I already presented the formulas for a
>>t-test proportions, but I would like (either formulas or rather) a
>>straight way to perform such a test. Or do the additional statistics
>>with PROC FREQ indicate proportion differences?
>>
>>E.g. a 2x2 table could be filled as:
>> b1 b2 bt
>> a1 1 10 11
>> a2 2 20 22
>> at 3 30 33
>>where the distributions within the rows and columns are not different.
>>But I am interested in the significance of the difference between a1=11
>>and b1=3, or rather their equivalent proportions from the total of 33.
>>
>>Or am I missing or overlooking something?
>>
>>Regards - Jim.
>>--
>>. . . . . . . . . . . . . . . .
>>
>>Jim Groeneveld, MSc.
>>Biostatistician
>>Science Team
>>Vitatron B.V.
>>Meander 1051
>>6825 MJ Arnhem
>>Tel: +31/0 26 376 7365
>>Fax: +31/0 26 376 7305
>>Jim.Groeneveld@Vitatron.com
>>www.vitatron.com
>>
>>My computer remains home, but I will attend SUGI 2004.
>>
>>[common disclaimer]
>>
>>
>>-----Original Message-----
>>From: Zack, Matthew M. [mailto:mmz1@cdc.gov]
>>Sent: Thursday, April 15, 2004 15:18
>>To: Groeneveld, Jim
>>Subject: RE: t-test proportions
>>
>>
>>After a DATA step to calculate the number in each cell of a 2x2 table,
>>use PROC FREQ with its TABLE statement option, CHISQ, for test
>>statistics for independent proportions and the TABLE statement option,
>>AGREE, for McNemar's test for dependent proportions:
>>
>> data table;
>> infile cards;
>> input prop n;
>> row=_n_;
>> col=1;
>> cell=round(prop*n); /* round, not int: avoids truncating e.g. 28.999999 to 28 */
>> output table;
>> col=2;
>> cell=n-cell;
>> output table;
>> label prop="Proportion";
>> label n="Total";
>> cards;
>> 0.20 92
>> 0.50 68
>> ;
>> run;
>>
>> proc freq data=table;
>> table row*col / chisq agree expected cellchi2;
>> freq cell;
>> run;
>>
>>I also use the PROC FREQ TABLE statement options, EXPECTED and
>>CELLCHI2, to identify the table cells that contribute most to the
>>chi-squared statistic, that is, cells whose observed count differs
>>markedly from the value expected under the hypothesis of independence
>>of the row and column proportions.
>>
>>Matthew Zack
>>
>>-----Original Message-----
>>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>>Groeneveld, Jim
>>Sent: Thursday, April 15, 2004 8:19 AM
>>To: SAS-L@LISTSERV.UGA.EDU
>>Subject: t-test proportions
>>
>>
>>Hi friends,
>>
>>I am looking for an implementation in SAS of the hypothesis tests
>>described below. My design is either two dependent (paired) or two
>>independent samples (groups). A single dichotomous variable has to be
>>tested for differences between both groups. The difference can be
>>described in terms of proportions (of one of the two values) and group
>>sizes only.
>>
>>In the book <<Introduction to Statistical Analysis and Inference for
>>Psychology and Education, by Sidney J. Armore, 1970>> a t-test for
>>proportions between independent groups is outlined. Based on that I
>>wrote a simple Fortran (Fortran 4 or Fortran 66 as it was called on an
>>already extinct mainframe computer) program some 25 years ago, which
>>calculated z-scores from both proportions (or percentages) and group
>>sizes. The partial code, from which the used formula may be evident, is:
>> POOLED=(PROP1*N1+PROP2*N2)/(N1+N2)
>> ZSCORE=(PROP1-PROP2)/SQRT(POOLED*(1.-POOLED)*(1./N1+1./N2))
>>I have used this program quite some time with the aggregated data.
>>
>>While searching the internet I came across, among others, the following sites:
>>http://courses.smsu.edu/nkk661f/QBA337/handout4.htm
>>http://www.stat.sc.edu/curricula/courses/515/515SAS.html#9p3
>>Both pages give formulas for proportions, which are actually the same
>>in both of them. Their formula is:
>> z = (p1 - p2) / sqrt( (P x (1-P) / n1) + (P x (1-P) / n2) )
>>where P = pooled proportion: (p1 x n1 + p2 x n2) / (n1 + n2).
>>This is the same formula I used to use.
>>
>>The web page
>>http://www.ocair.org/files/KnowledgeBase/Statistics/Proportion.htm
>>mentions a similar formula for t, where the pooled proportion is
>>replaced by the group proportions: t = (p1 - p2) / sqrt ( (p1 x (1-p1)
>>/ n1 ) + (p2 x (1-p2) / n2 ) )
>>
>>These sites apparently give code to calculate the p-values using data
>>step code, but now I would like to know how I can calculate the same
>>from the individual data using a standard SAS PROCedure. So I would
>>like to avoid writing some algorithm in a data step, because that would
>>have to be validated. I know I could also apply a chi-square test.
>>
>>In addition, I would like to know how to do it with a
>>standard SAS PROCedure with dependent (paired) groups (repeated
>>measures), i.e. comparing the proportions of two different dichotomous
>>variables within one sample.
>>
>>Regards - Jim.