Date: Fri, 26 Jun 2009 10:28:17 -0500
From: Robin R High
Sender: "SAS(r) Discussion"
Subject: Re: Pair mean test for binary variables with unequal sample sizes
To: OR Stats
In-Reply-To: <6eca73440906251430s29553ac3lf476f98b46f9d44a@mail.gmail.com>

If I understand your problem correctly, you'd like a one-step analysis across all levels of a class variable, analogous to what PROC FREQ does for only two of the n levels at a time, such as:

data one;
  class='1'; do id=1 to 12;  Y=(ranuni(92629)>.2); output; end;
  class='2'; do id=21 to 38; Y=(ranuni(0)>.5);     output; end;
  class='3'; do id=41 to 55; Y=(ranuni(0)>.75);    output; end;
run;

* compare levels 1 and 3;

ods select crosstabfreqs RiskDiffCol2;

proc freq data=one;
  where class IN ('1', '3');
  table class*y / nocol nopercent riskdiff;
  exact riskdiff;
run;

Table of class by Y

class     Y

Frequency|
Row Pct  |       0|       1|  Total
---------+--------+--------+
1        |      3 |      9 |     12
         |  25.00 |  75.00 |
---------+--------+--------+
3        |     11 |      4 |     15
         |  73.33 |  26.67 |
---------+--------+--------+
Total          14       13       27

Column 2 Risk Estimates

                                  (Asymptotic) 95%      (Exact) 95%
               Risk       ASE     Confidence Limits   Confidence Limits
-------------------------------------------------------------------------
Row 1        0.7500    0.1250    0.5050    0.9950    0.4281    0.9451
Row 2        0.2667    0.1142    0.0429    0.4905    0.0779    0.5510
Total        0.4815    0.0962    0.2930    0.6699    0.2867    0.6805

Difference   0.4833    0.1693    0.1515    0.8152    0.0963    0.7676

Difference is (Row 1 - Row 2)

The ASE for each of rows 1 and 2 is computed as SQRT(risk*(1-risk)/n_row),

the ASE of the difference is then SQRT(ASE_row1**2 + ASE_row2**2), and the corresponding asymptotic 95% CI is diff +- 1.96*ASE_diff.
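As a quick check on those formulas, here is a minimal sketch that redoes the arithmetic by hand from the counts in the table above (9 of 12 for class 1, 4 of 15 for class 3). The variable names are only illustrative; the results should agree with the asymptotic columns of the PROC FREQ output to four decimals.

```sas
* Hand check of the asymptotic risk difference, class 1 vs class 3;
data _null_;
  p1 = 9/12;  n1 = 12;    * proportion with Y=1 in class 1;
  p2 = 4/15;  n2 = 15;    * proportion with Y=1 in class 3;
  ase1 = sqrt(p1*(1-p1)/n1);
  ase2 = sqrt(p2*(1-p2)/n2);
  diff = p1 - p2;
  ase_diff = sqrt(ase1**2 + ase2**2);
  lower = diff - 1.96*ase_diff;
  upper = diff + 1.96*ase_diff;
  * write the results to the log;
  put ase1= 6.4 ase2= 6.4 diff= 6.4 ase_diff= 6.4 lower= 6.4 upper= 6.4;
run;
```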

In procedures that handle multiple levels, the LSMEANS differences are computed on the logit scale rather than as differences of the actual proportions. You can automate what PROC FREQ does by using GLIMMIX with the ILINK option, which adds the proportions and their ASEs (based on each level's sample size) to the LSMEANS table:

ods output lsmeans=lsm diffs=dfs;
ods listing close;
proc glimmix data=one noitprint noclprint;
  class class;
  model y(desc) = class / dist=binary;
  lsmeans class / diff ilink;
run;

ods listing;
proc print data=lsm; run;
proc print data=dfs; run;

*Transpose and merge the means and standard errors together;

proc transpose data=lsm out=prc (drop=_name_ _label_) prefix=_cls;
  var mu;
  id class;
run;

proc print; run;

proc transpose data=lsm out=stderr (drop=_name_ _label_) prefix=_std;
  var stderrMU;
  id class;
run;

proc print; run;

* now you can compute the differences in the proportions, p-values, and confidence intervals of the differences;

DATA tst;
  set prc;
  SET stderr;
  DROP _cl: _st: ;
  array prc{3} _cls: ;
  array ste{3} _std: ;
  do i = 1 to 2;
    do j = i+1 to 3;
      p1 = prc{i};
      p2 = prc{j};
      diff = prc{i} - prc{j};
      stderr = SQRT(ste{i}**2 + ste{j}**2);
      low_diff = diff - 1.96*stderr;
      upr_diff = diff + 1.96*stderr;
      z = diff/stderr;
      probZ = 2*(1-PROBNORM(ABS(z)));
      OUTPUT;
    END;
  END;
run;

proc print data=tst NOObs; run;

i  j    p1       p2       diff     stderr   low_diff  upr_diff     z       probZ

1  2  0.75000  0.44444  0.30555  0.17130  -0.03019  0.64130  1.78378  0.07446
1  3  0.75000  0.26667  0.48333  0.16930   0.15151  0.81516  2.85491  0.00430
2  3  0.44444  0.26667  0.17778  0.16357  -0.14282  0.49837  1.08687  0.27709

You'll see that comparing class=1 with class=3 (row 2 here) gives the same asymptotic results as running PROC FREQ on those two levels (PROC FREQ also provides exact intervals). This approach assumes the sample sizes are "large" enough for the ASEs to be reliable. And as with any set of multiple comparisons, you should adjust the p-values accordingly (say, through PROC MULTTEST).
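One possible way to do that adjustment, sketched here under the assumption that the raw p-values are in the probZ variable of the tst data set built above, is to pass them to PROC MULTTEST through INPVALUES=. The choice of Bonferroni and Holm adjustments, and the output data set name adj, are only illustrative.

```sas
* Adjust the pairwise p-values held in tst (variable probZ);
proc multtest inpvalues(probZ)=tst bonferroni holm out=adj;
run;

proc print data=adj noobs; run;
```

If only the comparisons against the control class are of interest, subset tst to those rows first so the adjustment covers just that family of tests.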

Robin High UNMC

OR Stats <stats112@GMAIL.COM>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
06/25/2009 04:43 PM
Please respond to OR Stats <stats112@GMAIL.COM>

To: SAS-L@LISTSERV.UGA.EDU
cc:

Subject: Pair mean test for binary variables with unequal sample sizes

Hello:

I have a data set that is

id Y Class

where id is unique id for the # of records that I have; y is binary taking value of 0 or 1; and Class is the class that the observation belongs to. I have n (n>2) classes, where one of my classes is Control against which I want to compare the mean of every other class.

Since Y is binary, it is essentially a proportions test. However, I have different number of observations for each class so that I would like to do a sample size difference correction for each comparison test.

Can I do this elegantly in one PROC or step?

Thank you!!
