| Date: | Fri, 26 Jun 2009 10:28:17 -0500 |
| Reply-To: | Robin R High <rhigh@UNMC.EDU> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Robin R High <rhigh@UNMC.EDU> |
| Subject: | Re: Pair mean test for binary variables with unequal sample sizes |
|
| In-Reply-To: | <6eca73440906251430s29553ac3lf476f98b46f9d44a@mail.gmail.com> |
| Content-Type: | text/plain; charset="US-ASCII" |
|---|
If I understand your problem correctly, you'd like to do in one step, across
all levels of a class variable, what PROC FREQ does for only two of the n
levels at a time, such as:
data one;
class='1'; do id=1 to 12; Y=(ranuni(92629)>.2); output; end;
class='2'; do id=21 to 38; Y=(ranuni(0)>.5); output; end;
class='3'; do id=41 to 55; Y=(ranuni(0)>.75); output; end;
run;
* compare levels 1 and 3;
ods select crosstabfreqs RiskDiffCol2;
proc freq data=one;
where class IN ('1', '3');
table class*y / nocol nopercent riskdiff ;
exact riskdiff;
run;
Table of class by Y
class     Y
Frequency|
Row Pct  |       0|       1|  Total
---------+--------+--------+
1        |      3 |      9 |     12
         |  25.00 |  75.00 |
---------+--------+--------+
3        |     11 |      4 |     15
         |  73.33 |  26.67 |
---------+--------+--------+
Total          14       13       27
Column 2 Risk Estimates
                                 (Asymptotic) 95%      (Exact) 95%
              Risk      ASE      Confidence Limits     Confidence Limits
-----------------------------------------------------------------------------
Row 1         0.7500    0.1250   0.5050    0.9950      0.4281    0.9451
Row 2         0.2667    0.1142   0.0429    0.4905      0.0779    0.5510
Total         0.4815    0.0962   0.2930    0.6699      0.2867    0.6805
Difference    0.4833    0.1693   0.1515    0.8152      0.0963    0.7676
Difference is (Row 1 - Row 2)
The ASE for rows 1 and 2 is computed as SQRT(risk*(1-risk)/n_row), the ASE
of the difference is then SQRT(ASE_row1**2 + ASE_row2**2), and the
corresponding 95% CI is diff +- 1.96*ASE_diff.
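The arithmetic can be checked by hand in a short DATA step. This is only a sketch: the counts (9 of 12 and 4 of 15) are typed in from the table above, and the variable names are made up for illustration.

```sas
data _null_;
  /* observed proportions of Y=1 and row totals for class 1 and class 3 */
  p1 = 9/12;  n1 = 12;
  p2 = 4/15;  n2 = 15;
  /* asymptotic standard error of each proportion */
  ase1 = sqrt(p1*(1-p1)/n1);
  ase2 = sqrt(p2*(1-p2)/n2);
  /* difference, its ASE, and the large-sample 95% limits */
  diff     = p1 - p2;
  ase_diff = sqrt(ase1**2 + ase2**2);
  lower    = diff - 1.96*ase_diff;
  upper    = diff + 1.96*ase_diff;
  put ase1= 6.4 ase2= 6.4 diff= 6.4 ase_diff= 6.4 lower= 6.4 upper= 6.4;
run;
```

The values written to the log should agree with the Row 1, Row 2, and Difference lines of the PROC FREQ output.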
In PROCs that handle multiple levels at once, the differences in the LSMEANS
are on the logit scale rather than the scale of the actual proportions. You
can still automate what PROC FREQ does by using GLIMMIX with the ILINK
option, which adds the proportions, and ASEs based on each level's sample
size, to the LSMEANS table:
ods output lsmeans=lsm diffs=dfs;
ods listing close;
proc glimmix data=one noitprint noclprint;
class class;
model y(desc) = class / dist=binary;
lsmeans class / diff ilink;
run;
ods listing;
proc print data=lsm; run;
proc print data=dfs; run;
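To see why the ILINK columns (mu and stderrMU) line up with the PROC FREQ risk and ASE, here is a small sketch for class 1 alone. It assumes a one-way model fit by maximum likelihood, so the fitted logit is just the observed log-odds, and it assumes the standard error is carried back to the proportion scale with the delta method. The variable names are illustrative only.

```sas
data _null_;
  n = 12;  p = 9/12;                  /* class 1: 9 of 12 with Y=1 */
  logit    = log(p/(1-p));            /* estimate on the logit scale */
  se_logit = sqrt(1/(n*p*(1-p)));     /* its large-sample standard error */
  mu       = 1/(1+exp(-logit));       /* inverse link: back to a proportion */
  se_mu    = mu*(1-mu)*se_logit;      /* delta method; equals sqrt(p*(1-p)/n) */
  put logit= 7.4 se_logit= 7.4 mu= 7.4 se_mu= 7.4;
run;
```

The last two values should match the 0.7500 and 0.1250 that PROC FREQ reports for Row 1.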
*Transpose and merge the means and standard errors together;
proc transpose data=lsm out=prc (drop=_name_ _label_) prefix=_cls;
var mu;
id class;
proc print; run;
proc transpose data=lsm out=stderr (drop=_name_ _label_) prefix=_std;
var stderrMU;
id class;
proc print; run;
* now you can compute the differences in the proportions, pvalues, and
confidence intervals of the differences;
DATA tst;
/* one-row data sets: _cls1-_cls3 hold the proportions, _std1-_std3 the ASEs */
set prc ; SET stderr; DROP _cl: _st: ;
array prc{3} _cls: ;
array ste{3} _std: ;
/* loop over every pair of levels with i < j */
do i = 1 to 2;
do j= i+1 to 3;
p1 = prc{i}; p2= prc{j};
diff = prc{i} - prc{j};
stderr = SQRT(ste{i}**2 + ste{j}**2);
low_diff = diff - 1.96*stderr;
upr_diff = diff + 1.96*stderr;
z = diff/stderr;
probZ = 2*(1-PROBNORM(ABS(z))); /* two-sided p-value */
OUTPUT;
END;
END;
RUN;
proc print data=tst NOObs; run;
i  j  p1       p2       diff     stderr   low_diff  upr_diff  z        probZ
1  2  0.75000  0.44444  0.30555  0.17130  -0.03019  0.64130   1.78378  0.07446
1  3  0.75000  0.26667  0.48333  0.16930   0.15151  0.81516   2.85491  0.00430
2  3  0.44444  0.26667  0.17778  0.16357  -0.14282  0.49837   1.08687  0.27709
You'll see that comparing class=1 and class=3 (row 2 here) gives the same
results as PROC FREQ run on those two levels alone (which also reports exact
intervals). This approach assumes the sample sizes are "large" enough for
the ASEs to be valid. And as with any multiple comparisons, you should
adjust the p-values accordingly (say, through PROC MULTTEST).
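As a rough sketch of that last step, the raw p-values in the tst data set can be passed to PROC MULTTEST. It assumes a release in which INPVALUES= accepts the p-value variable name in parentheses; if yours does not, renaming probZ to raw_p first should serve the same purpose. Bonferroni and Holm are used only as examples of adjustments.

```sas
/* adjust the three pairwise p-values stored in probZ */
proc multtest inpvalues(probZ)=tst bonferroni holm;
run;
```

If only the comparisons against one control level matter, subset tst to those rows before adjusting so the correction covers fewer tests.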
Robin High
UNMC
OR Stats <stats112@GMAIL.COM>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
06/25/2009 04:43 PM
Please respond to
OR Stats <stats112@GMAIL.COM>
To
SAS-L@LISTSERV.UGA.EDU
cc
Subject
Pair mean test for binary variables with unequal sample sizes
Hello:
I have a data set that is
id Y Class
where id is unique id for the # of records that I have; y is binary taking
value of 0 or 1; and Class is the class that the observation belongs to. I
have n (n>2) classes, where one of my classes is Control against which I
want to compare the mean of every other class.
Since Y is binary, it is essentially a proportions test. However, I have a
different number of observations for each class, so I would like to do a
sample size difference correction for each comparison test.
Can I do this elegantly in one PROC or step?
Thank you!!
|