Not sure I'm a greater mind, but here goes:
(1) Simple stuff first: if you are doing t-tests, the general formula for the t-test is the following:

Obtained t = (M1 - M2) / sqrt[VarErr1 + VarErr2 - 2*r*SE1*SE2]
where:
  M1 = mean of group 1
  M2 = mean of group 2
  VarErr1 = error variance of the mean for group 1 (SE1 squared)
  VarErr2 = error variance of the mean for group 2 (SE2 squared)
  r = Pearson r between the group 1 and group 2 values
  SE1 = standard error of group 1
  SE2 = standard error of group 2
  2 = constant (the number 2)
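For concreteness, here is a minimal Python sketch of the formula above. The function name `obtained_t` and the summary statistics in the usage lines are made up for illustration; VarErr is computed as the squared standard error, and leaving r at its default of 0 gives the independent-groups denominator.

```python
import math

def obtained_t(m1, m2, se1, se2, r=0.0):
    # General formula: t = (M1 - M2) / sqrt[VarErr1 + VarErr2 - 2*r*SE1*SE2]
    # VarErr is the error variance of a mean, i.e. the squared standard error.
    var_err1 = se1 ** 2
    var_err2 = se2 ** 2
    denominator = math.sqrt(var_err1 + var_err2 - 2 * r * se1 * se2)
    return (m1 - m2) / denominator

# Hypothetical summary statistics, for illustration only.
t_r_unknown = obtained_t(10.0, 9.0, 0.4, 0.5)        # r assumed to be 0
t_r_known = obtained_t(10.0, 9.0, 0.4, 0.5, r=0.6)   # r = 0.6 (made-up value)
print(t_r_unknown, t_r_known)
```

With the same means and standard errors, the positive r shrinks the denominator and so raises t.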
If you cannot calculate r, you have to assume that it is equal to zero, which makes the t-test denominator sqrt[VarErr1 + VarErr2]. When r is positive, this denominator is larger than the one you would get if r were known. The good news is that if the t-test is significant under the assumption r = 0.00, it should also be significant once a positive r is taken into account (NOTE: r is typically a positive value; a negative r should cause you to re-examine your data).
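To see how much the denominator depends on r, here is a rough Python sketch using the 98.4% vs 96.1% example from the original message quoted below. It assumes about 350 respondents per wave (roughly 50% of ~700), that the same people answered both times (the original message does not establish this), and the large-sample standard error of a proportion, sqrt(p(1 - p)/n). The r values are arbitrary choices, not estimates.

```python
import math

def se_proportion(p, n):
    # Large-sample standard error of a sample proportion.
    return math.sqrt(p * (1 - p) / n)

def t_given_r(p1, p2, n, r):
    # Returns (t, denominator) from the general formula for a given r.
    se1, se2 = se_proportion(p1, n), se_proportion(p2, n)
    denom = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return (p1 - p2) / denom, denom

# Assumed inputs: 98.4% vs 96.1%, ~350 respondents per wave.
for r in (0.0, 0.1, 0.3, 0.5):
    t, denom = t_given_r(0.984, 0.961, 350, r)
    print(f"r = {r:.1f}  denominator = {denom:.5f}  t = {t:.2f}")
```

The r = 0 row is what treating the two waves as independent amounts to; every positive r gives a smaller denominator and a larger t.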
The bad news is that if the t-test is non-significant, it could be because there is no real difference, or because you could not adjust (reduce) the denominator appropriately and so failed to detect a real difference.
So treating your data as independent groups makes the test more conservative and less powerful. I am open to correction on these points.
(2) It seems to me that you should be able to get an estimate of the Pearson r through bootstrapping or some other simulation procedure.
If there is a positive correlation between time 1 and time 2, then, assuming the data consist only of 0s and 1s, time 1 zeros should co-occur with time 2 zeros at a greater-than-chance level, and the same holds for ones, even if they are not matched up properly. I haven't thought this through, but perhaps someone more familiar with bootstrapping correlations has more wisdom.
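A related check that needs no resampling (and is not the bootstrap idea above): for 0/1 data, the two proportions by themselves limit how large or small r can be, because the share of people answering 1 both times must lie between max(0, p1 + p2 - 1) and min(p1, p2). The Python sketch below computes that range and plugs its upper end into the general formula. It again assumes the 98.4% vs 96.1% example, the same ~350 respondents at both times, and the large-sample standard error of a proportion; the function names are hypothetical.

```python
import math

def phi_bounds(p1, p2):
    # Smallest and largest possible Pearson r (phi) for two 0/1 variables
    # with proportions p1 and p2 of ones; requires 0 < p1, p2 < 1.
    # P(1, 1) can range from max(0, p1 + p2 - 1) up to min(p1, p2).
    sd_product = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    p11_low = max(0.0, p1 + p2 - 1)
    p11_high = min(p1, p2)
    return (p11_low - p1 * p2) / sd_product, (p11_high - p1 * p2) / sd_product

def t_at(r, p1, p2, n):
    # General formula with SE of a proportion = sqrt(p(1 - p) / n).
    se1 = math.sqrt(p1 * (1 - p1) / n)
    se2 = math.sqrt(p2 * (1 - p2) / n)
    return (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)

# Assumed inputs: 98.4% vs 96.1%, same ~350 respondents at both times.
r_low, r_high = phi_bounds(0.984, 0.961)
print(f"r must lie in [{r_low:.3f}, {r_high:.3f}]")
print(f"t at r = 0: {t_at(0.0, 0.984, 0.961, 350):.2f}")
print(f"t at r = {r_high:.3f}: {t_at(r_high, 0.984, 0.961, 350):.2f}")
```

This gives a sensitivity range for t rather than an estimate of r; if r is assumed non-negative, the r = 0 value is the conservative end of that range.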
-Mike Palij
New York University
----- Original Message -----
Sent: Friday, November 12, 2010 9:19 AM
Subject: non-SPSS: appropriate statistical test
Colleagues,
This is not an SPSS question (at least not yet).
I am seeking advice on the appropriate test for comparing two non-independent samples when the non-independence cannot be modeled. The proportions are drawn from the same employee population (~700 employees, response rate ~50%), surveyed one year apart. An example of an actual comparison is 98.4% vs. 96.1% between time 1 and time 2.
The problem, as I see it, is that the two samples are not independent, but there is no ID, so neither a dependent t-test nor a mixed model can be used. I found a test for comparing proportions from two independent groups.
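The message does not say which independent-groups test was found; a common choice is the pooled two-proportion z-test, sketched below in Python as an assumption. The sample sizes of 350 per wave are likewise assumed from "~700, response rate ~50%". Because it treats the waves as independent, this test corresponds to the r = 0 case discussed in the reply.

```python
import math
from statistics import NormalDist

def two_proportion_z(p1, p2, n1, n2):
    # Pooled two-proportion z-test for independent groups.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Assumed sample sizes: ~50% of ~700 employees responding each year.
z, p_value = two_proportion_z(0.984, 0.961, 350, 350)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```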
What is the risk of violating the assumption of independence? Inflated Type I error?
As far as I know there is no appropriate test for this situation, but I thought I'd check with minds greater than mine...
Thank you,
John