**Date:** Sat, 27 Dec 1997 18:52:41 GMT
**Reply-To:** Richard F Ulrich <wpilib+@PITT.EDU>
**Sender:** "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
**From:** Richard F Ulrich <wpilib+@PITT.EDU>
**Organization:** University of Pittsburgh
**Subject:** Re: Test-retest Reliability: intraclass coefficient
Randolph Stephenson (d254324@er.uqam.ca) wrote:

: Regarding test-retest reliability, my problem is similar to Mary
: Tullu's posting. However, I learned that I could do an intraclass
: correlation to evaluate the test-retest reliability. I read that the
: rho coefficient is an intraclass reliability. I know that David
: Nichols made an intraclass reliability macro available on the SPSS
: site: I got it. I have SPSS 7.5.1 for PC, and I am having a hard
: time implementing his macro.

: My data are very similar to the following:

: Participants Time Q1 Q2 Q3 to Q17
: 001 1 4 5 4 4
: 002 1 3 4 3 3
: 001 2 4 4 5 3
: etc.

: A repeated-measures ANOVA seems in order, but how do I implement
: it? I think that the rho to use is the mixed-model one, since Q1 to
: Q17 are fixed and the Participants are random.
<< snip, the rest >>

- Convention says that we use slightly different terminology for
the two cases that you are dealing with here, even though the
math overlaps greatly.

"Internal reliability" of a scale
is often measured by Cronbach's coefficient alpha. It is relevent
when you will compute a total score and you want to know its
reliability, based on no other rating. The "reliability" is
*estimated* from the average correlation, and from the number of
items, since a longer scale will (presumably) be more reliable.
Whether the items have the same means is not usually important

For "inter-rater" reliability, one distinction is that the
importance lies with the reliability of the single rating.

For examining your own data, I think you cannot do better than
looking at the paired t-test and Pearson correlations between
each pair of raters - the t-test tells you whether the means
are different, while the correlation tells you whether the
judgments are otherwise consistent.
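The two checks above can be sketched together (an illustrative NumPy version, assuming each rater's or time point's scores are lined up by participant):

```python
import numpy as np

def paired_t_and_pearson(x, y):
    """Paired t statistic (are the means different?) and Pearson r
    (are the judgments otherwise consistent?) for one pair of
    raters or time points, matched by participant."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = x - y
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # paired t on the differences
    r = np.corrcoef(x, y)[0, 1]                  # Pearson correlation
    return t, r
```

For a test-retest design like the one posted, `x` and `y` might be each participant's score at time 1 and time 2; a near-zero t with a high r is the pattern you hope to see.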

Unlike the Pearson, the "intra-class" correlation assumes that
the raters do have the same mean. It is not bad as an overall
summary, and it is precisely what some editors want to see
presented for reliability across raters. It is both a plus and
a minus that there are *several* different formulas for the
intraclass correlation, depending on whose reliability is
being estimated.
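Two of those formulas can be sketched from the same two-way ANOVA mean squares (a NumPy illustration using the Shrout-Fleiss labels, which I am assuming here: ICC(2,1) treats raters as a random sample, while ICC(3,1) treats them as fixed, as in the mixed-model case the original poster describes):

```python
import numpy as np

def icc(ratings):
    """ICC(2,1) and ICC(3,1) for an (n_targets, k_raters) array.

    Both come from the same two-way ANOVA decomposition; they differ
    in whether between-rater (column) variance counts against the
    reliability of a single rating."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)  # one mean per target
    col_means = Y.mean(axis=0)  # one mean per rater
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)             # between-targets MS
    ms_cols = ss_cols / (k - 1)             # between-raters MS
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual MS
    icc21 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    icc31 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc21, icc31
```

If one rater runs consistently high, ICC(2,1) is pulled down while ICC(3,1) is not, which is exactly the "whose reliability" choice being made.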

For purposes such as planning the power of a proposed study, it
does matter whether the raters to be used will be exactly the
same individuals.

Hope this helps.

Rich Ulrich, biostatistician wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/index.html Univ. of Pittsburgh