Date: Tue, 15 Jun 2010 10:40:54 -0500
Reply-To: Robin R High <rhigh@UNMC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Robin R High <rhigh@UNMC.EDU>
Subject: Re: Proper reliability ICC model
Simcha,
You can infer what ICC(2,1) and ICC(3,1) imply from the PROC MIXED code
that produces them:
ICC(2,1): when all subjects are rated by the same raters, who are assumed
to be a random sample from all possible raters.
* ICC(2,1) Shrout-Fleiss reliability: random set;
PROC MIXED DATA=tst;
  CLASS rater subject;
  MODEL rating = ;                     * intercept only, no fixed effects;
  RANDOM rater subject / v vcorr;      * both rater and subject are random;
  TITLE 'ICC(2,1) Shrout-Fleiss Reliability: Random set';
RUN;
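A minimal sketch of how the ICC(2,1) value can be pulled out of that model,
assuming the standard ODS table name CovParms with its CovParm and Estimate
columns; the data set name cp21 and the column name icc_2_1 are made up for
illustration. The ratio is the subject variance over the sum of all three
variance components (rater, subject, residual):

```sas
* Hypothetical sketch: save the variance components, then form ICC(2,1);
ODS OUTPUT CovParms=cp21;              * cp21 is an assumed data set name;
PROC MIXED DATA=tst;
  CLASS rater subject;
  MODEL rating = ;
  RANDOM rater subject;
RUN;

PROC SQL;
  * numerator keeps only the subject row; denominator sums every row;
  SELECT SUM(Estimate * (UPCASE(CovParm) = 'SUBJECT')) / SUM(Estimate)
         AS icc_2_1 FORMAT=6.3
  FROM cp21;
QUIT;
```

Run against the CovParms table of the fixed-rater model, the same query
should give ICC(3,1), because that table then holds only the subject and
residual components.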
ICC(3,1): when all subjects are rated by the same raters, who are assumed
to be the entire population of raters.
* ICC(3,1) Shrout-Fleiss reliability: fixed set;
PROC MIXED DATA=tst;
  CLASS rater subject;
  MODEL rating = rater;                * rater treated as fixed effect;
  RANDOM subject / v vcorr;            * subject is random;
  TITLE 'ICC(3,1) Shrout-Fleiss Reliability: Fixed set';
RUN;
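In terms of the variance components the two runs estimate, the two
coefficients can be written as below. The symbols are my own notation, not
SAS output labels, and the expressions assume balanced data with every
subject rated by every rater:

```latex
\mathrm{ICC}(2,1) =
  \frac{\sigma^2_{\text{subject}}}
       {\sigma^2_{\text{subject}} + \sigma^2_{\text{rater}} + \sigma^2_{\text{error}}},
\qquad
\mathrm{ICC}(3,1) =
  \frac{\sigma^2_{\text{subject}}}
       {\sigma^2_{\text{subject}} + \sigma^2_{\text{error}}}
```

The rater variance appears in the denominator of ICC(2,1) only, so
systematic differences between raters lower ICC(2,1) (agreement) but leave
ICC(3,1) (consistency) untouched.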
Robin High
UNMC
From: Simcha Pollack <simcha.pollack@CHSLI.ORG>
To: SAS-L@LISTSERV.UGA.EDU
Date: 06/14/2010 04:56 PM
Subject: Proper reliability ICC model
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
It is quite common, especially in medical statistics, to compare two
METHODS to each other and try to determine if they are 1) Consistent and
2) Interchangeable.
Intraclass correlation (ICC) is often used for this. ICC is implemented in
a SAS macro that produces six versions of the ICC depending, among other
things, on whether the 'judges' are a random or fixed effect. This macro,
in turn, is based on a classic paper by Shrout and Fleiss (1979). They state
"If, for example, two judges are used to rate the same n targets, the
CONSISTENCY of the two ratings is measured by ICC(3,1), treating the
judges as fixed effects. To measure the AGREEMENT of these judges,
ICC(2,1) is used; in this instance the question being asked is whether the
judges are interchangeable."
When two methods are being compared to see if they are in agreement, they
seem to me to comprise the whole universe of possible methods and
therefore should be treated as fixed effects. Yet Shrout and Fleiss seem
to imply that these are random effects.
Could someone please explain if ICC(2,1) or ICC(3,1) should be used when
trying to decide if two methods (both applied to a sample of n targets)
are interchangeable? Thanks.