Date: Fri, 14 Jul 2006 11:52:59 -0400
Reply-To: "Meyer, Gregory J" <gmeyer@UTNet.UToledo.Edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Meyer, Gregory J" <gmeyer@UTNet.UToledo.Edu>
Subject: Re: interrater agreement, intrarater agreement
Content-Type: text/plain; charset="us-ascii"
Building on Gene's suggestions, I would recommend you compute an
intraclass correlation using a one-way random effects design. SPSS
offers three different kinds of models (and for two-way models gives the
option of computing a consistency ICC or an exact agreement ICC), each
of which has a slightly different definition of error. The one-way model
is the one that would be appropriate for assessing your interrater
reliability.
An update of the classic Shrout and Fleiss (1979) article is this paper:
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some
intraclass correlation coefficients. Psychological Methods, 1, 30-46.
For the data you provided, the default syntax would be as given below,
though you can modify the confidence interval and null value. In the
output you would typically focus on the "Single Measures" reliability
rather than the Spearman-Brown estimate of "Average Measures".
RELIABILITY
  /VARIABLES=rateA rateB
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(ONEWAY) CIN=95 TESTVAL=0 .
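As a cross-check outside SPSS, the one-way single-measures ICC can be
computed by hand from the between- and within-subject mean squares. The
sketch below is a minimal Python illustration (function name and layout
are mine, not SPSS output), using the four example rows from the
original post:

```python
# One-way random-effects, single-measures ICC -- ICC(1) in
# McGraw & Wong (1996) notation: (MSB - MSW) / (MSB + (k-1)*MSW).
# A minimal sketch; the subject x rater score matrix below comes from
# the four example rows in the original post.

def icc_oneway(scores):
    """scores: list of per-subject rating lists, all of length k."""
    n = len(scores)          # number of subjects
    k = len(scores[0])       # ratings per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-subjects mean square
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-subjects mean square (raters nested within subjects)
    msw = sum((x - m) ** 2 for row, m in zip(scores, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

ratings = [[12, 14], [9, 12], [15, 13], [2, 8]]
print(round(icc_oneway(ratings), 3))  # -> 0.661
```

With all 10000 rows the same function applied to (rateA, rateB) pairs
should match the "Single Measures" value SPSS reports for the one-way
model.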
I am not certain how you would examine intrarater reliability with these
data. For that analysis each scorer would have to rate the same essay at
least twice, and it doesn't look like this was done. However, if it was,
you would continue to have participants in the rows, with columns for
IDsubject, IDrater, rateT1, and rateT2, where T designates the time of
the rating. For this design you could run a two-way random effects
model, because the second ratings are always differentiated from the
first. Using the same model, you could also split the file by IDrater
and generate reliability values for each scorer. To obtain findings that
parallel those from the interrater analyses, you would want the Absolute
Agreement coefficient rather than the Consistency coefficient. McGraw
and Wong discuss all of these models and types.
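For that hypothetical test-retest setup, the two-way absolute-agreement
single-measures coefficient, ICC(A,1) in McGraw and Wong's notation,
folds the rating-occasion (column) variance into the error term. A rough
Python sketch, with invented T1/T2 ratings purely for illustration
(there are no repeated ratings in the actual data set):

```python
# Two-way absolute-agreement, single-measures ICC -- ICC(A,1) in
# McGraw & Wong (1996): (MSR - MSE) / (MSR + (k-1)*MSE + k*(MSC - MSE)/n).
# Illustrative only: the T1/T2 ratings below are invented, since the
# original data set has no repeated ratings.

def icc_a1(scores):
    """scores: list of per-subject rating lists (columns = occasions)."""
    n = len(scores)                      # subjects (essays)
    k = len(scores[0])                   # rating occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ssr = k * sum((m - grand) ** 2 for m in row_means)   # rows (subjects)
    ssc = n * sum((m - grand) ** 2 for m in col_means)   # columns (times)
    sst = sum((x - grand) ** 2 for row in scores for x in row)
    sse = sst - ssr - ssc                                # residual
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical: one rater scoring the same four essays at T1 and T2.
retest = [[12, 13], [9, 10], [15, 14], [2, 4]]
print(round(icc_a1(retest), 3))  # -> 0.966
```

Splitting by IDrater and running this per scorer would parallel the
split-file approach described above.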
| -----Original Message-----
| From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
| On Behalf Of Gene Maguin
| Sent: Wednesday, July 12, 2006 1:28 PM
| To: SPSSX-L@LISTSERV.UGA.EDU
| Subject: Re: interrater agreement, intrarater agreement
| I haven't seen any other replies. Yes, I think your data is set up
| correctly. As shown, you have it arranged in a multivariate ('wide')
| setup. From there you can do a repeated measures anova or use
| reliability. However, I think there are a number of different formulas
| to use depending on whether you have the same raters rating everybody,
| or, as you have, two raters are randomly selected to rate each person.
| I'd bet anything the formulas are different, and I'll bet almost
| anything that spss can't accommodate both. There's a literature on
| rater agreement and on intraclass correlation. If you haven't looked
| at that, you should. However, I can't help you on that. One thing you
| might do is google 'intraclass correlation'; there's a citation in the
| top 20 or 30 that references a book by, I think, Fleiss (or Fliess)
| and Shrout. Another term to google is 'kappa' (which is available from
| spss crosstabs).
| I'm hoping that you have other responses that are more helpful than I
| am able to be.
| Gene Maguin
| Gene Maguin
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Sent: Wednesday, July 12, 2006 7:16 AM
Subject: interrater agreement, intrarater agreement
I have a dataset of 10000 subjects and their scores on a composition
they wrote. Each composition was scored by 2 different raters (randomly
selected from a pool of 70). The scores could range from 0 to 15.
So far I have set up a table with 10000 rows/cases and 5 columns
(IDsubject, IDraterA, rateA, IDraterB, rateB):
00001, 1200, 12, 1300, 14 (the 1st rater gave a 12/15 and the 2nd a
14/15)
00002, 1200, 09, 1300, 12
00003, 1400, 15, 1200, 13
00004, 1400, 02, 1200, 08 etc.
Can someone suggest the best possible layout and analysis to investigate
inter and intra rater agreement?