=========================================================================
Date: Fri, 14 Jul 2006 11:52:59 -0400
Reply-To: "Meyer, Gregory J" <gmeyer@UTNet.UToledo.Edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Meyer, Gregory J" <gmeyer@UTNet.UToledo.Edu>
Subject: Re: interrater agreement, intrarater agreement
In-Reply-To: A<001001c6a5e0$f15bf230$2345cd80@ssw.buffalo.edu>
Content-Type: text/plain; charset="us-ascii"

Vassilis,

Building on Gene's suggestions, I would recommend you compute an
intraclass correlation using a one-way random effects design. SPSS offers
three kinds of models (one-way random, two-way random, and two-way mixed;
for the two-way models it also gives the option of computing a consistency
ICC or an absolute agreement ICC), each of which has a slightly different
definition of error. The one-way model is the one that would be
appropriate for assessing interrater reliability in your data, because
each composition was scored by a different, randomly selected pair of
raters rather than by the same two raters throughout.

An update of the classic Shrout and Fleiss (1979) article is this paper:

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some
intraclass correlation coefficients. Psychological Methods, 1, 30-46.

For the data you provided, the default syntax would be as given below,
though you can modify the confidence interval (CIN) and the null value
(TESTVAL). In the output you would typically focus on the "Single
Measures" reliability, which describes the score from one rater, rather
than the Spearman-Brown estimate of "Average Measures" reliability, which
describes the mean of the two raters' scores.

RELIABILITY
  /VARIABLES=rateA rateB
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(ONEWAY) CIN=95 TESTVAL=0.

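As a rough guide to what those two output rows estimate, here is the standard one-way ANOVA form of the coefficients. The symbols are my own shorthand and do not appear in the SPSS output under these names: MS_B is the between-subjects mean square, MS_W the within-subjects mean square, and k the number of ratings per subject (k = 2 for your data).

```latex
\begin{align*}
\mathrm{ICC}_{\text{single}}  &= \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W} \\
\mathrm{ICC}_{\text{average}} &= \frac{MS_B - MS_W}{MS_B}
  = \frac{k\,\mathrm{ICC}_{\text{single}}}{1 + (k-1)\,\mathrm{ICC}_{\text{single}}}
\end{align*}
```

The last equality is the Spearman-Brown step-up, so the "Average Measures" value will always be at least as large as the "Single Measures" value whenever the latter is positive.
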
I am not certain how you would examine intrarater reliability with these
data. For that analysis the scorers would have to rate the same essay at
least twice, and it doesn't look like this was done. If it was, you
would continue to have participants in the rows and a column for
IDsubject. In addition, you would want columns for IDrater, rateT1, and
rateT2, with T designating the time of the rating. For this design you
could run a two-way random effects model, because the second ratings are
always distinguishable from the first. Using the same model, you could
also split the file by IDrater and generate reliability values for each
scorer. To obtain findings that parallel those from the interrater
analysis, you would want to use the Absolute Agreement coefficient rather
than the Consistency coefficient. McGraw and Wong discuss all of these
models and types.

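A minimal sketch of that per-rater analysis follows. It assumes a file with one row per essay-rater pair and the columns IDrater, rateT1, and rateT2 described above. No such file exists in the data you posted, so treat this as untested against your data.

```spss
* Assumes one row per essay-rater pair with rateT1 and rateT2 both filled in.
* SPLIT FILE requires the cases to be sorted by the grouping variable.
SORT CASES BY IDrater.
SPLIT FILE LAYERED BY IDrater.
* Two-way random effects model with the absolute agreement definition.
RELIABILITY
  /VARIABLES=rateT1 rateT2
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(RANDOM) TYPE(ABSOLUTE) CIN=95 TESTVAL=0.
SPLIT FILE OFF.
```

Dropping the SORT CASES and SPLIT FILE lines should give the pooled estimate across all scorers instead of one table per scorer.
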
Good luck,
Greg
 -----Original Message-----
 From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
 On Behalf Of Gene Maguin
 Sent: Wednesday, July 12, 2006 1:28 PM
 To: SPSSX-L@LISTSERV.UGA.EDU
 Subject: Re: interrater agreement, intrarater agreement

 Vassilis,

 I haven't seen any other replies. Yes, I think your data is set up
 correctly. As shown, you have it arranged in a multivariate ('wide')
 setup. From there you can do a repeated measures anova or use
 reliability. However, I think there are a number of different formulas
 to use depending on whether you have the same raters rating everybody,
 or, as you have, two raters are randomly selected to rate each person.
 I'd bet anything the computational formulas are different and I'll bet
 almost anything that spss can't accommodate both. There's a literature
 on rater agreement and on intraclass correlation. If you haven't looked
 at that, you should. However, I can't help you on that. One thing you
 might do is google 'intraclass correlation' and there's a citation in
 the top 20 or 30 that references a book by, I think, Fleiss (or Fliess)
 and Shrout. Another term to google is 'kappa' (which is available from
 spss crosstabs).

 I'm hoping that you have other responses that are more helpful than I
 am able to be.

 Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Vassilis Hartzoulakis
Sent: Wednesday, July 12, 2006 7:16 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: interrater agreement, intrarater agreement

Hi everyone,

I have a dataset of 10000 subjects and their scores on a composition they
wrote. Each composition was scored by 2 different raters (randomly
selected from a pool of 70). The scores could range from 0 to 15.

So far I have set up a table with 10000 rows/cases and 5 columns
(IDsubject, IDraterA, rateA, IDraterB, rateB), e.g.

00001, 1200, 12, 1300, 14 (the 1st rater gave a 12/15 and the 2nd a 14/15)
00002, 1200, 09, 1300, 12
00003, 1400, 15, 1200, 13
00004, 1400, 02, 1200, 08 etc.

Can someone suggest the best possible layout and analysis to investigate
inter- and intra-rater agreement?

Thank you
