Date: Sat, 16 Oct 2004 16:40:10 -0400
Reply-To: Martin Sherman <msherman@loyola.edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Martin Sherman <msherman@loyola.edu>
Subject: Re: Using Identify duplicate cases
Content-Type: text/plain; charset=US-ASCII
Here is the output when using 65 variables.
GET
FILE='H:\MSHERMAN\SPSSscripts\dd1.sav'.
ADD FILES /FILE=*
/FILE='H:\MSHERMAN\SPSSscripts\dd2.sav'
/IN=source01.
VARIABLE LABELS source01
'Case source is H:\MSHERMAN\SPSSscripts\dd2.sav'.
EXECUTE.
* Identify Duplicate Cases.
SORT CASES BY i1(A) i2(A) i3(A) i4(A) i5(A) i6a(A) i6b(A) i6c(A) i6d(A)
i6e(A) i6f(A) i6g(A) i6h(A) i7(A) i8(A) i9(A) i10(A)
i10a(A) i11(A) i12(A) i13(A) i14a(A) i14b(A) i14c(A) i14d(A) i15(A)
i16(A) ii1(A) ii2(A) ii3(A) ii4(A) ii5(A) ii6(A) ii7(A)
ii8(A) ii9(A) ii10(A) ii11(A) ii12(A) ii13(A) ii14(A) ii15(A) ii16(A)
ii17(A) ii18(A) ii19(A) ii20(A) ii21(A) ii22(A) iii1(A) iii2(A) iii3(A)
iii4(A) iii5(A) iii6(A) iii7(A) iii8(A) iii9(A) iii10(A) iii11(A)
iii12(A) iii13(A) iii14(A) iii15(A) id(A)
.
>Error # 5840. Command name: SORT CASES BY
>Attempt to sort a file on more than 64 keys.
>This command not executed.
MATCH FILES /FILE = * /BY i1 i2 i3 i4 i5 i6a i6b i6c i6d i6e i6f i6g
i6h i7 i8 i9 i10 i10a i11 i12 i13 i14a i14b i14c i14d
i15 i16 ii1 ii2 ii3 ii4 ii5 ii6 ii7 ii8 ii9 ii10 ii11 ii12 ii13 ii14
ii15 ii16 ii17 ii18 ii19 ii20 ii21 ii22 iii1 iii2 iii3
iii4 iii5 iii6 iii7 iii8 iii9 iii10 iii11 iii12 iii13 iii14 iii15
/FIRST = PrimaryFirst /LAST = PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence = 1 - PrimaryLast.
ELSE.
COMPUTE MatchSequence = MatchSequence + 1.
END IF.
LEAVE MatchSequence.
FORMAT MatchSequence (f7).
COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES InDupGrp(D).
File #1
KEY: 122462 39 3 1 2 0 0 1 1 0 0 0 0 4 14 1 5 MIA 7 1 2 0 0 1 0 2 3
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 1 1 1 1 1 1 1 2 1 2 2
>Error # 5130
>File out of order. All the files in MATCH FILES must be in non-descending
>order on the BY variables. Use SORT CASES to sort the file.
>This command not executed.
Any changes made to the working file since 16-OCT-2004 16:37:46 have been lost.
The time now is 16:38:09.
MATCH FILES /FILE = * /DROP = PrimaryFirst InDupGrp.
>Error # 5241 in column 31. Text: PrimaryFirst
>Undefined variable name. Check spelling, verify the existence of this
>variable. Has it been dropped or renamed in this command?
>This command not executed.
>Error # 5241 in column 44. Text: InDupGrp
>Undefined variable name. Check spelling, verify the existence of this
>variable. Has it been dropped or renamed in this command?
>Note # 5145
>The working file has been restored, and subsequent commands may access the
>working file.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'
>Warning # 4461 in column 17. Text: PrimaryLast
>An unknown variable name was specified on the VAR LABELS command. The name
>and the label will be ignored.
MatchSequence 'Sequential count of matching cases' .
>Warning # 4461 in column 3. Text: MatchSequence
>An unknown variable name was specified on the VAR LABELS command. The name
>and the label will be ignored.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
>Warning # 4474. Command name: VALUE LABELS
>The (ADD) VALUE LABELS command specifies an unknown variable name. The
>name will be ignored.
>The error is associated with 'PrimaryLast'
VARIABLE LEVEL PrimaryLast (ORDINAL)
/MatchSequence (SCALE).
>Error # 701 in column 16. Text: PrimaryLast
>An undefined variable name, or a scratch or system variable was specified
>in a variable list which accepts only standard variables. Check spelling
>and verify the existence of this variable.
>This command not executed.
>Error # 701 in column 7. Text: MatchSequence
>An undefined variable name, or a scratch or system variable was specified
>in a variable list which accepts only standard variables. Check spelling
>and verify the existence of this variable.
FREQUENCIES VARIABLES = PrimaryLast MatchSequence .
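Counting the variables in the syntax above makes the failure clearer. SORT CASES lists 65 keys: the 64 matching variables plus id, which orders cases within each matching group. MATCH FILES /BY lists only the 64 matching variables. Because SORT CASES was refused, MATCH FILES received an unsorted file (Error # 5130), so every later command referred to variables that were never created. The Python sketch below is not SPSS, and its function names are hypothetical. It models what the pasted syntax computes: the non-descending-order check that MATCH FILES /BY requires, and the PrimaryFirst, PrimaryLast, MatchSequence and InDupGrp flags. It assumes each case is a dict and that the values of any one variable are mutually comparable.

```python
from itertools import groupby


def case_key(case, keys):
    """Tuple of the BY-variable values; tuples compare element by element."""
    return tuple(case[k] for k in keys)


def in_nondescending_order(cases, keys):
    """The precondition MATCH FILES /BY enforces (Error # 5130 if it fails)."""
    return all(case_key(a, keys) <= case_key(b, keys)
               for a, b in zip(cases, cases[1:]))


def flag_duplicates(cases, keys):
    """Model SORT CASES, MATCH FILES /FIRST /LAST, the MatchSequence
    DO IF block, InDupGrp, and the final SORT CASES InDupGrp(D)."""
    # SORT CASES BY keys (A): the step SPSS refused with Error # 5840.
    ordered = sorted(cases, key=lambda c: case_key(c, keys))
    assert in_nondescending_order(ordered, keys)
    flagged = []
    # Each group holds cases with identical values on every key.
    for _, group in groupby(ordered, key=lambda c: case_key(c, keys)):
        group = [dict(c) for c in group]  # copies: input left unchanged
        n = len(group)
        for i, case in enumerate(group):
            case["PrimaryFirst"] = int(i == 0)
            case["PrimaryLast"] = int(i == n - 1)
            # First case gets 1 - PrimaryLast (0 if unique, else 1);
            # each later case gets the previous value + 1.
            case["MatchSequence"] = 0 if n == 1 else i + 1
            case["InDupGrp"] = int(case["MatchSequence"] > 0)
            flagged.append(case)
    # SORT CASES InDupGrp(D); Python's sort is stable, so ties keep order.
    flagged.sort(key=lambda c: -c["InDupGrp"])
    return flagged


keys = ["i1", "i2"]
cases = [{"i1": 1, "i2": "x"}, {"i1": 2, "i2": "y"},
         {"i1": 1, "i2": "x"}, {"i1": 1, "i2": "x"}]
print(in_nondescending_order(cases, keys))  # False: file is unsorted
for c in flag_duplicates(cases, keys):
    print(c["i1"], c["i2"], c["PrimaryLast"], c["MatchSequence"])
```

In this model, a MatchSequence of 0 marks a case with no duplicate. Within a duplicate group the cases are numbered 1, 2, and so on, and PrimaryLast = 1 marks the case the syntax treats as primary.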
>>> Hector Maletta <hmaletta@fibertel.com.ar> 10/16/04 04:21PM >>>
I never ran into this problem, but the message implies that SPSS sorts
the cases by all the variables in the file and looks for an exact
duplication of all variables, with a limit of 64 variables. This is odd,
because searching for duplicate cases on all variables could easily
involve more than 64 variables. If SPSS implemented this facility to
identify duplicates with a procedure that involves SORT CASES, and SORT
CASES has a limit of 64 keys, the Identify Duplicates facility would be
seriously flawed. Does anybody know anything about this? I have not
checked it yet (I am currently abroad and do not have a recent version
of SPSS with me).
Hector
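Hector's point can be illustrated outside SPSS. Finding cases that match on every listed variable does not, in itself, require a sort with one key per variable. The Python sketch below is not how SPSS implements Identify Duplicate Cases, and its names are hypothetical. It groups cases in a hash map keyed by a single tuple of all the values, so the number of matching variables is not capped. It assumes the values are hashable (numbers or strings).

```python
from collections import defaultdict


def duplicate_groups(cases, keys):
    """Return lists of case positions whose values match on every key.

    One dict entry per distinct tuple of values: no sort is needed, so the
    number of matching variables is not limited to 64 sort keys.
    """
    groups = defaultdict(list)
    for position, case in enumerate(cases):
        groups[tuple(case[k] for k in keys)].append(position)
    # Keep only value combinations that occur more than once.
    return [positions for positions in groups.values() if len(positions) > 1]


# 65 hypothetical variables v1..v65; cases 0 and 2 are identical.
keys = [f"v{n}" for n in range(1, 66)]
base = {k: 0 for k in keys}
cases = [dict(base), dict(base, v65=1), dict(base)]
print(duplicate_groups(cases, keys))  # [[0, 2]]
```

A comparable idea inside SPSS, untested here, would be to combine the matching variables into one key variable before identifying duplicates, so the sort needs only a single key.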
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
> On Behalf Of Martin Sherman
> Sent: Saturday, October 16, 2004 4:15 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Using Identify duplicate cases
>
>
> Dear List: I am running Identify Duplicate Cases with a large
> data file and get the following error message: Error # 5840
> Command name: SORT CASES BY
> Attempt to sort a file on more than 64 keys.
> This command not executed.
>
> Has anyone run across this problem before? Is there a limit to the
> number of variables you can check for duplication? Thanks,
> Martin Sherman
>