LISTSERV at the University of Georgia
Date:         Sat, 16 Oct 2004 16:40:10 -0400
Reply-To:     Martin Sherman <msherman@loyola.edu>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Martin Sherman <msherman@loyola.edu>
Subject:      Re: Using Identify duplicate cases
Comments: To: hmaletta@fibertel.com.ar
Content-Type: text/plain; charset=US-ASCII

Here is the output when using 65 variables.

GET FILE='H:\MSHERMAN\SPSSscripts\dd1.sav'.
ADD FILES /FILE=*
  /FILE='H:\MSHERMAN\SPSSscripts\dd2.sav'
  /IN=source01.
VARIABLE LABELS source01 'Case source is H:\MSHERMAN\SPSSscripts\dd2.sav'.
EXECUTE.
* Identify Duplicate Cases.
SORT CASES BY i1(A) i2(A) i3(A) i4(A) i5(A) i6a(A) i6b(A) i6c(A) i6d(A) i6e(A)
  i6f(A) i6g(A) i6h(A) i7(A) i8(A) i9(A) i10(A) i10a(A) i11(A) i12(A) i13(A)
  i14a(A) i14b(A) i14c(A) i14d(A) i15(A) i16(A)
  ii1(A) ii2(A) ii3(A) ii4(A) ii5(A) ii6(A) ii7(A) ii8(A) ii9(A) ii10(A)
  ii11(A) ii12(A) ii13(A) ii14(A) ii15(A) ii16(A) ii17(A) ii18(A) ii19(A)
  ii20(A) ii21(A) ii22(A)
  iii1(A) iii2(A) iii3(A) iii4(A) iii5(A) iii6(A) iii7(A) iii8(A) iii9(A)
  iii10(A) iii11(A) iii12(A) iii13(A) iii14(A) iii15(A) id(A) .

>Error # 5840. Command name: SORT CASES BY
>Attempt to sort a file on more than 64 keys.
>This command not executed.

MATCH FILES /FILE = *
  /BY i1 i2 i3 i4 i5 i6a i6b i6c i6d i6e i6f i6g i6h i7 i8 i9 i10 i10a i11
  i12 i13 i14a i14b i14c i14d i15 i16
  ii1 ii2 ii3 ii4 ii5 ii6 ii7 ii8 ii9 ii10 ii11 ii12 ii13 ii14 ii15 ii16
  ii17 ii18 ii19 ii20 ii21 ii22
  iii1 iii2 iii3 iii4 iii5 iii6 iii7 iii8 iii9 iii10 iii11 iii12 iii13
  iii14 iii15
  /FIRST = PrimaryFirst /LAST = PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence = 1 - PrimaryLast.
ELSE.
COMPUTE MatchSequence = MatchSequence + 1.
END IF.
LEAVE MatchSequence.
FORMAT MatchSequence (f7).
COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES InDupGrp(D).

File #1 KEY: 122462 39 3 1 2 0 ' +' 0 1 1 0 0 ' +' 0 0 4 14 1 ' +' 5 MIA 7 1 2 ' +' 0 0 1 0 2 ' +' 3 0 0 0 0 ' +' 0 0 0 0 0 ' +' 0 0 0 0 0 ' +' 0 0 0 0 0 ' +' 0 0 0 2 1 ' +' 1 1 1 1 1 ' +' 1 1 1 1 2 ' +' 1 2 2

>Error # 5130
>File out of order. All the files in MATCH FILES must be in non-descending
>order on the BY variables. Use SORT CASES to sort the file.
>This command not executed.

Any changes made to the working file since 16-OCT-2004 16:37:46 have been lost. The time now is 16:38:09.

MATCH FILES /FILE = * /DROP = PrimaryFirst InDupGrp.

>Error # 5241 in column 31. Text: PrimaryFirst
>Undefined variable name. Check spelling, verify the existence of this
>variable. Has it been dropped or renamed in this command?
>This command not executed.

>Error # 5241 in column 44. Text: InDupGrp
>Undefined variable name. Check spelling, verify the existence of this
>variable. Has it been dropped or renamed in this command?

>Note # 5145
>The working file has been restored, and subsequent commands may access the
>working file.

VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'

>Warning # 4461 in column 17. Text: PrimaryLast
>An unknown variable name was specified on the VAR LABELS command. The name
>and the label will be ignored.

MatchSequence 'Sequential count of matching cases' .

>Warning # 4461 in column 3. Text: MatchSequence
>An unknown variable name was specified on the VAR LABELS command. The name
>and the label will be ignored.

VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.

>Warning # 4474. Command name: VALUE LABELS
>The (ADD) VALUE LABELS command specifies an unknown variable name. The
>name will be ignored.

>The error is associated with 'PrimaryLast'

VARIABLE LEVEL PrimaryLast (ORDINAL) /MatchSequence (SCALE).

>Error # 701 in column 16. Text: PrimaryLast
>An undefined variable name, or a scratch or system variable was specified
>in a variable list which accepts only standard variables. Check spelling
>and verify the existence of this variable.
>This command not executed.

>Error # 701 in column 7. Text: MatchSequence
>An undefined variable name, or a scratch or system variable was specified
>in a variable list which accepts only standard variables. Check spelling
>and verify the existence of this variable.

FREQUENCIES VARIABLES = PrimaryLast MatchSequence .
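
For readers following the generated syntax above, the flagging step (the /FIRST and /LAST indicators from MATCH FILES, then the DO IF block with LEAVE) can be sketched outside SPSS. The Python below is only an illustration under stated assumptions: the function name `flag_duplicates` and the sample rows are hypothetical, the rows are assumed to be already sorted on the key fields, and this is not the code SPSS itself runs.

```python
from itertools import groupby


def flag_duplicates(rows, key_fields):
    """Mimic the PrimaryFirst / PrimaryLast / MatchSequence / InDupGrp flags.

    Assumes `rows` (a list of dicts) is already sorted on `key_fields`,
    the same precondition that MATCH FILES ... /BY enforces.
    """
    def get_key(row):
        return tuple(row[f] for f in key_fields)

    flagged = []
    for _, group in groupby(rows, key=get_key):
        group = list(group)
        n = len(group)
        for pos, row in enumerate(group):
            first = int(pos == 0)
            last = int(pos == n - 1)
            # Unique case: first and last are both 1, so 1 - last gives 0.
            # Duplicate group: the count runs 1, 2, 3, ... within the group.
            seq = 0 if n == 1 else pos + 1
            flagged.append({**row,
                            "PrimaryFirst": first,
                            "PrimaryLast": last,
                            "MatchSequence": seq,
                            "InDupGrp": int(seq > 0)})
    return flagged


# Hypothetical sample: cases 1 and 2 match on (a, b); case 3 is unique.
sample = [
    {"id": 1, "a": 1, "b": 2},
    {"id": 2, "a": 1, "b": 2},
    {"id": 3, "a": 1, "b": 3},
]
result = flag_duplicates(sample, ["a", "b"])
```

In this sketch the last case of each matching group is the one marked as primary (PrimaryLast = 1), which mirrors the value labels in the log, and a case with no match gets MatchSequence = 0.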

>>> Hector Maletta <hmaletta@fibertel.com.ar> 10/16/04 04:21PM >>>
I never ran into this problem, but the message implies that SPSS sorts the cases by all the variables in the file and looks for an exact duplication of all variables, with a limit of 64 variables. This is odd, because searching for duplicate cases on all variables could easily involve more than 64 variables. I mean, if SPSS implemented this facility to identify duplicates but the procedure relies on SORT CASES, and SORT CASES has a limit of 64 keys, the Identify Duplicates facility would be seriously flawed. Does anybody know anything about this? I have not checked it yet (I am currently abroad and do not have a recent version of SPSS with me).

Hector
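
As a general illustration of the concern raised above (not something confirmed in this thread, and not SPSS syntax), one common way around a cap on the number of sort keys is to collapse all key values into a single composite key and sort on that alone. The Python sketch below assumes hypothetical field names `v1` to `v65` and a separator character that never occurs in the data.

```python
def composite_key(row, key_fields, sep="\x1f"):
    # One string stands in for all key values. The separator (ASCII unit
    # separator) is assumed not to occur inside any value.
    return sep.join(str(row[f]) for f in key_fields)


def sort_for_duplicates(rows, key_fields):
    # Sorting on the single composite key places identical cases next to
    # each other. The order is textual rather than numeric, which is enough
    # for spotting duplicates but not for a meaningful numeric sort.
    return sorted(rows, key=lambda r: composite_key(r, key_fields))


# Hypothetical data: 65 key fields, three cases, two of them identical.
fields = [f"v{i}" for i in range(1, 66)]
a = {f: 0 for f in fields}
b = {**a, "v65": 1}
ordered = sort_for_duplicates([a, b, dict(a)], fields)
```

After the sort, the two identical cases sit side by side regardless of how many fields make up the key, so a single pass comparing neighbours is enough to flag them.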

> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
> On Behalf Of Martin Sherman
> Sent: Saturday, October 16, 2004 4:15 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Using Identify duplicate cases
>
>
> Dear List: I am running Identify Duplicate Cases with a large
> data file and get the following error message:
>
> Error # 5840
> Command name: SORT CASES BY
> Attempt to sort a file on more than 64 keys.
> This command not executed.
>
> Has anyone run across this problem before? Is there a limit to the
> number of variables you can check for duplication? Thanks, Martin
> Sherman

