LISTSERV at the University of Georgia
Date:         Wed, 9 Feb 2000 20:21:36 -0300
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Hector E. Maletta" <hmaletta@OVERNET.COM.AR>
Subject:      Re: Duplicate records
Comments: To: NASSAR NAJI <nassar@CYBERCABLE.FR>
Content-Type: text/plain; charset=us-ascii

I think the solution suggested by Nassar Naji to delete duplicate cases is essentially valid but needs some clarification. My comments follow below.

NASSAR NAJI wrote:
> I hope this syntax helps:
> SORT CASES BY id (A).
> compute duplicat=1-(id=lag(id)).
> execute.
> duplicate takes 0 value for the duplicate records, so it can be used as
> filter.
> In such case, you will keep only the first case for several record..

With this syntax, the second and subsequent cases in each set of duplicates take the value 0, while the rest take the value 1. To keep only the first case in each set of duplicates, along with the non-duplicate cases, you need one more command: SELECT IF (DUPLICAT=1). The resulting file should be saved under another name to preserve the original file (just in case).

One final word of caution: having the same ID is not equivalent to being duplicate in all variables. You should make sure that your file contains no cases with different information pertaining to the same ID. Otherwise you might lose valuable information.
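Put together, the whole sequence might look like the sketch below. It assumes a numeric variable named id, and the output file name nodup.sav is only a placeholder. One detail worth checking on your own data: for the very first case in the file LAG(id) has no previous case to return, so with a numeric id the expression 1-(id=lag(id)) should come out system-missing rather than 1, and SELECT IF (DUPLICAT=1) would then drop that case. Setting the flag to 1 first and zeroing it only when the comparison is true avoids this.

```spss
* Sketch only: id is assumed numeric and nodup.sav is a placeholder name.
SORT CASES BY id (A).
* Start every case at 1, then set 0 where id repeats the previous case.
COMPUTE duplicat = 1.
IF (id = LAG(id)) duplicat = 0.
EXECUTE.
* Keep the first case of each id and save under a new name.
SELECT IF (duplicat = 1).
SAVE OUTFILE='nodup.sav'.
```

Before running the SELECT IF, a quick FREQUENCIES on duplicat shows how many cases would be dropped.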

Another way to achieve the same result is through AGGREGATE:

sort cases by ID.
AGGREGATE OUTFILE=* /PRESORTED /BREAK=ID
  /var_1 to var_n = first(var_1 to var_n).
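As a concrete illustration of the AGGREGATE approach, here is a minimal sketch with two made-up variables, age and income, written to a separate file so the original stays untouched. The variable names and the file name dedup.sav are assumptions for the example only. Note that, as documented, FIRST returns the first nonmissing value in each break group, so if the first record for an ID has a missing value on some variable, the result can combine values from different records of that ID.

```spss
* Sketch only: age, income and dedup.sav are hypothetical names.
SORT CASES BY id (A).
* One output case per id, taking the first nonmissing value of each variable.
AGGREGATE OUTFILE='dedup.sav'
  /PRESORTED
  /BREAK=id
  /age income = FIRST(age income).
```

Unlike the SELECT IF approach, only the break variable and the variables named on the function line end up in the output file.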

Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina
