Date: Wed, 9 Feb 2000 20:21:36 -0300
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Hector E. Maletta" <hmaletta@OVERNET.COM.AR>
Subject: Re: Duplicate records
Content-Type: text/plain; charset=us-ascii
I think the solution suggested by Nassar Naji to delete duplicate cases
is essentially valid but needs some clarification. See my comments below.
NASSAR NAJI wrote:
> I hope this syntax helps:
> SORT CASES BY id (A).
> compute duplicat=1-(id=lag(id)).
> duplicat takes the value 0 for the duplicate records, so it can be used as
> a selection variable. In that case, you will keep only the first case of
> each set of duplicate records.
With this syntax, the second (and subsequent) cases in each set of
duplicate cases take the value 0, while the rest take the value 1.
To keep only the first case in each set of duplicates, and also the
non-duplicate cases, you need yet another command:
SELECT IF (DUPLICAT=1).
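One thing to check before relying on that SELECT IF: as far as I know, LAG(id) has no previous case to read on the very first case of the file, so with a numeric id DUPLICAT comes out system-missing there and that case would be dropped too. A minimal sketch that avoids this, assuming id has no missing values (the variable name keepme and the file name nodups.sav are made up for the example):
SORT CASES BY id (A).
* Start every case at 1, then set repeats of the previous id to 0.
COMPUTE keepme = 1.
IF (id = LAG(id)) keepme = 0.
SELECT IF (keepme = 1).
* Hypothetical file name, so the original file stays untouched.
SAVE OUTFILE='nodups.sav'.
On the first case the IF condition is missing rather than true, so keepme stays at 1 and the case is kept.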
The resulting file should be saved under another name, to preserve the
original file (just in case).
One final word of caution: having the same ID is not equivalent to being
a duplicate on all variables. Make sure your file does not contain cases
that share an ID but carry different information. Otherwise you might
lose valuable information.
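A rough way to check this before deleting anything, sketched with two placeholder variables var_1 and var_2 standing in for your own list (the name conflict is also just for the example): flag every case that repeats the previous ID but differs from it on at least one variable.
SORT CASES BY id (A).
COMPUTE conflict = 0.
* Same id as the previous case, but a different value somewhere.
IF (id = LAG(id) AND (var_1 NE LAG(var_1) OR var_2 NE LAG(var_2))) conflict = 1.
FREQUENCIES VARIABLES=conflict.
If conflict is 0 throughout, cases sharing an ID agree on the variables compared. Note that a comparison involving a missing value is itself missing, so differences that consist only of missing values are not flagged by this sketch.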
Another way to achieve the same result is through AGGREGATE:
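sort cases by ID.
* outfile=* replaces the working file; give a file name instead to write a new file.
aggregate outfile=* /break=ID
 /var_1 to var_n = first(var_1 to var_n).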
Universidad del Salvador
Buenos Aires, Argentina