Date: Sat, 26 Feb 2000 02:28:49 GMT
Reply-To: Lou Pogoda <lpogoda@HOME.NOSPAM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Lou Pogoda <lpogoda@HOME.NOSPAM.COM>
Organization: @Home Network
Subject: Re: How do you check for duplicatess?
Ok, there've been a bunch of replies pointing out the missing BY statement.
But you don't necessarily need to have a LONGDATA variable which is a
concatenation of all the variables. Your code could look something like
proc sort data = inld7.all; by _all_; run;
data inld7.nodups inld7.dups;
set inld7.all;
by _all_;
if first.lastvar then output inld7.nodups;
else output inld7.dups;
run;
SORTing and SETing by _ALL_ saves you the effort of coding a LONGDATA
variable, and the space it takes. Additionally, if the total length of all
your concatenated variables exceeds the maximum length of a character
variable, you need more than one, while using _ALL_ simply uses all the
variables.
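To make the BY _ALL_ approach concrete, here is a small self-contained sketch. The data set WORK.DEMO and its variables ID, NAME and AMOUNT are made up for illustration and are not from the original question; AMOUNT happens to be the last variable, so it plays the role of "lastvar".

```sas
/* Hypothetical demo data: WORK.DEMO and its variables are assumptions. */
data work.demo;
  input id name $ amount;
  datalines;
1 ann 10
2 bob 20
1 ann 10
3 cat 30
2 bob 20
;
run;

/* Sort on every variable so fully identical records become adjacent. */
proc sort data=work.demo;
  by _all_;
run;

/* AMOUNT is the last variable in WORK.DEMO, so FIRST.AMOUNT is 1 on the
   first record of each group of fully identical records. */
data work.nodups work.dups;
  set work.demo;
  by _all_;
  if first.amount then output work.nodups;  /* one copy of each record */
  else output work.dups;                    /* the extra copies */
run;
```

With this demo data, WORK.NODUPS should end up with three records and WORK.DUPS with two. Note that both output data sets have to be named on the DATA statement, and the ELSE keeps the first copy of each record out of the duplicates file.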
A few words about LASTVAR in the data step. Variables are in a data set in
the order they are established - the first variable set up is variable 1,
the second is variable 2, etc. You can usually figure out which variable is
the last one on an observation by looking at the code, if you have it
(sometimes you get data sets from someone/somewhere else, and don't have the
code). You can also open a VAR window on the data set, and the variables
are listed in their order of appearance. Or you can run a PROC CONTENTS on
the data set and see the order there. In any case, you'd use the name of
the last variable to appear on an observation, not "lastvar".
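If you would rather not read the last variable's name off a listing, something like the following might work as a sketch. WORK.VARS and the macro variable LASTVAR are names assumed here for illustration, and CALL SYMPUTX assumes a reasonably recent SAS release (older releases would need CALL SYMPUT instead).

```sas
/* Write the variable metadata to a data set instead of the listing.
   WORK.VARS is a hypothetical name; INLD7.ALL is the data set from the thread. */
proc contents data=inld7.all out=work.vars(keep=name varnum) noprint;
run;

/* Order the variable names by their position in the data set. */
proc sort data=work.vars;
  by varnum;
run;

/* Store the name of the last variable in a macro variable.
   LASTVAR is an assumed macro variable name. */
data _null_;
  set work.vars end=last;
  if last then call symputx('lastvar', name);
run;

%put Last variable is &lastvar;
```

The data step that splits the records could then test first.&lastvar rather than a hard-coded name.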
Maurice Muoneke wrote in message <firstname.lastname@example.org>...
>I am trying to use the code below to detect duplicate records. The field
>'longdata' is a concatenation of all the variables. I know there are
>duplicates, but when I run it, I get all the records in the duplicates file
>and no records in the other file. What is wrong?
>proc sort data = inld7.all; by longdata; run;
>data INLD7.dups INLD7.nodups ; SET INLD7.ALL;
> if first.longdata then
> output INLD7.nodups;
> output INLD7.dups;