Date: Tue, 5 Sep 2006 05:43:35 -0400
Reply-To: Arild S <sko@KLP.NO>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arild S <sko@KLP.NO>
Subject: Re: Comparing two datasets
On Tue, 5 Sep 2006 02:05:35 -0700, alves <alves.paulo@GMAIL.COM> wrote:
>Hi,
>
>I was given a dataset with 20 variables plus a key variable that should be
>unique. Looking in detail, I noticed a few duplicate records (same
>key variable) and separated these into another dataset. Now I need to
>compare the duplicates with the ones in the original dataset to check
>whether they are really duplicates or whether any of the values in the
>other 20 variables differ...
>
>Is there an easy way of doing this? It happened to me once with a dataset
>of 5 variables; I just renamed the variables, merged the two files, and
>compared all the variables with arrays, but I am looking for a more
>efficient way.
>
>Thanks in advance
The easy way is different from what you are doing :-) Don't split your data.
Use proc sort; it has a handy option called "noduprecs":
data test;
input (key a b c d e) ($);
cards;
a b c d e f
a x x x x x
s e f g h j
a d f e f g
s d f g r h
a b c d e f
;
run;
proc sort data=test noduprecs dupout=test2;
by _all_;
run;
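The exact duplicate records will now be found in the dupout= data set (test2
here). Since there is no out= option, test itself is overwritten with the
de-duplicated records.
Do read the documentation, though, including the entry for the Sortdup=
system option.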
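To get from there to the original question (which repeated keys differ in some other variable), one possible follow-up is sketched below. It is a minimal sketch, not a tested recipe: the data set names unique, exact_dups and conflicts are made up for illustration, the input statement reads six values per line to match the sample rows, and it assumes key is the first variable in the data set, so that sorting by _all_ also leaves the data ordered by key.

```sas
/* Sample data: key plus five other character variables. */
data test;
  input (key a b c d e) ($);
  datalines;
a b c d e f
a x x x x x
s e f g h j
a d f e f g
s d f g r h
a b c d e f
;
run;

/* Step 1: remove exact duplicate records.                      */
/* out= keeps TEST untouched; UNIQUE and EXACT_DUPS are         */
/* hypothetical names chosen for this sketch.                   */
proc sort data=test out=unique noduprecs dupout=exact_dups;
  by _all_;
run;

/* Step 2: a key that still occurs more than once in UNIQUE     */
/* has records that differ in at least one other variable.      */
/* Assumes KEY is the first variable, so UNIQUE is already      */
/* in KEY order after the sort above.                           */
data conflicts;
  set unique;
  by key;
  if not (first.key and last.key);
run;
```

Records landing in exact_dups are true duplicates and can be dropped safely; records in conflicts share a key but disagree somewhere, so those are the ones to inspect by hand.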