Date: Fri, 14 May 2004 17:52:34 +0200
Reply-To: Ace <b.rogers@VIRGIN.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ace <b.rogers@VIRGIN.NET>
Subject: Re: proc sort
Content-Type: text/plain; charset=ISO-8859-1
On 14 May 04 15:19:27 GMT, robert.walls@EUROPE.PPDI.COM (Robert Walls)
wrote:
>helen wrote:
>> I have a dataset containing some duplicate data. I'd like to delete
>> those observations. Normally I use a 'proc sort; by listing vars'
>> statement to do it. In my case, there are around 60 variables per
>> observation, and I'd like to compare 59 of them to see whether an
>> observation is a duplicate. Instead of listing all the variables, is
>> there an easy way to do it?
>Have you tried the nodupkey option on your proc sort? E.g.:
>
> proc sort data = work.X1 out = work.X2 nodupkey;
> by listing vars;
> run;
>This will remove all of your duplicate obs for the by variables of the
>proc sort from the data set, and I think that the option nodup will just
>remove all repeat obs for all of the variables in a data set (but don't
>quote me on that one!).
You are correct. However, it's not clear if either is exactly what's
required here - there seems to be only one variable that's excluded
from the 'key', so the OP may be wanting a way to avoid having to
hard-code the 59 others into a BY statement.
This could be done by using a macro, or a previous data step, to
generate the list of variables, but an alternative occurred to me:
proc sort data=old (drop=var60) out=new1 nodupkey;
by _numeric_ _character_ ;
run;
data new2;
merge
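
For comparison, the other route mentioned above (generating the variable
list in a previous step) might look roughly like the sketch below. It is
untested and rests on assumptions: the dataset is WORK.OLD and the excluded
variable is VAR60, as in the example above, while the macro variable name
BYVARS and the output name NEW3 are made up for illustration.

```sas
/* Assumed names: WORK.OLD is the input, VAR60 the one variable kept out of the key. */
/* BYVARS and NEW3 are hypothetical names used only for this sketch.                 */
proc sql noprint;
  /* Collect every variable name except VAR60 into the macro variable BYVARS. */
  select name
    into :byvars separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'OLD'
      and upcase(name) ne 'VAR60';
quit;

/* Sort on the generated list; NODUPKEY keeps one observation per BY group. */
proc sort data=old out=new3 nodupkey;
  by &byvars;
run;
```

Unlike the drop= approach, this keeps VAR60 in the output, with the value
taken from whichever observation in each group of duplicates is retained.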
--
Ace in Basel - brucedotrogers a.t rochedotcom