Date: Thu, 18 Apr 1996 08:48:21 -0500
Reply-To: Walter Scott <wscott@MAIL.STATE.TN.US>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Walter Scott <wscott@MAIL.STATE.TN.US>
Subject: Q: Managing messy duplicates -Reply
You can deal with the type 1 records with:
proc sort data=temp nodup;
by var1 var2... varlast ;
run;
If you wanted to keep just one of each type 2 record, you could use:
proc sort data=temp nodupkey;
by var1...;
run;
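If you want to get rid of all type 2 records, I think you can do it by making
all of the fields that define a type 2 record part of the sort key. Once the
nodup sort above has removed the exact copies, any key value that still occurs
more than once is a type 2 duplicate, so then...
data temp2;
set temp;
by var1... varlast; /* the key fields; needed for first. and last. to exist */
if not (first.varlast and last.varlast) then delete;
run;
Note the parentheses: without them NOT applies only to first.varlast, and the
step deletes just the last record of each duplicated group.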
I'm not explaining this nearly as well as the manuals do; check the SAS
Procedures Guide for proc sort and the SAS Language book pages 134-6.
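For what it's worth, here is a rough sketch of the whole sequence run against the sample data from the question. It assumes that ID and DATE are the fields that define a type 2 record, and the data set names temp1 and temp2 are just placeholders I made up. The "<---" notes are left out of the data lines so that the input reads cleanly.

```sas
/* Sample data from the question, without the "<---" annotations */
data temp;
  input ID DATE X CHAR $;
  cards;
1 10 123 a
1 10 123 a
1 10 124 a
1 11 123 b
1 11 123 c
1 11 123 c
1 11 123 b
1 12 124 d
1 12 124 d
1 12 124 d
1 13 125 e
1 14 125 e
2 12 125 f
2 12 125 f
2 13 126 g
;
run;

/* Step 1: sort by every variable so exact copies sit next to   */
/* each other, and let NODUP drop the extra copies (type 1).    */
proc sort data=temp out=temp1 nodup;
  by ID DATE X CHAR;
run;

/* Step 2: an ID/DATE group that still has more than one row is */
/* a type 2 duplicate. Keep only the groups with exactly one    */
/* row (assumption: ID and DATE are the identifying key).       */
data temp2;
  set temp1;
  by ID DATE;
  if first.DATE and last.DATE;
run;

proc print data=temp2;
run;
```

On these 15 rows that should leave the records for ID 1 on dates 12, 13 and 14 and for ID 2 on dates 12 and 13, which agrees with the wanted result listed in the question. With 100 or so variables you would have to list them all in the BY statement of step 1, key fields first.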
I hope this helps.
Walter Scott
>>> Sung-Il Cho <sungil@HOHP.HARVARD.EDU> 23:51 Wed, 17 Apr 96
>>> wrote:
Problem:
A data set contains two different types of duplicate observations:
1) multiple carbon copies (exactly same copies)
2) multiple duplicated observations with some different information
Now I want to keep only one copy of type 1), and remove any
observations that have type 2) duplications (because some of these may
be wrong and I don't know which.)
e.g.
data temp;
input ID DATE X CHAR $; /* actual data has some 100 variables */
cards;
1 10 123 a
1 10 123 a
1 10 124 a <--- duplication with different X: delete all
1 11 123 b
1 11 123 c <--- duplication with different CHAR: delete all
1 11 123 c
1 11 123 b
1 12 124 d
1 12 124 d
1 12 124 d <--- three carbon copies: keep one
1 13 125 e <--- singleton obs: keep
1 14 125 e
2 12 125 f
2 12 125 f
2 13 126 g
....
;
The final data I want to get is:
1 12 124 d
1 13 125 e
1 14 125 e
2 12 125 f
...
Now my questions:
1) Can this be done in one data step ? (I gave up.)
2) Would somebody care to show a decent IML coding for this ?
(mine ended up so ugly that I don't want to use it, either.)
Thank you very much.
-----------------------------------------------------------
Sung-Il Cho 617-432-1147 (voice) 432-0219 (FAX)
Dept. of Env Health (Occupational Health Program) &
Dept. of Epidemiology, Harvard School of Public Health
665 Huntington Av., Boston MA 02115