LISTSERV at the University of Georgia
Date:         Thu, 18 Apr 1996 08:48:21 -0500
Reply-To:     Walter Scott <wscott@MAIL.STATE.TN.US>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Walter Scott <wscott@MAIL.STATE.TN.US>
Subject:      Q: Managing messy duplicates -Reply
Comments: To: sungil@HOHP.HARVARD.EDU

You can deal with the type 1 records with:

   proc sort data=temp nodup;
      by var1 var2... varlast;
   run;

If you wanted to keep just one of each type 2 record you could use:

   proc sort data=temp nodupkey;
      by var1...;
   run;
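A minimal sketch of the difference between the two options, using a few rows from the question below. The output data set names (no_copies, one_per_key) are made up for illustration, and treating ID and DATE as the key is an assumption read off the example data.

```sas
/* Sample rows taken from the original question (ID DATE X CHAR). */
data temp;
   input ID DATE X CHAR $;
   cards;
1 11 123 b
1 11 123 c
1 11 123 c
1 11 123 b
1 12 124 d
1 12 124 d
1 12 124 d
;
run;

/* NODUP compares whole records, but only against the previous record
   written, so sort by every variable to make identical rows adjacent. */
proc sort data=temp out=no_copies nodup;
   by ID DATE X CHAR;
run;

/* NODUPKEY compares only the BY variables and keeps one record
   per BY group (ID and DATE are the assumed key here). */
proc sort data=temp out=one_per_key nodupkey;
   by ID DATE;
run;
```

On these seven rows, no_copies should end up with three rows (the b, c and d records once each) and one_per_key with two (one per ID/DATE pair).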

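If you want to get rid of all type 2 records, I think you can do it (after the nodup sort above has removed the carbon copies) by making all of the fields that define a type 2 record part of the sort key, then:

   data temp2;
      set temp;
      by var1... varlast;   /* the key fields; needed for first./last. */
      if not (first.varlast and last.varlast) then delete;
   run;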
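Put together on the sample data from the question below, the two steps might look like this. This is only a sketch: it assumes ID and DATE are the fields that define a type 2 duplicate (the question implies this but does not say so), and the intermediate data set name step1 is made up. Note the parentheses in the IF statement: a record is kept only when it is both the first and the last record of its key group, that is, the only one.

```sas
data temp;
   input ID DATE X CHAR $;
   cards;
1 10 123 a
1 10 123 a
1 10 124 a
1 11 123 b
1 11 123 c
1 11 123 c
1 11 123 b
1 12 124 d
1 12 124 d
1 12 124 d
1 13 125 e
1 14 125 e
2 12 125 f
2 12 125 f
2 13 126 g
;
run;

/* Step 1: sort on every variable so exact copies are adjacent,
   and let NODUP drop the repeats (type 1). */
proc sort data=temp out=step1 nodup;
   by ID DATE X CHAR;
run;

/* Step 2: any ID/DATE group that still has more than one record
   holds conflicting information (type 2), so drop the whole group.
   ID and DATE as the key is an assumption. */
data temp2;
   set step1;
   by ID DATE;
   if not (first.DATE and last.DATE) then delete;
run;

proc print data=temp2;
run;
```

With these 15 rows, temp2 should hold the 1 12 124 d, 1 13 125 e, 1 14 125 e, 2 12 125 f and 2 13 126 g records, one of each. It still takes a sort plus a data step rather than the single data step asked about.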

I'm not explaining this nearly as well as the manuals do; check the SAS Procedures Guide for proc sort and the SAS Language book pages 134-6.

I hope this helps.

Walter Scott

>>> Sung-Il Cho <sungil@HOHP.HARVARD.EDU> 23:51 Wed, 17 Apr 96 >>> wrote:

Problem:

A data set contains two different types of duplicate observations:
  1) multiple carbon copies (exactly same copies)
  2) multiple duplicated observations with some different information

Now I want to keep only one copy of type 1), and remove any observations that have type 2) duplications (because some of these may be wrong and I don't know which.)

e.g.
   data temp;
   input ID DATE X CHAR $;   /* actual data has some 100 variables */
   cards;
   1 10 123 a
   1 10 123 a
   1 10 124 a   <--- duplication with different X: delete all
   1 11 123 b
   1 11 123 c   <--- duplication with different CHAR: delete all
   1 11 123 c
   1 11 123 b
   1 12 124 d
   1 12 124 d
   1 12 124 d   <--- three carbon copies: keep one
   1 13 125 e   <--- singleton obs: keep
   1 14 125 e
   2 12 125 f
   2 12 125 f
   2 13 126 g
   ....
   ;

The final data I want to get is:
   1 12 124 d
   1 13 125 e
   1 14 125 e
   2 12 125 f
   ...

Now my questions:

1) Can this be done in one data step ? (I gave up.)

2) Would somebody care to show a decent IML coding for this ? (mine ended up so ugly that I don't want to use it, either.)

Thank you very much.
-----------------------------------------------------------
Sung-Il Cho                 617-432-1147 (voice) 432-0219 (FAX)
Dept. of Env Health (Occupational Health Program) &
Dept. of Epidemiology, Harvard School of Public Health
665 Huntington Av., Boston MA 02115

