LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 1996, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sun, 12 May 1996 19:59:58 GMT
Reply-To:     Tadd Clayton <t.clayton@AUCKLAND.AC.NZ>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Tadd Clayton <t.clayton@AUCKLAND.AC.NZ>
Organization: University of Auckland Medical School
Subject:      Summary of responses: Output obs with duplicate ID values

Hi everyone

Thanks to everyone who posted a response to my problem with outputting observations with duplicate ID variables. I should have been been able to figure it out myself - in my defence it was Friday afternoon NZ time.

The original problem was:

>I have a data set in which each observation should be uniquely >identified by a combination of two variables - a school number >and a serial number within each school. However, due to problems >with the data entry process, there are a number of observations >with duplicate values for the combination of school and serial >numbers. > >I would like to be able to compare the school and serial numbers >for each observation with those from the previous observation and >then output *both* observations to a data set if they are >duplicated. I can use a retain statement to define variables >that will carry the school and serial numbers over iterations of >the data step to allow the comparison but, as SAS appears to work >on an observation by observation basis only, I can't figure out >how to output both observations. Can anyone offer a simple >solution?

Most people suggested using the first.variable and last.variable automatic variables. A good example is this code from Jay Weedon:

Here's a simple solution that unconditionally outputs the FIRST record for each school-serial no. combination to a scratch file temp, but outputs subsequent records for that observation ONLY if all the relevant variables match (I've left the <match criterion> bit to you). The second data step simply deletes all the observations in the scratch file that have a single record associated with them:

data temp; set old; by school sn; if first.sn then output; else do; <compute match criterion for subsequent records> if <match criterion true> then output; end;

data new; set temp; by school sn; if first.sn and last.sn then delete; run;

Notes:

1. I assume the dataset to be sorted or indexed by school-serial no. 2. If there are more than 2 matching records for an observation, ALL of them will be output.

Jay Weedon.

Another approach suggested by Adam Hendricks was to use Proc SQL:

proc sql; create table dups as select school, serial, count(*) as count from data.schools group by 1,2 having count(*) > 1;

proc print; run;

Thanks again for the help.

Tadd

-- Tadd Clayton Ph: 64 9 373 7599 ext. 6451 Research Officer Fax: 64 9 373 7486 Department of Paediatrics Email: t.clayton@auckland.ac.nz School of Medicine University of Auckland Private Bag 92019 Auckland NEW ZEALAND


Back to: Top of message | Previous page | Main SAS-L page