LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2003, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 8 May 2003 17:11:14 +0000
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:      Re: ****HOW TO Count Duplicate Records
Comments: To: cybie@hotmail.com
Content-Type: text/plain; format=flowed

Cybie,

Assuming that the file is grouped by BRANCH DATE, the task fits in the DoW-loop like a glove:

data report ; do group_dupes = 0 by 1 until ( last.date ) ; set a ; by branch date notsorted ; end ; total_dupes ++ group_dupes ; run ;

In the case with you sample data, it should result in the output like this:

group_ total_ acct_num branch date dupes dupes

100 PH 20030303 2 2 100 PH 20030304 2 4 200 MN 20030303 3 7

HTH.

Kind regards, ----------------------- Paul M. Dorfman Jacksonville, FL -----------------------

>From: Cybie Frontier <cybie@HOTMAIL.COM> > >Hi Folks: > >I have a fairly large SAS dataset and when I checked for duplicate records >(all fields have the same value) I did proc sort nodupkey. My key was all >the columns. I came up with 300K duplicate records. There are 24 variables >in the dataset.One of the key variables is account_number. I want to >achieve two things: > >Assign a dupe counter to each record. In other words, if an >account_number occurs 7 times with the same record, I want to have >a dupecounter variable that will be 1 for the first record for that >account_number, 2 for the second record for that account_number, where all >variables are identical. > >Create a separate data sets with the duplicate records (not just >duplicate account numbers). These are true duplicates, i.e. values in all >fields are same. > >Example: > >acct_num branch date dupecounter >100 PH 20030303 1 >100 PH 20030303 2 >100 PH 20030303 3 >100 PH 20030304 1 >100 PH 20030304 2 >100 PH 20030304 3 >100 MN 20030303 1 >200 MN 20030303 2 >200 MN 20030303 3 >200 MN 20030303 4 > >etc....... > >Now, I want to produce a report that will tell me the number of total >duplicates. In this case it will be 7. The dupecounter where the value of >the field is 1 is not to be counted as a duplicate.

_________________________________________________________________ The new MSN 8: smart spam protection and 2 months FREE* http://join.msn.com/?page=features/junkmail


Back to: Top of message | Previous page | Main SAS-L page