LISTSERV at the University of Georgia
Date:         Wed, 26 Feb 1997 08:50:12 CST
Reply-To:     Undetermined origin c/o LISTSERV maintainer
              <owner-LISTSERV@AKH-WIEN.AC.AT>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
Comments:     RFC822 error: <E> "From:"/"Sender:" field is missing.
From:         Undetermined origin c/o LISTSERV maintainer
              <owner-LISTSERV@AKH-WIEN.AC.AT>
Subject:      Re: Losing Duplicates Inconsistently (PROC SORT)
Comments: To: Dan Keating <dtkeats@ibm.net>

This is about the most annoying "feature" of SAS (other than not being able to output the duplicates found in a proc sort to another file). NODUP only compares each observation with the one right before it in the sorted output, so identical records that don't land next to each other are kept. The way to deal with this is to sort the data set using EVERY VARIABLE as the key. You can use either the NODUP or the NODUPKEY option when you do this, and you'll lose all of your duplicates...
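
For the four-variable data set in the quoted post below, sorting on every variable could look like the following minimal sketch. The data set name and the variable names are taken from the KEEP= list in the quoted log; nothing else is assumed, and it has not been run against the original data.

```sas
/* Sketch only: names come from the quoted log (offmastr and its KEEP= list). */
/* Every kept variable is a BY key, so fully identical records sort next to   */
/* each other and NODUP can drop them.                                        */
proc sort data=offmastr nodup;
  by copid offname offnum agency;
run;
```

With every variable in the BY list, the order in which the variables are listed should change only the final sort order, not which records are dropped.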

But hey, at least you're not using Foxpro or something like that...

Bruce Johnson bjohnson@sachs.com

______________________________ Reply Separator _________________________________
Subject: Losing Duplicates Inconsistently (PROC SORT)
Author:  Dan Keating <dtkeats@ibm.net> at Internet
Date:    2/25/97 5:35 PM

I am combining two datasets and then running proc sort with "nodup" to eliminate duplicates. When I sort by one variable, I lose more than a third of the records. When I sort by a different variable, though, I don't lose any. If I were using "nodupkey" this would make sense. But I'm not. (Running v6.12 on OS/2 v4.0.) Here's the code and responses from the output:

*******************************************
version 1 -- duplicates are eliminated:
*******************************************
data offmastr (keep=copid offname offnum agency);
  set witness.officer witness.officerj;
run;

NOTE: The data set WORK.OFFMASTR has 12581 observations and 4 variables.
NOTE: The DATA statement used 1.56 seconds.

proc sort data=offmastr nodup;
  by copid;
run;

NOTE: 4828 duplicate observations were deleted.
NOTE: The data set WORK.OFFMASTR has 7753 observations and 4 variables.
NOTE: The PROCEDURE SORT used 1.0 seconds.

***********************************
version 2 -- no duplicates eliminated:
************************************
data offmastr (keep=copid offname offnum agency);
  set witness.officer witness.officerj;
run;

NOTE: The data set WORK.OFFMASTR has 12581 observations and 4 variables.
NOTE: The DATA statement used 1.56 seconds.

proc sort data=offmastr nodup;
  by agency;
run;

NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.OFFMASTR has 12581 observations and 4 variables.
NOTE: The PROCEDURE SORT used 1.35 seconds.

*****************************************
end of sample code
****************************************

I've rerun this several times. I've even run the two sorts together as follows:

*************************************
version 3 -- two sorts together, one eliminates, other doesn't:
**************************************
data offmastr (keep=copid offname offnum agency);
  set witness.officer witness.officerj;
run;

NOTE: The data set WORK.OFFMASTR has 12581 observations and 4 variables.
NOTE: The DATA statement used 1.81 seconds.

proc sort data=offmastr nodup;
  by agency;
run;

NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.OFFMASTR has 12581 observations and 4 variables.
NOTE: The PROCEDURE SORT used 1.34 seconds.

proc sort data=offmastr nodup;
  by copid;
run;

NOTE: 4828 duplicate observations were deleted.
NOTE: The data set WORK.OFFMASTR has 7753 observations and 4 variables.
NOTE: The PROCEDURE SORT used 1.12 seconds.

*****************************************
end of sample code
*****************************************

I apologize for the length of this posting, but I'm trying to document what's happening in hopes that someone can see what I'm missing.

Any help greatly appreciated.

Dan

Dan Keating
Miami Herald
(305) 376-3476 -- phone
(305) 376-5287 -- fax
dtkeats@ibm.net

