LISTSERV at the University of Georgia
Date:         Sat, 26 Feb 2000 02:28:49 GMT
Reply-To:     Lou Pogoda <lpogoda@HOME.NOSPAM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Lou Pogoda <lpogoda@HOME.NOSPAM.COM>
Organization: @Home Network
Subject:      Re: How do you check for duplicatess?

Ok, there've been a bunch of replies pointing out the missing BY statement.

But you don't necessarily need a LONGDATA variable that concatenates all the variables. Your code could look something like

proc sort data = inld7.all;
  by _all_;
run;

data inld7.dups inld7.nodups;
  set inld7.all;
  by _all_;
  if first.lastvar then output inld7.nodups;
  else output inld7.dups;
run;

SORTing and SETing by _ALL_ saves you the effort of coding a LONGDATA variable, and the space it takes. Additionally, if the total length of all your concatenated variables exceeds the maximum length of a character variable, you need more than one such variable, while _all_ simply uses all the variables.

A few words about LASTVAR in the data step. Variables are in a data set in the order they are established - the first variable set up is variable 1, the second is variable 2, etc. You can usually figure out which variable is the last one on an observation by looking at the code, if you have it (sometimes you get data sets from someone/somewhere else, and don't have the code). You can also open a VAR window on the data set, and the variables are listed in their order of appearance. Or you can run a PROC CONTENTS on the data set and see the order there. In any case, you'd use the name of the last variable to appear on an observation, not "lastvar".
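To make the LASTVAR point concrete, here is a minimal sketch on made-up data. The data set WORK.ALL and the variables ID, NAME and AMT are hypothetical, invented only for illustration; AMT is created last, so it plays the role of "lastvar". The VARNUM option on PROC CONTENTS lists the variables in creation order rather than alphabetically, which is one way to confirm which variable comes last.

```sas
/* Hypothetical toy data: variables are created in the order id, name, amt */
data work.all;
  input id name $ amt;
  datalines;
1 a 10
2 b 20
1 a 10
3 c 30
2 b 20
2 b 25
;
run;

/* List variables in creation order to confirm that amt is the last one */
proc contents data = work.all varnum;
run;

/* Sort on every variable so identical observations become adjacent */
proc sort data = work.all;
  by _all_;
run;

data work.nodups work.dups;
  set work.all;
  by _all_;
  /* first.amt is 1 whenever amt or any earlier BY variable changes, */
  /* i.e. on the first observation of each distinct record           */
  if first.amt then output work.nodups;
  else output work.dups;
run;
```

With these six observations, the first copy of each distinct record should land in WORK.NODUPS and the two repeated records (1 a 10 and 2 b 20) in WORK.DUPS.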

Maurice Muoneke wrote in message <893rq5$cso$>...
>I am trying to use the code below to detect duplicate records. The field
>'longdata' is a concatenation of all the variables. I know there are
>duplicates, but when I run it, I get all the records in the duplicates file
>and no records in the other file. What is wrong?
>
>proc sort data = inld7.all; by longdata; run;
>data INLD7.dups INLD7.nodups ; SET INLD7.ALL;
>  if first.longdata then
>    output INLD7.nodups;
>  else
>    output INLD7.dups;
>run;
