Date: Fri, 10 May 1996 15:41:34 +0200
Reply-To: Gordon Meyer <meyer_g@MTN.CO.ZA>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Gordon Meyer <meyer_g@MTN.CO.ZA>
Organization: MTN
Subject: Re: Data step programming problem - Help!
Andrew Cosmatos wrote:
>
> Tadd Clayton wrote:
> >
> > Hi everyone
> >
> > I have a data set in which each observation should be uniquely
> > identified by a combination of two variables - a school number
> > and a serial number within each school. However, due to problems
> > with the data entry process, there are a number of observations
> > with duplicate values for the combination of school and serial
> > numbers.
> >
> > I would like to be able to compare the school and serial numbers
> > for each observation with those from the previous observation and
> > then output *both* observations to a data set if they are
> > duplicated. I can use a retain statement to define variables
> > that will carry the school and serial numbers over iterations of
> > the data step to allow the comparison but, as SAS appears to work
> > on an observation by observation basis only, I can't figure out
> > how to output both observations. Can anyone offer a simple
> > solution?
> >
> > Thanks for any help.
> >
> > Tadd
> >
> > --
> > Tadd Clayton Ph: 64 9 373 7599 ext. 6451
> > Research Officer Fax: 64 9 373 7486
> > Department of Paediatrics Email: t.clayton@auckland.ac.nz
> > School of Medicine
> > University of Auckland
> > Private Bag 92019
> > Auckland
> > NEW ZEALAND
>
> Tadd,
>
> I had a similar problem and solved it in the following manner:
>
> data tmp;
> set school;
> x=1;
> run;
>
> proc sort data=tmp;
> by schoolno serialno;
> run;
>
> proc means data=tmp noprint;
> by schoolno serialno;
> var x;
> output out=tmp1 sum=;
> id x y z;
> run;
>
> data dups;
> set tmp1;
> if x>1;
> run;
>
> In the data set dups, each duplicated school/serial combination will be
> listed once, and the number of times it occurs will be stored in the variable x.
>
> Andrew.
The above code is correct, except that the ID statement should list only
variables that are NOT used in the VAR or BY statements; in this case, x
must be removed from it.
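A minimal sketch of Andrew's steps with that correction applied, extended with a final MERGE so that every duplicated observation ends up in a data set (not just one row per combination), which is what the original question asked for. It assumes, as in Andrew's code, that the input data set is school and the key variables are schoolno and serialno; y and z are placeholders for any other variables you want carried along, and alldups is a hypothetical name chosen for illustration.

```sas
/* Give every observation x=1 so that SUM(x) counts rows per key */
data tmp;
  set school;
  x = 1;
run;

proc sort data=tmp;
  by schoolno serialno;
run;

/* x is the analysis variable, so it is left out of ID;        */
/* y and z are placeholders for other variables to carry along */
proc means data=tmp noprint;
  by schoolno serialno;
  var x;
  id y z;
  output out=tmp1 sum=;
run;

/* One row per duplicated combination; x now holds the count.  */
/* KEEP= drops y and z so they cannot overwrite tmp's values   */
/* in the merge below.                                          */
data dups;
  set tmp1(keep=schoolno serialno x);
  if x > 1;
run;

/* One-to-many merge: every row of tmp whose key is in dups is */
/* output. x is dropped from tmp so the count from dups is     */
/* retained on each row of the BY group.                        */
data alldups;
  merge tmp(drop=x) dups(in=indup);
  by schoolno serialno;
  if indup;
run;
```

The automatic _FREQ_ variable in tmp1 holds the same per-key count, so testing _freq_ > 1 (and keeping _freq_ instead of x) would work equally well.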
Gordon
South Africa