Date: Tue, 21 Aug 2007 09:44:02 -0400
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: Difference between two daily datasets
In-Reply-To: <200708202141.l7KIGZ0Z012804@malibu.cc.uga.edu>
Chandra,
If each ID in the yesterday data set maps to a single catysday value, a
SORT-free solution is possible through KEY INDEXING. This assumes num can be
converted to a number, so that it can serve as the array subscript. The array
must be large enough for the largest 'num' in either data set, since today
can hold IDs that yesterday lacks. The size can be passed as a macro variable
as well. Because it needs no sorting and makes a single pass over each data
set, this method should be among the fastest for this task.
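data wanted;
   array k[9999] $ _temporary_;   * lookup table indexed by num;
   retain num cattoday catysday;  * to keep the order;
   * load yesterday into the table: k[num] holds catysday;
   do until(eofy);
      set yesterday end = eofy;
      nu = input(num, 8.);
      k[nu] = catysday;
   end;
   * scan today and keep new IDs and changed categories;
   do until (eoft);
      set today end = eoft;
      nu = input(num, 8.);
      if k[nu] ne cattoday then do;
         catysday = k[nu];        * blank when the ID is new;
         output;
      end;
   end;
   drop nu;
run;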
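As a rough sketch of the macro-variable idea, the array size could be derived
from the data before the key-indexing step. The name maxnum is only an
illustration, and the query assumes every num is a positive integer that the
8. informat can read. It scans both data sets because today can hold IDs that
yesterday lacks (978 in the sample data).

```sas
/* Sketch only: maxnum is a hypothetical macro variable name. */
proc sql noprint;
   select max(nu) into :maxnum
   from (select input(num, 8.) as nu from yesterday
         union all
         select input(num, 8.) as nu from today);
quit;
%let maxnum = &maxnum;   /* strip the padding that INTO can leave */

/* In the key-indexing step, size the array from it: */
/*    array k[&maxnum] $ _temporary_;                */
```

If the largest num is very big, the temporary array grows with it, so this
only pays off when the IDs fall in a modest range.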
The traditional data step approach would involve SORTING the data sets and
using the POINT= option to scan yesterday for each observation in today.
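proc sort data = yesterday nodupkey;
   by num catysday;
run;
proc sort data = today nodupkey;
   by num cattoday;
run;
data wanted;
   do until(eoft);
      set today end = eoft;
      ind = 0;                    * set to 1 once num is found in yesterday;
      * direct-access scan of yesterday for the current num;
      do p = 1 to numt;
         set yesterday (rename=(num=newnum)) nobs = numt point = p;
         if num = newnum then do;
            ind = 1;
            if catysday ne cattoday then do;
               output;            * category changed;
               leave;
            end;
         end;
      end;
      * ID not in yesterday, so output it with a blank catysday;
      if ind = 0 then do;
         catysday = ' ';
         output;
      end;
   end;
   stop;                          * required because POINT= never hits end of file;
   drop ind newnum;
run;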
Regards,
Muthia Kachirayan
On 8/20/07, Chandra Gadde <ddraj2015@gmail.com> wrote:
>
> Hi All,
>
> I have two data sets here. One is today's data set and the other is
> yesterday's data set. Today's dataset always contains all the observations
> from yesterday and, in some cases, more observations than yesterday's
> dataset. I need to find the differences between these. Please find an
> example below.
>
>
> data yesterday;
> input num $ catysday $;
> cards;
> 123 dea
> 215 iko
> 543 ijn
> 123 dea
> 254 oij
> 198 mkh
> 215 iko
> 543 ijn
> ;
> run;
>
> data today;
> input num $ cattoday $;
> cards;
> 123 dea
> 215 iko
> 543 ijn
> 123 dea
> 123 dca
> 254 oij
> 198 mkh
> 215 iko
> 543 ijn
> 978 kkk
> 110 pol
> 215 plm
> ;
> run;
>
> Now, I need an output that looks like the one below. I should get all the
> records that are in today's dataset and not in yesterday's dataset, and if
> there are any changes in cattoday versus catysday, I also want those to be
> in my final dataset. Please help me with how to do this.
>
> num cattoday catysday
> 123 dca dea
> 215 plm iko
> 978 kkk
> 110 pol
>