Date: Thu, 23 Aug 2007 11:21:39 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: which to use, Proc SQL or Proc Compare
On Wed, 22 Aug 2007 22:17:09 +0300, Johanna <jonna@TUT.NOT.SE> wrote:
>below is a small sample of a process that runs a comparison to detect
>changes to some data, every week.
>
>the question I had was... which is better to track changes, Proc Sql or Proc
>Compare, when you consider that I'd like to know what part of the data
>changed
>
>below, Mary Smith's hours changed and Bob Murdoch is a new person but Proc
>Compare won't tell me why the person is listed, just that something changed
>
>in the real world, I have a dataset with about 20 variables and any one of
>them could change week over week, without having to analyze each separately,
>is there an elegant way to detect "What's new this week?"
>
>the only condition I don't need to track is if the person does not exist the
>next week, ie. dropped, but everything else I would need to be aware of, ie.
>"Mary Smith's HOURS changed" and "Bob Murdoch is a NEW item", also the code
>could have changed but not in my sample below.
>
>data base;input id hours code $3. name $20.;
>cards;
>001 40 ABB Tom Jones
>002 20 DBF Mary Smith
>;
>run;
>data week;input id hours code $3. name $20.;
>cards;
>001 40 ABB Tom Jones
>002 40 DBF Mary Smith
>003 40 ABB Bob Murdoch
>;
>run;
>proc compare base=base compare=week out=result(where=(_type_="COMPARE"))
> outnoequal outbase outcomp outdif noprint;id id;
> run;
>data _null_;set ;put(_all_)(=);run;
I would not use either COMPARE or SQL for this. Here's an alternative ...
Transpose the BASE table into the form of an item list, and sort it:
proc transpose data=base out=longbase
(rename = (_name_=item col1=base) where=(item ne 'id') );
by id name;
var _all_;
run;
proc sort data=longbase;
by id item;
run;
Do the same with the WEEK table:
proc transpose data=week out=longweek
(rename = (_name_=item col1=week) where=(item ne 'id') );
by id name;
var _all_;
run;
proc sort data=longweek;
by id item;
run;
Now merge at the item level and filter out the matching values:
data compared;
merge longbase longweek(rename = (name=newname) );
by id item;
if base=week then delete;
name = coalescec(name,newname);
drop newname;
base = left(base);
week = left(week);
run;
This could be done with SQL, in which case, the PROC SORT steps would not be
needed.
Report:
proc print data=compared;
by id name;
id id name;
run;
Output looks like this:
id name item base week
2 Mary Smith hours 20 40
3 Bob Murdoch code ABB
hours 40
name Bob Murdoch