Date: Thu, 14 Oct 2010 08:42:51 -0700
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: Finding duplicate records with non-matching data
Content-Type: text/plain; charset=ISO-8859-1
Keith,
You should have sashelp.class, but here is a version that doesn't need
it. I would run the following code after closing SAS and starting a
brand new session:
/*Generate some test data*/
data All_Bills;
input EffectDate $ ITAmnt BillNumb;
cards;
John 7 1
John 7 1
John 7 1
John 7 1
John 7 1
Joe 12 3
Joe 12 3
Joe 15 3
Harry 13 2
Joe 12 3
Joe 11 3
Mary 14 4
Mary 12 4
Mary 15 4
Mary 16 4
Mary 12 4
Mary 15 4
Mary 11 4
Mary 15 4
;
/*Sort to find duplicates*/
PROC SORT
DATA= All_Bills;
BY EffectDate BillNumb ITAmnt;
RUN;
*Find all duplicates, then all duplicates with unmatched data in
ITAmnt field;
DATA All_Duplicates;
SET All_Bills;
retain hold_ITAmnt;
BY EffectDate BillNumb ITAmnt;
/*find all duplicates*/
IF not(FIRST.BillNumb and LAST.BillNumb) THEN DO;
if first.BillNumb then hold_ITAmnt=ITAmnt;
else if ITAmnt ne hold_ITAmnt then unmatched=1;
OUTPUT;
end;
RUN;
/*Sort to find unmatched*/
PROC SORT
DATA= All_Duplicates;
BY EffectDate BillNumb descending unmatched;
RUN;
DATA UnMatched_Duplicates (drop=
hold: unmatched gotone);
set All_Duplicates;
BY EffectDate BillNumb descending unmatched;
retain gotone;
if first.BillNumb then do;
call missing(gotone);
if unmatched then gotone=1;
end;
if gotone then output;
run;
HTH,
Art
----------
On Oct 14, 9:02 am, Friar Broccoli <elia...@gmail.com> wrote:
> On Oct 13, 6:59 pm, Arthur Tabachneck <art...@netscape.net> wrote:
>
>
>
>
>
> > Friar,
>
> > In answer to your first question/statement, if your code does what you
> > want, excellent! I would think that you need an extra data step,
> > e.g.:
>
> > /*Generate some test data*/
> > data summ_lib.All_Bills;
> > set sashelp.class (drop=sex height weight
> > rename=(
> > name=EffectDate
> > age=ITAmnt
> > ));
> > if _n_ < 6 then do;
> > EffectDate="John";
> > BillNumb=1;
> > ITAmnt=7;
> > end;
> > else if _n_ eq 9 then do;
> > EffectDate="Harry";
> > BillNumb=2;
> > end;
> > else if _n_ < 12 then do;
> > EffectDate="Joe";
> > BillNumb=3;
> > end;
> > else do;
> > EffectDate="Mary";
> > BillNumb=4;
> > end;
> > run;
>
> > /*Sort to find duplicates*/
> > PROC SORT
> > DATA= SUMM_LIB.All_Bills
> > OUT = SUMM_LIB.All_Bills;
> > BY EffectDate BillNumb ITAmnt;
> > RUN;
>
> > *Find all duplicates, then all duplicates with unmatched data in
> > ITAmnt field;
> > DATA SUMM_LIB.All_Duplicates;
> > SET SUMM_LIB.All_Bills;
> > retain hold_ITAmnt;
> > BY EffectDate BillNumb ITAmnt;
> > /*find all duplicates*/
> > IF not(FIRST.BillNumb and LAST.BillNumb) THEN DO;
> > if first.BillNumb then hold_ITAmnt=ITAmnt;
> > else if ITAmnt ne hold_ITAmnt then unmatched=1;
> > OUTPUT;
> > end;
> > RUN;
>
> > /*Sort to find unmatched*/
> > PROC SORT
> > DATA= SUMM_LIB.All_Duplicates
> > OUT = SUMM_LIB.All_Duplicates;
> > BY EffectDate BillNumb descending unmatched;
> > RUN;
>
> > DATA SUMM_LIB.UnMatched_Duplicates (drop=
> > hold: unmatched gotone);
> > set SUMM_LIB.All_Duplicates;
> > BY EffectDate BillNumb descending unmatched;
> > retain gotone;
> > if first.BillNumb then do;
> > call missing(gotone);
> > if unmatched then gotone=1;
> > end;
> > if gotone then output;
> > run;
>
> Tried to run your code (in many different ways) but got (variations
> on):
>
> VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
> 2078
> 2079 /
> *-------------------------------------------------------------
> 2080 SUMMARY REPORTS FOR ALL DAYS.
> 2081
> -------------------------------------------------------------*/
> 2082 TITLE;
> 2083 *MARKER;
> 2084
> 2085 data summ_lib.All_Bills;
> 2086 set sashelp.class (drop=sex height
> weight rename=(name=EffectDate age=ITAmnt));
> ____
> ____
> ____
> 180
> 180
> 180
> 2087 if _n_ < 6 then do;
> _
> _
> _
> 180
> 180
> 180
> 2088 EffectDate="John";
> _
> _
> _
> 180
> 180
> 180
> 2089 BillNumb=1;
> _
> _
> _
> 180
> 180
> 180
> 2090 ITAmnt=7;
> _
> _
> _
> 180
> 180
> 180
> 2091 end;
> _
> _
> _
> 180
> 180
> 180
> 2092 else if _n_ eq 9 then do;
> _
> _
> _
> 180
> 180
> 180
> 2093 EffectDate="Harry";
> _
> _
> _
> 180
> 180
> 180
> 2094 BillNumb=2;
> _
> _
> _
> 180
> 180
> 180
> 2095 end;
> _
> _
> _
> 180
> 180
> 180
> 2096 else if _n_ < 12 then do;
> _
> _
> _
> 180
> 180
> 180
> 2097 EffectDate="Joe";
> _
> _
> _
> 180
> 180
> 180
> 2098 BillNumb=3;
> _
> _
> _
> 180
> 180
> 180
> 2099 end;
> _
> _
> _
> 180
> 180
> 180
> 2100 else do;
> _
> _
> _
> 180
> 180
> 180
> 2101 EffectDate="Mary";
> _
> _
> _
> 180
> 180
> 180
> 2102 BillNumb=4;
> _
> _
> _
> 180
> 180
> 180
> 2103 end;
> _
> _
> _
> 180
> 180
> 180
> ERROR 180-322: Statement is not valid or it is used out of proper
> order.
> ERROR 180-322: Statement is not valid or it is used out of proper
> order.
> ERROR 180-322: Statement is not valid or it is used out of proper
> order.
>
> 2104 run;
>
> NOTE: The SAS System stopped processing this step because of errors.
>
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>
> I assume I do not have "sashelp.class". I spent some time trying to
> figure out what it was so I could get it, but found nothing.
>
> Note that I do not run inside the GUI and indeed have little idea of
> how to use it, which may be my problem - not sure.
>
>
>
>
>
> > Your second question is a bit more difficult, thus I'm cc'ing SAS-L so
> > that Joe can respond to you directly. As far as I know, comp.soft-
> > sys.sas and SA...@LISTSERV.UGA.EDU haven't been on speaking terms for
> > almost a year now. I do know that Joe is looking into it but, as yet,
> > we're still separated. I don't think that the term "divorce" has been
> > used yet!
>
> > Art
> > -----------
> > On Oct 13, 7:42 am, Friar Broccoli <elia...@gmail.com> wrote:
>
> > > For years I have been looking for a simple way to
> > > find all duplicate records with non-matching data.
> > > I just tripped over the solution which follows:
>
> > > VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
>
> > > /* Sort for merge */
> > > PROC SORT
> > > DATA= SUMM_LIB.All_Bills
> > > OUT = SUMM_LIB.All_Bills;
> > > BY EffectDate BillNumb ITAmnt;
> > > RUN;
>
> > > *Find all duplicates, then all duplicates with unmatched data in
> > > ITAmnt field;
> > > DATA SUMM_LIB.All_Duplicates
> > > SUMM_LIB.UnMatched_Duplicates;
> > > SET SUMM_LIB.All_Bills;
> > > BY EffectDate BillNumb ITAmnt;
> > > IF ((FIRST.BillNumb ^= 1) OR (LAST.BillNumb ^= 1)) THEN
> > > DO;
> > > OUTPUT SUMM_LIB.All_Duplicates;
> > > IF ((FIRST.ITAmnt = 1) AND (LAST.ITAmnt = 1)) THEN OUTPUT
> > > SUMM_LIB.UnMatched_Duplicates;
> > > END;
> > > RUN;
>
> > > AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>
> > > Can anybody tell me why the SAS group is no longer echoed to and from
> > > usenet?- Hide quoted text -
>
> > - Show quoted text -- Hide quoted text -
>
> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -
|