Date: Thu, 22 May 2008 15:40:50 -0500
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: Identifying the same pair in different order on fields
Content-Type: text/plain; charset="iso-8859-1"
And this tells me how to fix all the data to get the smallest one to come first each time. Another brilliant SAS-L contributor!
These ID's identify blood samples from patients and sometimes the same patient gives blood again without us realizing until later that we already have an ID for them in our system, so I need to be able to identify which ones are for the same patient because I don't want to include the same patient twice in statistical analysis. Usually there are the same results, but I need to check first, then aggregate them together before running my analysis.
Thanks!
-Mary
----- Original Message -----
From: Chang Chung
To: SAS-L@LISTSERV.UGA.EDU
Sent: Thursday, May 22, 2008 3:14 PM
Subject: Re: Identifying the same pair in different order on fields
On Thu, 22 May 2008 14:47:18 -0500, Mary <mlhoward@AVALON.NET> wrote:
>Hi,
>
>I've got two fields; ID and ALT_ID. Here is some example data:
>
>001-00 004-00
>099-00 001-01
>004-00 001-00
>100-00 101-00
>102-00 104-01
>101-00 100-00
>102-00 104-01
>
>What I'd like to pick out are the rows in which the same pair of ID and
ALT-ID occurs, but where they occur in different order
>
>Thus my result desired from the data above is:
>
>001-00 004-00
>004-00 001-00
>
>100-00 101-00
>101-00 100-00
>
>My desire is to pick out the ones where the ID and the Alt-ID aren't in the
same order, so I can correct those in the data and put them in the same
order, and thus correctly identify duplicate rows in the data.
>
>Thanks!
>
>-Mary
hi,
I would consider "canonizing" id and alt_id, so that id always compares
smaller than alt_id. If you have to keep the ids as they are, then create a
new id and make sure that it has the same id value if their canonized
id-alt_id matches. a self-join can be disastrous when there are the
id-alt_id in-order duplicates.
cheers,
chang
/* test data */
data one;
input (id alt_id) (:$6.);
cards;
001-00 004-00
099-00 001-01
004-00 001-00
100-00 101-00
102-00 104-01
101-00 100-00
102-00 104-01
;
run;
data two;
set one;
/* canonized ids -- make id always smaller than alt_id */
if id > alt_id then do;
temp = id;
id = alt_id;
alt_id = temp;
drop temp;
end;
run;
proc sort data=two;
by id alt_id;
run;
proc print data=two;
run;
/* on lst
Obs id alt_id
1 001-00 004-00
2 001-00 004-00
3 001-01 099-00
4 100-00 101-00
5 100-00 101-00
6 102-00 104-01
7 102-00 104-01
*/