LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2007, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 6 Sep 2007 11:34:34 -0400
Reply-To:     Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:      Re: Merge question
In-Reply-To:  <1189024927.396396.286280@w3g2000hsg.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1

On 9/5/07, olana98@googlemail.com <olana98@googlemail.com> wrote: > > Dear SAS Group, > > I wanted to merge two data sets. One contains patient information- > date of hospital visit and others. Commonly, there are several daily > visits leading to multiple observations in a day. The second data set > contains an exposure parameter measured on the day of visit and > several controls spaced weeks apart. The plan is to merge each > hospital visit to several exposure dates. I have copied a sample data > set. The ID is not unique in the data set as a patient may visit a > hospital more than once. It appears it is many to many merging and I > attempt with ordinary merging did not produce the required result.Any > suggestion is appreciated. Thanks and best wishes > > Data set one > ID Visit > 19 31/07/2000 > 41 02/08/2000 > 52 02/08/2000 > 19 14/08/2000 > > Data set two > > Ref_date Grp What I wanted to achieve > > 31/07/2000 1 19 31/07/2000 1 > 24/07/2000 0 19 24/07/2000 0 > 17/07/2000 0 19 17/07/2000 0 > 10/07/2000 0 19 10/07/2000 0 > 03/07/2000 0 19 03/07/2000 0 > 02/08/2000 1 41 02/08/2000 1 > 09/08/2000 0 41 09/08/2000 0 > 16/08/2000 0 41 16/08/2000 0 > 23/08/2000 0 41 23/08/2000 0 > 30/08/2000 0 41 30/08/2000 0 > 14/08/2000 1 52 02/08/2000 1 > 07/08/2000 0 52 09/08/2000 0 > 21/08/2000 0 52 16/08/2000 0 > 28/08/2000 0 52 23/08/2000 0 > 52 30/08/2000 0 > 19 14/08/2000 1 > 19 07/08/2000 0 > 19 21/08/2000 0 > 19 28/08/2000 0

It is possible to do this in one data step without either Sorting or Transposing. The algorithm is simple. The direct addressing to a record starting with grp = 1 and then to match the VISIT with REF_DATE will produce the desired output.

The data sets used were:

data one; input ID visit :ddmmyy10.; cards; 19 31/07/2000 41 02/08/2000 52 02/08/2000 19 14/08/2000 ; run;

data two; input ref_date :ddmmyy10. grp; cards; 31/07/2000 1 24/07/2000 0 17/07/2000 0 10/07/2000 0 03/07/2000 0 02/08/2000 1 09/08/2000 0 16/08/2000 0 23/08/2000 0 30/08/2000 0 14/08/2000 1 07/08/2000 0 21/08/2000 0 28/08/2000 0 ; run;

The code with some comments follow:

%let cases = 3;

41 data three; 42 length id ref_date grp 8; 43 array k[&cases] _temporary_; 44 if _n_ = 1 then do; 45 case = 1; 46 do _n_ = 1 by 1 until(eof2); 47 set two nobs = totobs end = eof2; 48 if grp = 1 then do; 49 k[case] = _n_; 50 case ++ 1; 51 end; 52 end; 53 end;

The use of direct accessing with the use of POINT = option looks up a record by record from the beginning of data set (TWO) until a match is obtained. This would be an unwanted I/O. A clever idea is to remember the record id (RID) when it has grp = 1 and lookup from there whether there is a match or not, which would eliminate unnecessary I/O travels. The number of grp is assumed to be 3 in this case, and is passed as a macro variable, CASES. Hence, in one pass of the data set(TWO), the array k[] is loaded with this information. At the completion of this loop the array will have:

k[1] = 1 k[2] = 6 and k[3] = 11

As we read records from the data set, ONE, we need to go to records 1 or 6 or 11 to find the match.

54 do until(endone); 55 set one end = endone; 56 do _n_ = 1 to case - 1 ; 57 p = k[_n_]; 58 link getnext; 59 done = 0; 60 if visit = ref_date then do; 61 output; 62 do p = p + 1 by 1 until(done | p ge totobs); 63 link getnext; 64 if grp = 0 then output; 65 else done = 1; 66 end; 67 if done then leave; 68 end; 69 if done then leave; 70 end; 71 end; 72 return; 73 getnext: set two point = p; return; 74 stop; 75 drop case done visit; 76 run;

The statement at line 62 is very important to be commented.

do p = p + 1 by 1 until(done | p ge totobs);

When we use POINT= option, there would be a potential infinite-loop. In this code we stop it by checking when DONE becomes 1 or P reaches the end of the file(number of records). If the latter condition is missed, there would be endless loop !

Two LEAVE statements have been used to go back to read the next observation from the data set, ONE. Even if these statements are not there, it will work with more I/O as

p ge totobs

helps to complete the do-loop and send the control back to read the next observation in ONE.

Hope you enjoy this code.

Muthia Kachirayan


Back to: Top of message | Previous page | Main SAS-L page