LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2008, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 26 Mar 2008 11:38:45 -0400
Reply-To:   Ed Heaton <EdHeaton@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ed Heaton <EdHeaton@WESTAT.COM>
Subject:   Re: Merging with overlapping non-key variables
Comments:   To: Scott Bucher <ir.bucher@gmail.com>
In-Reply-To:   <1ba1507b0803251538kf7f9414if5bb19b9ae1f9534@mail.gmail.com>
Content-Type:   text/plain; charset="us-ascii"

Scott;

You have fell victim to a misconception spread by SAS itself. That misconception is that values from the second dataset will overwrite values from the first dataset for all variables in common that are not specified in the BY statement. It's not that simple. Let's take your case.

ONE TWO id overlap id overlap -> 1 1 -> 2 1 2 2 2 . 2 3 3 1

At the start of the execution phase of the DATA step, the input pointer is on the first record of each table - as above. The value of ID in ONE is smaller than the value of ID in TWO, so SAS reads the values from ONE into the PDV, moves the input pointer to the next row of that dataset, and writes the values from the PDV to the output dataset.

PDV: id=1 overlap=1

ONE TWO id overlap id overlap 1 1 -> 2 1 -> 2 2 2 . 2 3 3 1

BOTH id overlap 1 1

Now, SAS sees that the values for ID are equal.

So, SAS reads the values into the PDV from ONE and moves the input pointer.

PDV: id=2 overlap=2

Then it reads the values from TWO and the value of OVERLAP from TWO overwrites the value from ONE. SAS probably moves the input pointer to the end-of-file marker or some such thing for SAS datasets. At any rate, SAS will not read from dataset TWO again.

PDV: id=2 overlap=1

ONE TWO id overlap id overlap 1 1 2 1 2 2 -> -> 2 . 2 3 3 1

Then it outputs the PDV to the BOTH table.

BOTH id overlap 1 1 2 1

Now, SAS reads the next row from ONE, sees that the value for ID has not changed, and overwrites the value for OVERLAP in the PDV with the value from ONE. Then it moves the input pointer for ONE.

PDV: id=2 overlap=.

ONE TWO id overlap id overlap 1 1 2 1 2 2 -> 2 . -> 2 3 3 1

Then it outputs the PDV to the BOTH table.

BOTH id overlap 1 1 2 1 2 .

Now, SAS reads the next row from ONE, sees that the value for ID has still not changed, and overwrites the value for OVERLAP in the PDV with the value from ONE. Then it moves the input pointer for ONE.

PDV: id=2 overlap=3

ONE TWO id overlap id overlap 1 1 2 1 2 2 -> 2 . 2 3 -> 3 1

Then it outputs the PDV to the BOTH table.

BOTH id overlap 1 1 2 1 2 . 2 3

Finally, SAS reads the last row from ONE, writes the values from ONE into the PDV, and moves the input pointer. Id knows that we are at the end of TWO, so it simply writes out the results.

PDV: id=3 overlap=1

ONE TWO id overlap id overlap 1 1 2 1 2 2 -> 2 . 2 3 3 1 ->

Then it outputs the PDV to the BOTH table.

BOTH id overlap 1 1 2 1 2 . 2 3 3 1

Done.

Hope this helps ... a little.

Ed

Edward Heaton, Senior Systems Analyst, Westat (An Employee-Owned Research Corporation), 1650 Research Boulevard, TB-286, Rockville, MD 20850-3195 Voice: (301) 610-4818 Fax: (301) 294-2085 mailto:EdHeaton@Westat.com http://www.Westat.com

-----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Scott Bucher Sent: Tuesday, March 25, 2008 6:39 PM To: SAS-L@listserv.uga.edu Subject: Merging with overlapping non-key variables

Hi,

I am trying to merge two datasets that have common non-key variables. For example, merging data set 'one' with data set 'two', I would like to create 'both'. I would expect the merge statement I have provided to achieve this, since for non-overlapping variables that is the outcome; but obviously this does not work. The only literature on overlapping variables I can find simply says 'do not do it'. I realize this can be done by creating non-overlapping variables, but is anything more direct possible?

data one; input id overlap; datalines; 1 1 2 2 2 . 2 3 3 1 ;

data two; input id overlap; datalines; 2 1 ;

data both; input id overlap; datalines; 1 1 2 1 2 1 2 1 3 1 ;

data both; merge one two; by id; run;


Back to: Top of message | Previous page | Main SAS-L page