| Date: | Wed, 26 Mar 2008 11:38:45 -0400 |
| Reply-To: | Ed Heaton <EdHeaton@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Ed Heaton <EdHeaton@WESTAT.COM> |
| Subject: | Re: Merging with overlapping non-key variables |
|
| In-Reply-To: | <1ba1507b0803251538kf7f9414if5bb19b9ae1f9534@mail.gmail.com> |
| Content-Type: | text/plain; charset="us-ascii" |
Scott;
You have fell victim to a misconception spread by SAS itself. That
misconception is that values from the second dataset will overwrite
values from the first dataset for all variables in common that are not
specified in the BY statement. It's not that simple. Let's take your
case.
ONE TWO
id overlap id overlap
-> 1 1 -> 2 1
2 2
2 .
2 3
3 1
At the start of the execution phase of the DATA step, the input pointer
is on the first record of each table - as above. The value of ID in ONE
is smaller than the value of ID in TWO, so SAS reads the values from ONE
into the PDV, moves the input pointer to the next row of that dataset,
and writes the values from the PDV to the output dataset.
PDV: id=1 overlap=1
ONE TWO
id overlap id overlap
1 1 -> 2 1
-> 2 2
2 .
2 3
3 1
BOTH
id overlap
1 1
Now, SAS sees that the values for ID are equal.
So, SAS reads the values into the PDV from ONE and moves the input
pointer.
PDV: id=2 overlap=2
Then it reads the values from TWO and the value of OVERLAP from TWO
overwrites the value from ONE. SAS probably moves the input pointer to
the end-of-file marker or some such thing for SAS datasets. At any
rate, SAS will not read from dataset TWO again.
PDV: id=2 overlap=1
ONE TWO
id overlap id overlap
1 1 2 1
2 2 ->
-> 2 .
2 3
3 1
Then it outputs the PDV to the BOTH table.
BOTH
id overlap
1 1
2 1
Now, SAS reads the next row from ONE, sees that the value for ID has not
changed, and overwrites the value for OVERLAP in the PDV with the value
from ONE. Then it moves the input pointer for ONE.
PDV: id=2 overlap=.
ONE TWO
id overlap id overlap
1 1 2 1
2 2 ->
2 .
-> 2 3
3 1
Then it outputs the PDV to the BOTH table.
BOTH
id overlap
1 1
2 1
2 .
Now, SAS reads the next row from ONE, sees that the value for ID has
still not changed, and overwrites the value for OVERLAP in the PDV with
the value from ONE. Then it moves the input pointer for ONE.
PDV: id=2 overlap=3
ONE TWO
id overlap id overlap
1 1 2 1
2 2 ->
2 .
2 3
-> 3 1
Then it outputs the PDV to the BOTH table.
BOTH
id overlap
1 1
2 1
2 .
2 3
Finally, SAS reads the last row from ONE, writes the values from ONE
into the PDV, and moves the input pointer. Id knows that we are at the
end of TWO, so it simply writes out the results.
PDV: id=3 overlap=1
ONE TWO
id overlap id overlap
1 1 2 1
2 2 ->
2 .
2 3
3 1
->
Then it outputs the PDV to the BOTH table.
BOTH
id overlap
1 1
2 1
2 .
2 3
3 1
Done.
Hope this helps ... a little.
Ed
Edward Heaton, Senior Systems Analyst,
Westat (An Employee-Owned Research Corporation),
1650 Research Boulevard, TB-286, Rockville, MD 20850-3195
Voice: (301) 610-4818 Fax: (301) 294-2085
mailto:EdHeaton@Westat.com http://www.Westat.com
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Scott Bucher
Sent: Tuesday, March 25, 2008 6:39 PM
To: SAS-L@listserv.uga.edu
Subject: Merging with overlapping non-key variables
Hi,
I am trying to merge two datasets that have common non-key variables.
For example, merging data set 'one' with data set 'two', I would like
to create 'both'. I would expect the merge statement I have provided to
achieve this, since for non-overlapping variables that is the outcome;
but obviously this does not work. The only literature on overlapping
variables I can find simply says 'do not do it'. I realize this can be
done by creating non-overlapping variables, but is anything more direct
possible?
data one;
input id overlap;
datalines;
1 1
2 2
2 .
2 3
3 1
;
data two;
input id overlap;
datalines;
2 1
;
data both;
input id overlap;
datalines;
1 1
2 1
2 1
2 1
3 1
;
data both;
merge
one
two;
by id;
run;
|