Date: Thu, 29 Jan 2004 08:02:31 -0500
Reply-To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject: Re: Interleaving Observations
Here's what I would try:
data justkeys / view=justkeys;
merge x(in=inx keep=recno)
y(in=iny keep=recno);
by recno;
retain xkey ykey;
if inx then xkey=recno;
if iny then ykey=recno;
run;
data halfdone / view=halfdone;
merge justkeys x(rename=(recno=xkey) );
by xkey;
drop xkey;
run;
data alldone;
merge halfdone y(rename=(recno=ykey) );
by ykey;
drop ykey;
run;
The first step is the critical one. It takes care of the RETAIN
requirement, but does so in terms of the key only. JUSTKEYS looks like this:
recno xkey ykey
1 1 .
2 1 2
3 1 3
4 4 3
5 4 5
The other steps do fairly straightforward merges to pick up the satellite
variables.
Although there are three DATA steps, using views for the intermediate
results saves I/O. Basically, when the last step runs, each of the two
input datasets is read twice, once to set up the keys an once to feed the
rest of the data.
Notice that there are three successive merges, using three different BY
variables, with no intervening sorts. You don't see that every day.
On Wed, 28 Jan 2004 10:28:10 -0600, Kevin Myers <kevinmyers@AUSTIN.RR.COM>
wrote:
>P.S. - The recno values won't necessarily be contiguous nor start at one.
Both data sets are sorted by recno, and there will be no duplicate values
of recno in the two data sets.
>
>----- Original Message -----
>From: "Kevin Myers" <kevinmyers@AUSTIN.RR.COM>
>Newsgroups: bit.listserv.sas-l
>To: <SAS-L@LISTSERV.UGA.EDU>
>Sent: Wednesday, January 28, 2004 10:19 AM
>Subject: Interleaving Observations
>
>
>> Howdy folks,
>>
>> Have two data sets with one common variable that records from the two
data sets need to be interleaved by, plus lots of additional variables that
*differ* between the two data sets. Conceptually what I want to do is
illustrated by the following examples:
>>
>> data x;
>> recno=1; x=1; output;
>> recno=4; x=4; output;
>> run;
>>
>> data y;
>> recno=2; y=2; output;
>> recno=3; y=3; output;
>> recno=5; y=5; output;
>> run;
>>
>> /* neither of the following steps do what I need, but both are sorta
close in different ways */
>>
>> /* ignores BY statement, fails to interleave records in desired order,
quits when first data set runs out of records */
>> data _null_; set x; set y; by recno; put recno= x= y=; run;
>>
>> /* interleaves records in desired order, but fails to retain values */
>> data _null_; retain _all_; set x y; by recno; put recno= x= y=; run;
>>
>> /* desired output */
>> recno=1 x=1 y=.
>> recno=2 x=1 y=2
>> recno=3 x=1 y=3
>> recno=4 x=4 y=3
>> recno=5 x=4 y=5
>>
>> Conceivably I could achieve the desired result using a very inefficient
SQL join, but I'm actually working with *lots* of data, and need a more
efficient solution. Also, I have thought of several multi-step solutions,
but all seemed ugly and unnecessarily complex. I'd strongly prefer a
simple and elegant solution, as what I'm really trying to accomplish here
is to replace and greatly simplify some very complicated existing code.
>>
>> Thanks in advance,
>> s/KAM
>>
|