Date: Thu, 13 Mar 2008 06:57:16 -0700
Reply-To: Dirk Nachbar <dirknbr@GOOGLEMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dirk Nachbar <dirknbr@GOOGLEMAIL.COM>
Organization: http://groups.google.com
Subject: Re: proc SQL join / datastep merge problem
Content-Type: text/plain; charset=ISO-8859-1
It's a good description but still hard to understand, I think you have
to do iterative merges. The merge without in or if creates all rows
possible.
data3;
merge data1 data2 (keep=a b d);
by a b;
run;
sort again;
data3;
merge data3 (rename=(c=b)) data2 (keep=b e f);
by c;
run;
andsoforth ...
Dirk Nachbar
On Mar 13, 11:17 am, antti.pasa...@STAT.FI (Antti Pasanen) wrote:
> Hi all,
>
> I have a proc SQL / datastep merge problem (I'm not sure which will lead to
> a more sophisticated solution). The problem is, that I have two 'big'
> enterprise level datasets the first having about 70 variables and the
> second having about 300 variables. The first data having more rows is done
> of the second dataset by splitting up industries to local kind of units.
> So, the first data has some of the variables from the second data, but not
> all and some additional variables.
>
> What I need is to merge/join these two data's so that from the first data I
> will take all the variables and rows, and from the second data I have to
> add the rest of the variables to the new data (to the rows that have a
> largest value in some indicator, C here) and possibly also rows that are
> not in the first data. An example what I'm trying to do:
>
> Data 1:
> A B C D
> 1 1 2 10
> 1 1 3 12
> 1 2 1 8
> 1 2 2 4
> 2 4 5 5
> 2 4 2 20
> 2 4 4 11
> 4 1 3 4
> 4 2 4 6
>
> Data 2:
> A B D E F
> 1 2 34 3 5
> 3 2 14 5 7
> 4 4 10 3 1
> 5 4 1 0 1
>
> Output data 3:
>
> A B C D E F
> 1 1 2 10
> 1 1 3 12
> 1 2 1 8
> 1 2 2 4 2 7
> 2 4 5 5
> 2 4 2 20
> 2 4 4 11
> 3 2 14 5 7
> 4 1 3 4
> 4 1 4 6 3 1
> 5 4 1 0 1
>
> 1. So, I need to keep the data from Data 1 and add the data from Data 2 if
> such rows do not exist in Data 1 (A=3 and A=5).
>
> 2. And add to the end of Data 1 the variables that do not exist in Data 1
> (E and F) to the biggest C variable row having the same class (A and B).
>
> 3. In the case that there is no common class B in the two datasets (A=4)
> then the additional variables should be added to the biggest class C having
> still the same A group.
>
> Always I get the variable D from data 2 mixed to the variable C in data 1
> and the E and F variables multiplied. Whether I'm using merge or SQL and in
> the solution I don't want to write all the variables in the SQL select row!
>
> I hope my question was clear enough and I appreciate all the suggestions I
> get.
>
> Best,
> Antti
|