LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2008, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 13 Mar 2008 06:57:16 -0700
Reply-To:     Dirk Nachbar <dirknbr@GOOGLEMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Dirk Nachbar <dirknbr@GOOGLEMAIL.COM>
Organization: http://groups.google.com
Subject:      Re: proc SQL join / datastep merge problem
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset=ISO-8859-1

It's a good description but still hard to understand, I think you have to do iterative merges. The merge without in or if creates all rows possible.

data3; merge data1 data2 (keep=a b d); by a b; run;

sort again;

data3; merge data3 (rename=(c=b)) data2 (keep=b e f); by c; run;

andsoforth ...

Dirk Nachbar

On Mar 13, 11:17 am, antti.pasa...@STAT.FI (Antti Pasanen) wrote: > Hi all, > > I have a proc SQL / datastep merge problem (I'm not sure which will lead to > a more sophisticated solution). The problem is, that I have two 'big' > enterprise level datasets the first having about 70 variables and the > second having about 300 variables. The first data having more rows is done > of the second dataset by splitting up industries to local kind of units. > So, the first data has some of the variables from the second data, but not > all and some additional variables. > > What I need is to merge/join these two data's so that from the first data I > will take all the variables and rows, and from the second data I have to > add the rest of the variables to the new data (to the rows that have a > largest value in some indicator, C here) and possibly also rows that are > not in the first data. An example what I'm trying to do: > > Data 1: > A B C D > 1 1 2 10 > 1 1 3 12 > 1 2 1 8 > 1 2 2 4 > 2 4 5 5 > 2 4 2 20 > 2 4 4 11 > 4 1 3 4 > 4 2 4 6 > > Data 2: > A B D E F > 1 2 34 3 5 > 3 2 14 5 7 > 4 4 10 3 1 > 5 4 1 0 1 > > Output data 3: > > A B C D E F > 1 1 2 10 > 1 1 3 12 > 1 2 1 8 > 1 2 2 4 2 7 > 2 4 5 5 > 2 4 2 20 > 2 4 4 11 > 3 2 14 5 7 > 4 1 3 4 > 4 1 4 6 3 1 > 5 4 1 0 1 > > 1. So, I need to keep the data from Data 1 and add the data from Data 2 if > such rows do not exist in Data 1 (A=3 and A=5). > > 2. And add to the end of Data 1 the variables that do not exist in Data 1 > (E and F) to the biggest C variable row having the same class (A and B). > > 3. In the case that there is no common class B in the two datasets (A=4) > then the additional variables should be added to the biggest class C having > still the same A group. > > Always I get the variable D from data 2 mixed to the variable C in data 1 > and the E and F variables multiplied. Whether I'm using merge or SQL and in > the solution I don't want to write all the variables in the SQL select row! > > I hope my question was clear enough and I appreciate all the suggestions I > get. > > Best, > Antti


Back to: Top of message | Previous page | Main SAS-L page