LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 1999, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sun, 14 Mar 1999 13:09:37 -0500
Reply-To:     hermans1 <hermans1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         hermans1 <hermans1@WESTAT.COM>
Subject:      Re: SQL: One more related question: SQL question Cartesian P
Comments: To: "Seltzer; Jon D." <SeltzerJD@PHIBRED.COM>
Content-Type: text/plain; charset=US-ASCII

Using the collating sequence of the gene variable, try

proc sql; create table data3 as select d1.gene1 as gene1,d1.marke1 as marke1,d1.alle1 as alle1, d2.gene1 as gene2,d2.marke1 as marke2,d2.alle1 as alle2 from data1 d1,data1 d2 where d1.marke1=d2.marke1 and d1.gene1<d2.gene1 order by marke1,gene2 ; quit;

Note that I am using a reflexive join on one table rather than copying the table prior to the join. I believe that the solution matches what you want, except perhaps the order of the last couple of rows. Sig ______________________________ Reply Separator _________________________________ Subject: SQL: One more related question: SQL question Cartesian Produ Author: "Seltzer; Jon D." <SeltzerJD@PHIBRED.COM> at Internet-E-Mail Date: 3/14/99 1:39 AM

First my thanks to everyone that responded to my original question, you are all helping me get out of a monster programming jam.

All of the solutions were orders of magnitude more efficient than what I was trying to do. Hopefully I am on my way to writing a program that takes 1/2hour to run Vs 40 hours to run.

Below is my original question. In working on this problem a bit more, I suddenly realized in reality I want to get only one record from a representative pair from the output at the bottom of this email. I thought all pairs would be okay but now I realize that is not the case.

Once again any help would be greatly appreciated.

So now the final data set I would like to obtain is:

gene1 mark1 alle1 gene2 mark2 alle2 A 1 .1 B 1 .4

A 1 .1 C 1 .7

B 1 .4 C 1 .7

A 2 .2 B 2 .5

A 2 .2 C 2 .8

B 2 .5 C 2 .8

A 3 .3 B 3 .6

A 3 .3 C 3 .9

B 3 .6 C 3 .9

If you compare this to the past final data set below you will notice that the second record has been deleted in each pair of records. I really have no clue how to do this without lots of data manipulation. The actual file I am working on is 2-4 million records, which is why I am looking for more efficient code.

The original problem.

Suppose data1 is:

gene1 marke1 alle1 A 1 .1 A 2 .2 A 3 .3 B 1 .4 B 2 .5 B 3 .6 C 1 .7 C 2 .8 C 3 .9

Data2 is the same as data1 but with different variable names.

gene2 marke2 alle2 A 1 .1 A 2 .2 A 3 .3 B 1 .4 B 2 .5 B 3 .6 C 1 .7 C 2 .8 C 3 .9

What I want to create is data3: For each marker, I want all pairs of gene1 and gene2 excluding the case when gene1=gene2. I would also be happy with just the unique pairs if that is easier to do.

gene1 mark1 alle1 gene2 mark2 alle2 A 1 .1 B 1 .4 B 1 .4 A 1 .1

A 1 .1 C 1 .7 C 1 .7 A 1 .1

B 1 .4 C 1 .7 C 1 .7 B 1 .4

A 2 .2 B 2 .5 B 2 .5 A 2 .2

A 2 .2 C 2 .8 C 2 .8 A 2 .2

B 2 .5 C 2 .8 C 2 .8 B 2 .5

A 3 .3 B 3 .6 B 3 .6 A 3 .3

A 3 .3 C 3 .9 C 3 .9 A 3 .3

B 3 .6 C 3 .9 C 3 .9 B 3 .6


Back to: Top of message | Previous page | Main SAS-L page