Date: Sun, 14 Mar 1999 10:38:29 -0600
Reply-To: "Dwight M. Buffum" <dmbuffum@MN.MEDIAONE.NET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Dwight M. Buffum" <dmbuffum@MN.MEDIAONE.NET>
Organization: MindTools, Inc.
Subject: Re: SQL: One more related question: SQL question Cartesian
Product Question ?
Content-Type: multipart/mixed;
[multipart/alternative]
I think this fulfills your request.
data data1;
input gene1 $ marke1 alle1;
datalines;
A 1 .1
A 2 .2
A 3 .3
B 1 .4
B 2 .5
B 3 .6
C 1 .7
C 2 .8
C 3 .9
;
run;
data data2;
input gene2 $ marke2 alle2;
datalines;
A 1 .1
A 2 .2
A 3 .3
B 1 .4
B 2 .5
B 3 .6
C 1 .7
C 2 .8
C 3 .9
;
run;
proc sql;
create table data3 as
select *
from data1 d1,
data2 d2
where d1.marke1=d2.marke2
and d1.gene1<d2.gene2
order by d1.marke1,
d1.gene1,
d2.gene2
;
quit;
"Seltzer, Jon D." wrote:
> First my thanks to everyone that responded to my original question, you are
> all helping me get out of a monster programming jam.
>
> All of the solutions were orders of magnitude more efficient than what I was
> trying to do. Hopefully I am on my way to writing a program that takes
> 1/2hour to run Vs 40 hours to run.
>
> Below is my original question.
> In working on this problem a bit more, I suddenly realized
> in reality I want to get only one record from a representative pair from the
> output at the bottom
> of this email. I thought all pairs would be okay but now I realize that is
> not the case.
>
> Once again any help would be greatly appreciated.
>
> So now the final data set I would like to obtain is:
>
> gene1 mark1 alle1 gene2 mark2 alle2
> A 1 .1 B 1 .4
>
> A 1 .1 C 1 .7
>
> B 1 .4 C 1 .7
>
> A 2 .2 B 2 .5
>
> A 2 .2 C 2 .8
>
> B 2 .5 C 2 .8
>
> A 3 .3 B 3 .6
>
> A 3 .3 C 3 .9
>
> B 3 .6 C 3 .9
>
> If you compare this to the past final data set below you will notice that
> the second record has been deleted in each pair of records. I really have
> no clue how to do
> this without lots of data manipulation. The actual file I am working on is
> 2-4 million records,
> which is why I am looking for more efficient code.
>
> The original problem.
>
> Suppose data1 is:
>
> gene1 marke1 alle1
> A 1 .1
> A 2 .2
> A 3 .3
> B 1 .4
> B 2 .5
> B 3 .6
> C 1 .7
> C 2 .8
> C 3 .9
>
> Data2 is the same as data1 but with different variable names.
>
> gene2 marke2 alle2
> A 1 .1
> A 2 .2
> A 3 .3
> B 1 .4
> B 2 .5
> B 3 .6
> C 1 .7
> C 2 .8
> C 3 .9
>
> What I want to create is data3: For each marker, I want all pairs of gene1
> and gene2 excluding the case when gene1=gene2. I would also be happy with
> just the unique pairs if that is easier to do.
>
> gene1 mark1 alle1 gene2 mark2 alle2
> A 1 .1 B 1 .4
> B 1 .4 A 1 .1
>
> A 1 .1 C 1 .7
> C 1 .7 A 1 .1
>
> B 1 .4 C 1 .7
> C 1 .7 B 1 .4
>
> A 2 .2 B 2 .5
> B 2 .5 A 2 .2
>
> A 2 .2 C 2 .8
> C 2 .8 A 2 .2
>
> B 2 .5 C 2 .8
> C 2 .8 B 2 .5
>
> A 3 .3 B 3 .6
> B 3 .6 A 3 .3
>
> A 3 .3 C 3 .9
> C 3 .9 A 3 .3
>
> B 3 .6 C 3 .9
> C 3 .9 B 3 .6
[text/html]
[text/x-vcard]
begin:vcard
n:Buffum;Dwight
x-mozilla-html:TRUE
org:MindTools, Inc.;A SAS Institute Quality Partner
adr:;;2285 Stewart Av.., #1327;St. Paul;Minnesota;55116-3154;
version:2.1
email;internet:dmbuffum@mn.mediaone.net
title:President
fn:Dwight M. Buffum
end:vcard
|