Date: Mon, 11 Jul 2005 09:57:24 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: A special case of missing data replacement
Hi Yun -
Very interesting Monday morning problem - the only solution I could see
right off was to take a Cartesian product of the data three times to make
sure all the columns were permuted, and then another Cartesian join to flag
which sets were covered other sets, ending with a merge to delete them.
In your example you retained two cases of "X Y Z S" - I'm not sure why you
kept two. (Because of ordering?) My solution only keeps the unique
supersets and eliminates all subsets.
I didn't try to optimize this solution, and it can be tightened up in
several places. If your data is at all large then the Cartesian joins can
easily get out of hand because they do not optimize.
hth
Paul Choate
DDS Data Extraction
(916) 654-2160
DATA T1;
INPUT @1 A $1. @3 B $1. @5 C $1. @7 D $1.;
CARDS;
X Y Z
Y Z S
. Y . .
X Y S
Y S
X Y Z
X S Z .
Y Z A
Y Z
Y S
S Z
A B C
B C D
E F
G H J
;
RUN;
data t2;
set t1;
rename a=w b=x c=y d=z;
run;
proc sql;
create TABLE t3 as
select *
from t1, t2
where (a=w & a~='') or (b=x & b~='') or (c=y & c~='') or (d=z & d~='');
quit;
data t1;
set t3;
array one(4) $1. a--d;
array two(4) $1. w--z;
do i=1 to 4;
if (not missing(one(i)) and not missing(two(i)))
and one(i) ne two(i) then delete;
end;
do i=1 to 4;
if missing(one(i)) then one(i)=two(i);
end;
run;
proc sort data=t1(drop=w--z i) nodupkey;
by a--d;
run;
data t2;
set t1;
rename a=w b=x c=y d=z;
run;
proc sql;
create TABLE t3 as
select *
from t1, t2
where (a=w & a~='') or (b=x & b~='') or (c=y & c~='') or (d=z & d~='');
quit;
data t1;
set t3;
array one(4) $1. a--d;
array two(4) $1. w--z;
do i=1 to 4;
if (not missing(one(i)) and not missing(two(i)))
and one(i) ne two(i) then delete;
end;
do i=1 to 4;
if missing(one(i)) then one(i)=two(i);
end;
run;
proc sort data=t1(drop=w--z i) nodupkey;
by a--d;
run;
data t2;
set t1;
rename a=w b=x c=y d=z;
run;
proc sql;
create TABLE t3 as
select *
from t1, t2
where (a=w & a~='') or (b=x & b~='') or (c=y & c~='') or (d=z & d~='');
quit;
data t1;
set t3;
array one(4) $1. a--d;
array two(4) $1. w--z;
do i=1 to 4;
if (not missing(one(i)) and not missing(two(i)))
and one(i) ne two(i) then delete;
end;
do i=1 to 4;
if missing(one(i)) then one(i)=two(i);
end;
run;
proc sort data=t1(drop=w--z i) nodupkey;
by a--d;
run;
data t2;
set t1;
rename a=w b=x c=y d=z;
run;
proc sql;
create TABLE t3 as
select *
from t1, t2;
quit;
data t3(keep=a--d);
set t3;
if (a=w|a='')*(b=x|b='')*(c=y|c='')*(d=z|d='');
if sum(~missing(a),~missing(b),~missing(c),~missing(d)) ge
sum(~missing(w),~missing(x),~missing(y),~missing(z)) then delete;
run;
proc sort data=t3 nodupkey;
by a--d;
run;
data t1;
merge t1 (in=in1)
t3 (in=in3);
by a--d;
if not in3;
run;
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Yun Gao
Sent: Sunday, July 10, 2005 9:39 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: A special case of missing data replacement
Hi,
I would like to replace missing data in a way such that missing data will
be replaced by non-missing data in this column in other observations as
long as the values of other variables keep the same, only the observations
who are not able to be replaced further will be output to final dataset.
For example for a dataset like:
DATA T1;
INPUT @1 A $1. @3 B $1. @5 C $1. @7 D $1.;
CARDS;
X Y Z
Y Z S
. Y . .
X Y S
Y S
X Y Z
X S Z .
Y Z A
Y Z
Y S
S Z
A B C
B C D
E F
G H J
;
RUN;
The final dataset is supposed to be something like:
A B C D
-------
X Y Z S
X Y S S
X Y Z S
X S Z Z
Y Z A
A B C D
E F
G H J
Any ideas are very much appreicated.
Thanks,
Yun Gao