```Date: Mon, 11 Jul 2005 09:57:24 -0700 Reply-To: "Choate, Paul@DDS" Sender: "SAS(r) Discussion" From: "Choate, Paul@DDS" Subject: Re: A special case of missing data replacement Comments: To: Yun Gao Hi Yun - Very interesting Monday morning problem - the only solution I could see right off was to take a Cartesian product of the data three times to make sure all the columns were permuted, and then another Cartesian join to flag which sets were covered other sets, ending with a merge to delete them. In your example you retained two cases of "X Y Z S" - I'm not sure why you kept two. (Because of ordering?) My solution only keeps the unique supersets and eliminates all subsets. I didn't try to optimize this solution, and it can be tightened up in several places. If your data is at all large then the Cartesian joins can easily get out of hand because they do not optimize. hth Paul Choate DDS Data Extraction (916) 654-2160 DATA T1; INPUT @1 A \$1. @3 B \$1. @5 C \$1. @7 D \$1.; CARDS; X Y Z Y Z S . Y . . X Y S Y S X Y Z X S Z . Y Z A Y Z Y S S Z A B C B C D E F G H J ; RUN; data t2; set t1; rename a=w b=x c=y d=z; run; proc sql; create TABLE t3 as select * from t1, t2 where (a=w & a~='') or (b=x & b~='') or (c=y & c~='') or (d=z & d~=''); quit; data t1; set t3; array one(4) \$1. a--d; array two(4) \$1. w--z; do i=1 to 4; if (not missing(one(i)) and not missing(two(i))) and one(i) ne two(i) then delete; end; do i=1 to 4; if missing(one(i)) then one(i)=two(i); end; run; proc sort data=t1(drop=w--z i) nodupkey; by a--d; run; data t2; set t1; rename a=w b=x c=y d=z; run; proc sql; create TABLE t3 as select * from t1, t2 where (a=w & a~='') or (b=x & b~='') or (c=y & c~='') or (d=z & d~=''); quit; data t1; set t3; array one(4) \$1. a--d; array two(4) \$1. w--z; do i=1 to 4; if (not missing(one(i)) and not missing(two(i))) and one(i) ne two(i) then delete; end; do i=1 to 4; if missing(one(i)) then one(i)=two(i); end; run; proc sort data=t1(drop=w--z i) nodupkey; by a--d; run; data t2; set t1; rename a=w b=x c=y d=z; run; proc sql; create TABLE t3 as select * from t1, t2 where (a=w & a~='') or (b=x & b~='') or (c=y & c~='') or (d=z & d~=''); quit; data t1; set t3; array one(4) \$1. a--d; array two(4) \$1. w--z; do i=1 to 4; if (not missing(one(i)) and not missing(two(i))) and one(i) ne two(i) then delete; end; do i=1 to 4; if missing(one(i)) then one(i)=two(i); end; run; proc sort data=t1(drop=w--z i) nodupkey; by a--d; run; data t2; set t1; rename a=w b=x c=y d=z; run; proc sql; create TABLE t3 as select * from t1, t2; quit; data t3(keep=a--d); set t3; if (a=w|a='')*(b=x|b='')*(c=y|c='')*(d=z|d=''); if sum(~missing(a),~missing(b),~missing(c),~missing(d)) ge sum(~missing(w),~missing(x),~missing(y),~missing(z)) then delete; run; proc sort data=t3 nodupkey; by a--d; run; data t1; merge t1 (in=in1) t3 (in=in3); by a--d; if not in3; run; -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Yun Gao Sent: Sunday, July 10, 2005 9:39 AM To: SAS-L@LISTSERV.UGA.EDU Subject: A special case of missing data replacement Hi, I would like to replace missing data in a way such that missing data will be replaced by non-missing data in this column in other observations as long as the values of other variables keep the same, only the observations who are not able to be replaced further will be output to final dataset. For example for a dataset like: DATA T1; INPUT @1 A \$1. @3 B \$1. @5 C \$1. @7 D \$1.; CARDS; X Y Z Y Z S . Y . . X Y S Y S X Y Z X S Z . Y Z A Y Z Y S S Z A B C B C D E F G H J ; RUN; The final dataset is supposed to be something like: A B C D ------- X Y Z S X Y S S X Y Z S X S Z Z Y Z A A B C D E F G H J Any ideas are very much appreicated. Thanks, Yun Gao ```

Back to: Top of message | Previous page | Main SAS-L page