Date: Thu, 31 Oct 2002 10:44:28 -0500 Sigurd Hermansen "SAS(r) Discussion" Sigurd Hermansen Re: matching To: Jian Mao text/plain; charset="iso-8859-1"

This question resembles the one posted last Friday. Before attempting to answer your question, let me improve on that earlier solution. Substituting a SUM() function for addition operators produces a non-missing value when scoring criteria contain missing values as well as non-missing terms. You may want to start with this SQL solution:

>>>>>> From SAS-L 10/25/2002 <<<<<<<<<< proc sql; create table CCpairs as select t1.*,t2.year as CNTLyear,t2.age as CNTLage,t2.LabValue as CNTLLabValue from Cases as t1 left join Controls as t2 on t1.gender=t2.gender group by t1.No,t1.gender having abs(sum(5*(year-CNTLyear),age-CNTLage,3*(LabValue-CNTLLabValue))) =min(abs(sum(5*(year-CNTLyear),age-CNTLage,3*(LabValue-CNTLLabValue)))) ; quit;

The ON condition requires a match on Gender. The weighting of deviations in Year (by 5X) and LabValue (by 3X) adjust for different measurement scales. You may also substitute squared deviations for absolute deviations and devise much more complex weighting schemes.

In many case matching problems, this procedure, though greedy, does an adequate job of assigning controls. You may need to test for unmatched Cases and for multiple matches to cases where weighted deviations might sum to the same value. >>>>>>>

Because you have your data in one dataset, you will need to write a reflexive join t to match up group 1 to group 2:

... from (select * from subjects where group=1) as t1 right join (select * from subjects where group=2) as t2 ....

The choice of a join condition makes a solution for your problem difficult. An attempt to match exactly one group 2 record to one group 1 record fails, even in your small sample of data. Try this program to see why:

data subjects; input Obs ID x y z GROUP ; cards; 1 1 65 1.00 1.00 1.00 2 2 45 3.00 3.00 1.00 3 3 84 1.00 1.00 1.00 4 4 89 4.00 5.00 1.00 5 5 93 1.00 1.00 1.00 6 6 83 1.00 1.00 1.00 7 7 82 1.00 1.00 1.00 8 8 57 1.00 1.00 1.00 9 10 93 3.00 1.00 1.00 10 11 63 2.00 1.00 1.00 11 12 67 3.00 1.00 1.00 12 13 61 1.00 1.00 1.00 13 14 73 1.00 1.00 1.00 14 15 68 1.00 1.00 1.00 15 16 78 4.00 1.00 1.00 16 17 45 1.00 1.00 1.00 17 18 59 1.00 1.00 1.00 18 19 81 1.00 1.00 1.00 19 20 78 1.00 1.00 1.00 20 21 50 3.00 1.00 1.00 21 22 74 2.00 1.00 1.00 22 23 80 1.00 1.00 1.00 23 26 58 4.00 4.00 1.00 24 27 119 4.00 5.00 1.00 25 29 35 1.00 . 1.00 26 31 69 1.00 1.00 1.00 27 32 77 1.00 1.00 1.00 28 33 73 1.00 1.00 1.00 29 34 68 1.00 1.00 1.00 30 35 70 4.00 1.00 1.00 31 36 43 3.00 1.00 1.00 32 37 69 1.00 1.00 1.00 33 41 50 4.00 4.00 1.00 34 42 54 1.00 1.00 1.00 35 43 46 3.00 3.00 1.00 36 45 62 1.00 2.00 1.00 37 46 70 1.00 4.00 1.00 38 47 77 2.00 3.00 1.00 39 48 69 2.00 2.00 1.00 40 49 53 1.00 1.00 1.00 41 50 50 3.00 1.00 1.00 42 51 101 2.00 1.00 1.00 43 52 66 1.00 1.00 1.00 44 53 48 1.00 1.00 1.00 45 56 50 2.00 1.00 1.00 46 57 91 1.00 4.00 1.00 47 58 66 2.00 2.00 1.00 48 60 55 5.00 5.00 1.00 49 61 49 4.00 3.00 1.00 50 63 58 4.00 2.00 1.00 51 64 60 3.00 3.00 1.00 52 65 82 3.00 3.00 1.00 53 66 101 3.00 3.00 1.00 54 67 32 3.00 3.00 1.00 55 68 107 1.00 1.00 1.00 56 69 68 3.00 3.00 1.00 57 71 94 1.00 1.00 1.00 58 72 71 1.00 1.00 1.00 59 73 37 1.00 1.00 1.00 60 75 61 3.00 1.00 1.00 61 77 91 1.00 4.00 1.00 62 78 70 1.00 1.00 1.00 63 82 76 3.00 2.00 1.00 64 83 50 2.00 2.00 1.00 65 84 59 1.00 3.00 1.00 66 85 55 1.00 1.00 1.00 67 86 90 1.00 1.00 1.00 68 87 35 2.00 1.00 1.00 69 89 65 4.00 1.00 1.00 70 90 79 1.00 1.00 1.00 71 93 69 1.00 1.00 1.00 72 96 44 1.00 1.00 1.00 73 97 52 1.00 1.00 1.00 74 98 82 1.00 1.00 1.00 75 99 52 1.00 1.00 1.00 76 100 67 2.00 1.00 1.00 77 102 92 1.00 1.00 1.00 78 103 46 3.00 3.00 1.00 79 104 74 1.00 1.00 1.00 80 105 71 2.00 1.00 1.00 81 108 59 3.00 3.00 1.00 82 110 73 1.00 1.00 1.00 83 111 40 1.00 1.00 1.00 84 112 28 2.00 1.00 1.00 85 113 54 5.00 2.00 1.00 86 114 82 1.00 1.00 1.00 87 115 52 1.00 1.00 1.00 88 117 64 1.00 1.00 1.00 89 121 91 1.00 1.00 1.00 90 122 63 1.00 1.00 1.00 91 123 58 2.00 1.00 1.00 92 124 80 1.00 1.00 1.00 93 125 50 1.00 1.00 1.00 94 127 87 3.00 4.00 1.00 95 129 53 1.00 1.00 1.00 96 130 47 1.00 1.00 1.00 97 132 45 1.00 1.00 1.00 98 133 65 . . 1.00 99 134 38 1.00 1.00 1.00 100 136 35 3.00 3.00 1.00 101 137 53 1.00 1.00 1.00 102 138 48 1.00 1.00 1.00 103 139 67 3.00 4.00 1.00 104 140 51 1.00 1.00 1.00 105 141 63 1.00 1.00 1.00 106 40 65 1.00 1.00 2.00 107 92 78 1.00 1.00 2.00 108 107 50 1.00 2.00 2.00 109 120 39 1.00 1.00 2.00 110 201 36 1.00 1.00 2.00 111 202 40 1.00 1.00 2.00 112 203 44 1.00 1.00 2.00 113 204 36 1.00 1.00 2.00 114 205 56 1.00 1.00 2.00 115 206 46 5.00 1.00 2.00 116 207 48 1.00 1.00 2.00 117 208 37 1.00 1.00 2.00 118 209 30 1.00 1.00 2.00 119 210 39 1.00 1.00 2.00 120 211 59 3.00 3.00 2.00 121 213 36 1.00 1.00 2.00 122 214 28 1.00 1.00 2.00 123 215 40 1.00 1.00 2.00 124 216 45 2.00 1.00 2.00 125 217 37 1.00 1.00 2.00 126 218 45 1.00 1.00 2.00 127 219 38 2.00 1.00 2.00 128 220 56 1.00 1.00 2.00 129 221 55 1.00 1.00 2.00 130 222 49 3.00 3.00 2.00 131 223 . 1.00 1.00 2.00 132 224 47 3.00 1.00 2.00 133 225 33 1.00 1.00 2.00 134 226 38 3.00 1.00 2.00 135 227 38 2.00 1.00 2.00 136 228 24 1.00 1.00 2.00 137 229 41 3.00 1.00 2.00 138 231 40 1.00 1.00 2.00 139 233 42 1.00 1.00 2.00 140 234 45 1.00 1.00 2.00 141 235 47 1.00 1.00 2.00 142 236 59 1.00 1.00 2.00 143 237 66 1.00 1.00 2.00 144 238 54 3.00 1.00 2.00 145 240 60 . 5.00 2.00 146 241 35 2.00 1.00 2.00 147 242 52 1.00 1.00 2.00 148 243 35 1.00 1.00 2.00 149 244 56 1.00 1.00 2.00 150 245 30 1.00 1.00 2.00 151 246 45 1.00 1.00 2.00 152 247 52 3.00 1.00 2.00 153 248 39 1.00 1.00 2.00 154 324 31 2.00 1.00 2.00 ; run; proc sql; create table reflexExactMatch as select t1.*,t2.ID as ID2,t2.x as x2,t2.y as y2,t2.z as z2,t2.group as group2 from (select * from subjects where group=1) as t1 right join (select * from subjects where group=2) as t2 on t1.x=t2.x and t1.y=t2.y and t1.z=t2.z order by ID2 ; quit; proc sql; create table reflexFuzzyMatch as select t1.*,t2.ID as ID2,t2.x as x2,t2.y as y2,t2.z as z2,t2.group as group2 from (select * from subjects where group=1) as t1 right join (select * from subjects where group=2) as t2 on &true group by ID2 /* Somewhat arbitrary assignments of weights! */ having abs(sum(1*(x-x2),1*(y-y2),1*(z-z2)))=min(abs(sum(1*(x-x2),1*(y-y2),1*(z-z2)) )) order by ID2 ; quit;

proc sql; create table conflictsID as select * from reflexFuzzyMatch group by ID having count(*)>1 ; create table conflictsID2 as select * from reflexFuzzyMatch group by ID2 having count(*)>1 ; quit;

The table reflexExactMatch will have both group 2 rows that fail to match to any group 1 row on x,y,z, and, as well, group 2 rows that match more than one group 1 row. All group 2 rows match at least one group 1 row in the sample matched in reflexFuzzyMatch, but many match more than one group 1 row. I'd say that these results look typical of problems in matching records outside the artificial world of a relational database management system. The general problem of matching on variables does not have a complete and closed solution algorithm.

Sig

-----Original Message----- From: Jian Mao [mailto:maojianj@HOTMAIL.COM] Sent: Wednesday, October 30, 2002 3:34 PM To: SAS-L@LISTSERV.UGA.EDU Subject: matching

Hi, I have 2 groups, 1st group has 105 subjects, and 2nd group has 49 subjects. I want to match subjects from these 2 groups based on variable X, Y and Z (choosing 49 subjects from group one to match group 2). Is there a good way to do it? Thanks, Jian <snip>

Back to: Top of message | Previous page | Main SAS-L page