| Date: | Thu, 31 Oct 2002 10:44:28 -0500 |
| Reply-To: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Subject: | Re: matching |
|
| Content-Type: | text/plain; charset="iso-8859-1" |
|---|
This question resembles the one posted last Friday. Before attempting to
answer your question, let me improve on that earlier solution. Substituting
a SUM() function for addition operators produces a non-missing value when
scoring criteria contain missing values as well as non-missing terms. You
may want to start with this SQL solution:
>>>>>> From SAS-L 10/25/2002 <<<<<<<<<<
proc sql;
create table CCpairs as
select t1.*,t2.year as CNTLyear,t2.age as CNTLage,t2.LabValue as
CNTLLabValue
from Cases as t1 left join Controls as t2
on t1.gender=t2.gender
group by t1.No,t1.gender
having abs(sum(5*(year-CNTLyear),age-CNTLage,3*(LabValue-CNTLLabValue)))
=min(abs(sum(5*(year-CNTLyear),age-CNTLage,3*(LabValue-CNTLLabValue))))
;
quit;
The ON condition requires a match on Gender. The weighting of deviations in
Year (by 5X) and LabValue (by 3X) adjust for different measurement scales.
You may also substitute squared deviations for absolute deviations and
devise much more complex weighting schemes.
In many case matching problems, this procedure, though greedy, does an
adequate job of assigning controls. You may need to test for unmatched Cases
and for multiple matches to cases where weighted deviations might sum to the
same value.
>>>>>>>
Because you have your data in one dataset, you will need to write a
reflexive join t to match up group 1 to group 2:
... from (select * from subjects where group=1) as t1
right join
(select * from subjects where group=2) as t2
....
The choice of a join condition makes a solution for your problem difficult.
An attempt to match exactly one group 2 record to one group 1 record fails,
even in your small sample of data. Try this program to see why:
data subjects;
input Obs ID x y z GROUP ;
cards;
1 1 65 1.00 1.00 1.00
2 2 45 3.00 3.00 1.00
3 3 84 1.00 1.00 1.00
4 4 89 4.00 5.00 1.00
5 5 93 1.00 1.00 1.00
6 6 83 1.00 1.00 1.00
7 7 82 1.00 1.00 1.00
8 8 57 1.00 1.00 1.00
9 10 93 3.00 1.00 1.00
10 11 63 2.00 1.00 1.00
11 12 67 3.00 1.00 1.00
12 13 61 1.00 1.00 1.00
13 14 73 1.00 1.00 1.00
14 15 68 1.00 1.00 1.00
15 16 78 4.00 1.00 1.00
16 17 45 1.00 1.00 1.00
17 18 59 1.00 1.00 1.00
18 19 81 1.00 1.00 1.00
19 20 78 1.00 1.00 1.00
20 21 50 3.00 1.00 1.00
21 22 74 2.00 1.00 1.00
22 23 80 1.00 1.00 1.00
23 26 58 4.00 4.00 1.00
24 27 119 4.00 5.00 1.00
25 29 35 1.00 . 1.00
26 31 69 1.00 1.00 1.00
27 32 77 1.00 1.00 1.00
28 33 73 1.00 1.00 1.00
29 34 68 1.00 1.00 1.00
30 35 70 4.00 1.00 1.00
31 36 43 3.00 1.00 1.00
32 37 69 1.00 1.00 1.00
33 41 50 4.00 4.00 1.00
34 42 54 1.00 1.00 1.00
35 43 46 3.00 3.00 1.00
36 45 62 1.00 2.00 1.00
37 46 70 1.00 4.00 1.00
38 47 77 2.00 3.00 1.00
39 48 69 2.00 2.00 1.00
40 49 53 1.00 1.00 1.00
41 50 50 3.00 1.00 1.00
42 51 101 2.00 1.00 1.00
43 52 66 1.00 1.00 1.00
44 53 48 1.00 1.00 1.00
45 56 50 2.00 1.00 1.00
46 57 91 1.00 4.00 1.00
47 58 66 2.00 2.00 1.00
48 60 55 5.00 5.00 1.00
49 61 49 4.00 3.00 1.00
50 63 58 4.00 2.00 1.00
51 64 60 3.00 3.00 1.00
52 65 82 3.00 3.00 1.00
53 66 101 3.00 3.00 1.00
54 67 32 3.00 3.00 1.00
55 68 107 1.00 1.00 1.00
56 69 68 3.00 3.00 1.00
57 71 94 1.00 1.00 1.00
58 72 71 1.00 1.00 1.00
59 73 37 1.00 1.00 1.00
60 75 61 3.00 1.00 1.00
61 77 91 1.00 4.00 1.00
62 78 70 1.00 1.00 1.00
63 82 76 3.00 2.00 1.00
64 83 50 2.00 2.00 1.00
65 84 59 1.00 3.00 1.00
66 85 55 1.00 1.00 1.00
67 86 90 1.00 1.00 1.00
68 87 35 2.00 1.00 1.00
69 89 65 4.00 1.00 1.00
70 90 79 1.00 1.00 1.00
71 93 69 1.00 1.00 1.00
72 96 44 1.00 1.00 1.00
73 97 52 1.00 1.00 1.00
74 98 82 1.00 1.00 1.00
75 99 52 1.00 1.00 1.00
76 100 67 2.00 1.00 1.00
77 102 92 1.00 1.00 1.00
78 103 46 3.00 3.00 1.00
79 104 74 1.00 1.00 1.00
80 105 71 2.00 1.00 1.00
81 108 59 3.00 3.00 1.00
82 110 73 1.00 1.00 1.00
83 111 40 1.00 1.00 1.00
84 112 28 2.00 1.00 1.00
85 113 54 5.00 2.00 1.00
86 114 82 1.00 1.00 1.00
87 115 52 1.00 1.00 1.00
88 117 64 1.00 1.00 1.00
89 121 91 1.00 1.00 1.00
90 122 63 1.00 1.00 1.00
91 123 58 2.00 1.00 1.00
92 124 80 1.00 1.00 1.00
93 125 50 1.00 1.00 1.00
94 127 87 3.00 4.00 1.00
95 129 53 1.00 1.00 1.00
96 130 47 1.00 1.00 1.00
97 132 45 1.00 1.00 1.00
98 133 65 . . 1.00
99 134 38 1.00 1.00 1.00
100 136 35 3.00 3.00 1.00
101 137 53 1.00 1.00 1.00
102 138 48 1.00 1.00 1.00
103 139 67 3.00 4.00 1.00
104 140 51 1.00 1.00 1.00
105 141 63 1.00 1.00 1.00
106 40 65 1.00 1.00 2.00
107 92 78 1.00 1.00 2.00
108 107 50 1.00 2.00 2.00
109 120 39 1.00 1.00 2.00
110 201 36 1.00 1.00 2.00
111 202 40 1.00 1.00 2.00
112 203 44 1.00 1.00 2.00
113 204 36 1.00 1.00 2.00
114 205 56 1.00 1.00 2.00
115 206 46 5.00 1.00 2.00
116 207 48 1.00 1.00 2.00
117 208 37 1.00 1.00 2.00
118 209 30 1.00 1.00 2.00
119 210 39 1.00 1.00 2.00
120 211 59 3.00 3.00 2.00
121 213 36 1.00 1.00 2.00
122 214 28 1.00 1.00 2.00
123 215 40 1.00 1.00 2.00
124 216 45 2.00 1.00 2.00
125 217 37 1.00 1.00 2.00
126 218 45 1.00 1.00 2.00
127 219 38 2.00 1.00 2.00
128 220 56 1.00 1.00 2.00
129 221 55 1.00 1.00 2.00
130 222 49 3.00 3.00 2.00
131 223 . 1.00 1.00 2.00
132 224 47 3.00 1.00 2.00
133 225 33 1.00 1.00 2.00
134 226 38 3.00 1.00 2.00
135 227 38 2.00 1.00 2.00
136 228 24 1.00 1.00 2.00
137 229 41 3.00 1.00 2.00
138 231 40 1.00 1.00 2.00
139 233 42 1.00 1.00 2.00
140 234 45 1.00 1.00 2.00
141 235 47 1.00 1.00 2.00
142 236 59 1.00 1.00 2.00
143 237 66 1.00 1.00 2.00
144 238 54 3.00 1.00 2.00
145 240 60 . 5.00 2.00
146 241 35 2.00 1.00 2.00
147 242 52 1.00 1.00 2.00
148 243 35 1.00 1.00 2.00
149 244 56 1.00 1.00 2.00
150 245 30 1.00 1.00 2.00
151 246 45 1.00 1.00 2.00
152 247 52 3.00 1.00 2.00
153 248 39 1.00 1.00 2.00
154 324 31 2.00 1.00 2.00
;
run;
proc sql;
create table reflexExactMatch as
select t1.*,t2.ID as ID2,t2.x as x2,t2.y as y2,t2.z as z2,t2.group as
group2
from (select * from subjects where group=1) as t1
right join
(select * from subjects where group=2) as t2
on t1.x=t2.x and t1.y=t2.y and t1.z=t2.z
order by ID2
;
quit;
proc sql;
create table reflexFuzzyMatch as
select t1.*,t2.ID as ID2,t2.x as x2,t2.y as y2,t2.z as z2,t2.group as
group2
from (select * from subjects where group=1) as t1
right join
(select * from subjects where group=2) as t2
on &true
group by ID2
/* Somewhat arbitrary assignments of weights! */
having
abs(sum(1*(x-x2),1*(y-y2),1*(z-z2)))=min(abs(sum(1*(x-x2),1*(y-y2),1*(z-z2))
))
order by ID2
;
quit;
proc sql;
create table conflictsID as
select * from reflexFuzzyMatch
group by ID
having count(*)>1
;
create table conflictsID2 as
select * from reflexFuzzyMatch
group by ID2
having count(*)>1
;
quit;
The table reflexExactMatch will have both group 2 rows that fail to match to
any group 1 row on x,y,z, and, as well, group 2 rows that match more than
one group 1 row. All group 2 rows match at least one group 1 row in the
sample matched in reflexFuzzyMatch, but many match more than one group 1
row. I'd say that these results look typical of problems in matching records
outside the artificial world of a relational database management system. The
general problem of matching on variables does not have a complete and closed
solution algorithm.
Sig
-----Original Message-----
From: Jian Mao [mailto:maojianj@HOTMAIL.COM]
Sent: Wednesday, October 30, 2002 3:34 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: matching
Hi,
I have 2 groups, 1st group has 105 subjects, and 2nd group has 49 subjects.
I want to match subjects from these 2 groups based on variable X, Y and Z
(choosing 49 subjects from group one to match group 2).
Is there a good way to do it?
Thanks,
Jian
<snip>
|