Date: Mon, 19 Feb 2001 21:19:53 -0800
Reply-To: John Baj <jbaj@NIU.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: John Baj <jbaj@NIU.EDU>
Subject: Matching observations in two datasets
Content-Type: text/plain; charset=us-ascii
I have two datasets that contains the information on the job spells of
individuals: one dataset for welfare recipients(about 300,000 job
spells)and another for those not on welfare(about 4.5 million job
spells). The two databases have the same variables. The three variables
most relevant to my problem are: Employer Id (empid), Quarter
employment began (startdt) and earnings in 2nd quarter after the start
What I need to do is to build a comparision group for the welfare
population from the non welfare population (matched pairs). For this
comparison group, I am trying to match each job spell of welfare
recipients to a job spell of a non recipient who began working for same
employer at the same time and for approximately the same wage (using the
earnings variable as a proxy) as the recipient. The matches are to be
constrained so that the difference in earnings between the matched pairs
cannot be more than $150.
If there was just one welfare client per employer and start date, I
could simply merge on the basis of empid and startdt and select the non
welfare recipient with the earnings closest to the welfare recipient.
However, many employers hire more than one welfare recipient during the
same quarter resulting in multiple observations on the by variables in
each dataset. In addition, it is possible that some welfare recipients
cannot be matched so I have to account for that as well.
I's be grateful for any hints on how to tackle this problem. My gut
says that the solution may lie in SQL but I simply do not know SQL that
well to attempt it from scratch.
Thanks in advance.