Date: Thu, 15 Aug 1996 21:58:24 -0400
Reply-To: Ellen Hertz <eshertz@ACCESS.DIGEX.NET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Ellen Hertz <eshertz@ACCESS.DIGEX.NET>
Organization: Express Access Online Communications, USA
Subject: Re: SAS merge with duplicates in several by groups
Par Sparen wrote:
>
> Does anyone have a good solution to the following problem:
>
> I'm trying to write an _efficient_ program in SAS to match out eligible
> controls for diseased cases. To avoid the time consuming task to read the
> file with eligible controls for _every_ new case (this strategy works well
> with, say, up to 500 cases and some hundred thousands of eligible controls
> - but soon gets out of hand with 5,000 cases and over one million eligible
> controls) I want to sort out the eligible controls for each case by
> matching on the controls for each risk set among the cases. Since I need
> to assure that an eligible control is alive and free from the disease
> under study, the file with the cases is summed up and sorted by the
> matching criteria (say, year of birth, sex and hospital code), but also by
> the date of diagnosis for each case. To match the cases with the controls
> I merge the two files by year of birth, sex and hospital code, but of
> course end up in the situation that each of the two files has duplicates
> in the by groups.
> SAS handles this by selecting observations one by one from the case and
> control file for each by group, but when it reaches the last case in a by
> group, and still has several controls in the same by group, SAS puts all
> of the controls in that last group together with the last case. What else
> could SAS do, you may ask? This is the inbuilt logic of the program.
> Now, my question to you out there, is whether there is any way to
> control the matching of two files (MERGE in SAS) so that the program
> creates one observation for each combination of similarity in the
> matching criteria for each by group (i.e. multiplying the observations of
> the two files in each by group). I haven't found anything on this in the
> Reference or Usage manuals.
> I know that one way to solve the problem would be to step around it by
> writing a SAS macro that sorts out one data set for each duplicate of
> cases by the matching criteria, so that only the control file contains
> duplicates on those criteria. This would certainly slow down the program
> though, since it means I have to read (or match) the control file several
> times just to sort out the eligible controls, and this is what I wanted to
> avoid.
> I appreciate all the good suggestions you can come up with to solve this
> problem. I'd rather not turn over to use Pascal or C, or any other basic
> programming language, where such a problem is easily solved. It takes a
> lot of work to make a program like that generally applicable for other users.
>
> Regards,
>
> --
> Par Sparen, PhD | email: Par.Sparen@epic.uu.se
> Dept. of Cancer Epidemiology | phone: +46-18-66 46 77
> University hospital | fax: +46-18-50 34 31
> S-751 85 UPPSALA, Sweden_________________________________________:)__
Have you considered putting the controls
and cases together in one data set and using
a procedure that permits a STRATA statement?
Then you could stratify on birth year, sex
and hospital code and get estimates of the effect of any other
covariates. One possibility is conditional logistic regression
with PROC PHREG (ref SAS Technical Report P-229, page 465).
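
A minimal sketch of what that might look like, under assumptions of my
own: the data set name STUDY and the variables CASE (1 = case,
0 = control), EXPOSURE, BIRTHYR, SEX and HOSPCODE are hypothetical
and not taken from your files. The dummy TIME variable is the usual
device for getting the conditional logistic likelihood out of PHREG:
each case "fails" at time 1, and the controls in its stratum are
censored later.

```sas
/* Hypothetical input: STUDY, one row per subject, cases and   */
/* controls stacked together. All names here are assumptions.  */
data study2;
  set study;
  time = 2 - case;   /* cases get time 1, controls get time 2 */
run;

proc phreg data=study2;
  /* CASE = 0 is treated as censored; TIES=DISCRETE requests   */
  /* the discrete logistic (conditional) likelihood.           */
  model time*case(0) = exposure / ties=discrete;
  /* One stratum per combination of the matching variables.    */
  strata birthyr sex hospcode;
run;
```

No merge of cases with controls is needed for this; the strata do the
grouping.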
Even if you succeeded in merging every case with every control
that has the same values of the stratifying variables (if I understand
correctly), it is not clear how the resulting data set could
be used for analysis, since its observations would not
be independent.
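
If the pairing itself is still wanted, for example to pick controls
from it afterwards, a join in PROC SQL forms every case-by-control
combination within the matching variables, which the DATA step MERGE
does not. The sketch below is only an illustration: the data sets
CASES and CONTROLS and the variables CASE_ID, CTRL_ID, DIAGDATE and
EXITDATE are hypothetical names, and the date condition is my guess at
how "alive and free from the disease" would be coded.

```sas
/* Hypothetical inputs (all names are assumptions):             */
/*   CASES    : case_id, birthyr, sex, hospcode, diagdate       */
/*   CONTROLS : ctrl_id, birthyr, sex, hospcode, exitdate       */
/* EXITDATE is assumed to be the date a control dies or is      */
/* diagnosed, so a control is eligible only while EXITDATE      */
/* is later than the case's date of diagnosis.                  */
proc sql;
  create table pairs as
  select a.case_id, a.diagdate, b.ctrl_id,
         a.birthyr, a.sex, a.hospcode
  from cases as a, controls as b
  where a.birthyr  = b.birthyr
    and a.sex      = b.sex
    and a.hospcode = b.hospcode
    and b.exitdate > a.diagdate
  order by a.case_id, b.ctrl_id;
quit;
```

Bear in mind that the result can be very large with 5,000 cases and
over a million controls, since each by group contributes the product
of its case and control counts.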
in the model statement