LISTSERV at the University of Georgia
Date:         Mon, 30 Apr 2001 10:54:08 +0000
Reply-To:     "Dr. Hans-Christian Waldmann" <waldmann@SAMSON.FIRE.UNI-BREMEN.DE>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dr. Hans-Christian Waldmann" <waldmann@SAMSON.FIRE.UNI-BREMEN.DE>
Organization: University of Bremen
Subject:      randomize-match-balance-algorithm

Dear List,

I would like to invite opinions pertaining to the following problem:

We are trying to evaluate a treatment using a standard pre-post design with multiple control groups (4 groups in total). Upon arrival, patients are randomized into these groups. All's fine. But patients have been assigned different diagnoses (while being eligible for the same treatment), and we need to control for possible treatment*diagnosis interactions. One solution would be to match the patients with respect to diagnosis after randomization, without any repartitioning of the formerly selected groups. I have written a SAS macro to this end that implements the following algorithm. What I would like to know: does this corrupt the idea of control by randomization, or is this sort of two-stage sampling a sound strategy?

Given: A master data set has several variables, among these:

- ID (of patient)
- Match (diagnosis, numeric for convenience)

---

1. Take all N subjects from the master data set and allot them to G groups of equal size k=N/G using the usual "sort-by-random-uniform(0)-and-take-first-k" algorithm. Delete those assigned from the pool and repeat until all groups have been filled.

2. Now, for each group, count the occurrences of each value of the matching variable. From the resulting array group*value_$ (cell = count of the value), select the minimum frequency into an auxiliary variable.

Reduce all groups by random case-deletion to have this least common count for the particular value of the matching variable. Do it for all values of the matching variable, and put the pieces together (append the set for each group).

Now we have G groups with exactly the same distribution of the matching variable within groups and equal sample size across groups; append these to give the final set.

3. Even if the master data set had been completely balanced with respect to the matching variable, randomization is almost sure to imbalance the values of the matching variable within the newly sampled groups. Since in the previous step we deleted some patients from the G-1 groups that had more patients with a particular value of the matching variable, there will be some loss from the original master data set to the final one. The idea is to put this set of patients into another "input" set and iterate the whole procedure. The result set of the second loop is appended to the first. If the loss does not exceed a specified proportion, or if the variance of the matching variable is too poor to build interim data sets in step 2, quit; else iterate again.
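The three steps above can be sketched roughly as follows. This is a minimal Python illustration, not the SAS macro mentioned in the post. The function names, the `max_loss` parameter, and the rule "stop when a pass places nobody" (standing in for "variance too poor to build interim data sets") are assumptions made for the sketch.

```python
import random
from collections import Counter


def randomize_and_balance(patients, n_groups, rng):
    """One pass of steps 1 and 2 (hypothetical helper).

    patients: list of (patient_id, match_value) tuples.
    Returns (balanced_groups, leftover_patients).
    """
    # Step 1: sort by a random uniform key, then cut into blocks of size k.
    pool = sorted(patients, key=lambda _: rng.random())
    k = len(pool) // n_groups
    groups = [pool[g * k:(g + 1) * k] for g in range(n_groups)]
    leftover = pool[n_groups * k:]  # remainder if N is not divisible by G

    # Step 2: for each matching value, keep only the minimum count
    # found across groups, deleting surplus cases at random.
    counts = [Counter(m for _, m in grp) for grp in groups]
    balanced = [[] for _ in range(n_groups)]
    for value in sorted({m for _, m in pool}):
        least = min(c[value] for c in counts)
        for g, grp in enumerate(groups):
            cases = [p for p in grp if p[1] == value]
            rng.shuffle(cases)
            balanced[g].extend(cases[:least])
            leftover.extend(cases[least:])
    return balanced, leftover


def iterate(patients, n_groups, max_loss=0.05, seed=0):
    """Step 3: re-run the pass on the dropped patients and append."""
    rng = random.Random(seed)
    final = [[] for _ in range(n_groups)]
    pool = list(patients)
    total = len(pool)
    while pool:
        balanced, pool = randomize_and_balance(pool, n_groups, rng)
        gained = sum(len(grp) for grp in balanced)
        for g in range(n_groups):
            final[g].extend(balanced[g])
        # Assumed stopping rule: loss small enough, or nothing placed.
        if gained == 0 or len(pool) <= max_loss * total:
            break
    return final, pool
```

A possible call, using made-up data with 624 patients and 4 diagnosis codes:

```python
patients = [(i, i % 4) for i in range(624)]
final, lost = iterate(patients, n_groups=4, max_loss=0.05, seed=1)
print([len(grp) for grp in final], len(lost))
```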

---

I have run this with a master data set of 624 patients, to be split into G=4 groups of k=156 each in the first run. The matching variable had 4 distinct values.

The first result set came up with 544 out of 624 patients, with 136 patients in each group and the matching variable balanced. The procedure then iterated with the 80 patients not in the result set: they were randomized and matched, and the final set was augmented with another 52 people (13 persons holding the same one of the 4 values of the matching variable). So we got 596 out of 624, randomized with respect to anything unknown and matched with respect to the matching variable.
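The figures reported for this run can be checked with a few lines of arithmetic. The variable names are only for illustration, and the per-group figure for the second pass assumes the 52 added patients are spread evenly over the 4 groups.

```python
N, G = 624, 4
first_pass_kept = 544
second_pass_kept = 52

k = N // G                                        # initial group size: 156
per_group_first = first_pass_kept // G            # 136 per group after pass 1
remaining = N - first_pass_kept                   # 80 patients re-entered
per_group_second = second_pass_kept // G          # 13 per group (assumed even split)
final_total = first_pass_kept + second_pass_kept  # 596 in the final set
unassigned = N - final_total                      # 28 patients lost overall
retention = final_total / N                       # about 0.955

print(k, per_group_first, remaining, per_group_second,
      final_total, unassigned, round(retention, 3))
```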

Lots of other test runs yielded comparable results, and the final sets passed all tests (duplicate IDs? same sizes? same counts for the matching variable? etc.).
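Checks of this kind might look like the following sketch. It assumes the final set is held as a list of groups, each a list of (patient_id, match_value) tuples; `check_result` is a hypothetical helper, not part of the macro.

```python
from collections import Counter


def check_result(groups):
    """Run the three sanity checks on a list of groups."""
    ids = [pid for grp in groups for pid, _ in grp]
    # Distribution of the matching variable within each group.
    dists = [sorted(Counter(m for _, m in grp).items()) for grp in groups]
    return {
        "no_duplicate_ids": len(ids) == len(set(ids)),
        "same_sizes": len({len(grp) for grp in groups}) <= 1,
        "same_match_counts": all(d == dists[0] for d in dists),
    }
```

Each entry in the returned dictionary is True when the corresponding check passes.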

So I know it works. But is it right (conceptually)? Or, if it is, is it trivial (in the sense that it's unnecessary or could be done more easily)?

Any comments / hints / criticisms welcome!

Yours

Hans

---------------------------------------------------------------------
PD Dr. Hans C Waldmann
Methodology & Applied Statistics in Psychology & the Health Sciences

ZFRF / University of Bremen / Grazer Str 6 / 28359 Bremen / Germany
waldmann@samson.fire.uni-bremen.de
http://samson.fire.uni-bremen.de/waldmann

friend of: AIX PERL POSTGRES ADABAS SAS TEX
---------------------------------------------------------------------

