Date: Thu, 25 Jan 2007 19:01:00 -0500
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: Coding HAART in a Sample of HIV Patients: Very Difficult Sorting/Sequencing Problem
Paul,
Since rows 1 through 7 of your data have complete date information (a known
DATE), while rows 8 and 9 don't, I don't understand your criteria for
linking row 7 with rows 8 and 9.
Art
----------
On Thu, 25 Jan 2007 11:03:08 -0500, Paul Miller <pmiller@OHTN.ON.CA> wrote:
>Hello Everyone,
>
>I've been struggling for some time now with what appears to be a very
>difficult sorting/sequencing task. In fact, I find that even explaining
>the problem so people can understand it is sometimes difficult. I've
>pasted some sample syntax below. The syntax is designed to code
>multi-drug Highly Active Antiretroviral Therapy (HAART) regimens in a
>sample of HIV patients. The CHANGES dataset sequences some individual
>antiretroviral medications for a single patient. DRUG_CLASS indicates
>which class of antiretroviral (NRTI, NNRTI or PI) the patient was taking.
>DATE indicates the date on which the patient started or stopped taking
>the drug and is missing where this is unknown. MIN_DATE indicates the
>earliest possible value of DATE and MAX_DATE indicates the latest
>possible value of DATE. MID_DATE is the midpoint between MIN_DATE and
>MAX_DATE. Finally, CHANGE indicates whether the patient started or
>stopped taking a drug (1 = start, -1 = stop).
>
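The MID_DATE values in the sample data are consistent with
MID_DATE = MIN_DATE + FLOOR((MAX_DATE - MIN_DATE)/2), that is, the midpoint
rounded down to a whole day. That formula is my inference from rows 8 and 9,
not something stated in the post. A minimal SAS sketch that recomputes it for
those two windows (the data set name MIDPOINTS is hypothetical):

```sas
/* Hypothetical check: recompute MID_DATE for the two uncertain windows */
/* (rows 8 and 9 of CHANGES), assuming the midpoint is rounded down     */
DATA MIDPOINTS;
  INPUT MIN_DATE :MMDDYY10. MAX_DATE :MMDDYY10.;
  /* SAS dates are day counts, so the difference is a number of days */
  MID_DATE = MIN_DATE + FLOOR((MAX_DATE - MIN_DATE) / 2);
  FORMAT MIN_DATE MAX_DATE MID_DATE MMDDYY10.;
  DATALINES;
1/1/1999 6/5/1999
1/1/1999 8/5/1999
;
RUN;

PROC PRINT DATA=MIDPOINTS;
RUN;
```

With this input, MID_DATE should come out as 03/19/1999 and 04/19/1999,
matching rows 8 and 9; rounding instead of truncating would give 03/20/1999
for row 8.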
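>The syntax defines HAART as:
>
>   (NRTI >= 3 AND NNRTI = 0 AND PI = 0) OR
>   (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
>   (NRTI = 1 AND NNRTI >= 1 AND PI >= 1)
>
>where NRTI, NNRTI and PI are the numbers of drugs of each class that the
>patient is currently taking.
>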
>On the whole, the syntax works well, but it doesn't always sequence the
>drugs the way I want it to. In this case, it correctly determines the
>first two regimens from rows 1-6 of the CHANGES dataset. Starting with
>row 7, however, it doesn't sequence the drugs the way I would like.
>
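>By the time the program reaches row 7 in the CHANGES dataset, the
>patient is taking one NRTI and one NNRTI. The last three drugs (rows 7-9)
>are all starts, but their order is indeterminate. Row 7 has a known DATE
>of 1/29/1999, but rows 8 and 9 have a missing DATE with windows of
>1/1/1999-6/5/1999 and 1/1/1999-8/5/1999, both of which include 1/29/1999.
>Based on the values for DATE, MIN_DATE, and MAX_DATE, any one of these
>three drugs could therefore have come next in the sequence.
>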
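One way to let SAS find the indeterminate drugs is to treat
[MIN_DATE, MAX_DATE] as an interval and group each patient's changes into
blocks of overlapping intervals; a block that holds more than one change and
at least one missing DATE can then be flagged for special sequencing. This is
a sketch of one possible approach, not part of Paul's program. It reads the
CHANGES data set created by the DATA CHANGES step in Paul's message, and the
names BY_MIN, BLOCKS, FLAGGED, BLOCK, BLOCK_END and INDETERMINATE are
hypothetical.

```sas
/* Hypothetical sketch: group each patient's changes into blocks whose */
/* [MIN_DATE, MAX_DATE] windows overlap. Assumes the CHANGES data set  */
/* from Paul's DATA CHANGES step.                                      */
PROC SORT DATA=CHANGES OUT=BY_MIN;
  BY SITE_ID MIN_DATE MAX_DATE;
RUN;

DATA BLOCKS;
  SET BY_MIN;
  BY SITE_ID;
  RETAIN BLOCK_END;
  IF FIRST.SITE_ID THEN BLOCK = 0;
  /* A window that opens after every earlier window has closed cannot */
  /* be reordered with them, so it starts a new block                 */
  IF FIRST.SITE_ID OR MIN_DATE > BLOCK_END THEN DO;
    BLOCK + 1;
    BLOCK_END = MAX_DATE;
  END;
  /* Otherwise extend the current block to cover this window */
  ELSE BLOCK_END = MAX(BLOCK_END, MAX_DATE);
  FORMAT BLOCK_END MMDDYY10.;
RUN;

/* Flag blocks with more than one change and at least one unknown DATE */
PROC SQL;
  CREATE TABLE FLAGGED AS
  SELECT *, (COUNT(*) > 1 AND NMISS(DATE) > 0) AS INDETERMINATE
  FROM BLOCKS
  GROUP BY SITE_ID, BLOCK
  ORDER BY SITE_ID, BLOCK, MID_DATE;
QUIT;
```

On the sample data this should give three blocks (rows 1-4, rows 5-6 and
rows 7-9), with only the third flagged, because row 7's date of 1/29/1999
falls inside the windows of rows 8 and 9. PROC SQL will note that it remerges
the summary statistics with the detail rows; that is expected here.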
>My current default is to assume that the next of the three drugs is the
>one with the earliest MID_DATE value, and my data are sorted
>accordingly. In this case, though, this default is likely to produce an
>incorrect sequence. As I said earlier, the patient is taking an NRTI and
>an NNRTI by the time we get to row 7 of the CHANGES dataset. I would
>therefore be inclined to sequence one of the PIs in rows 8 and 9 next,
>not the NNRTI in row 7: adding a PI to an NRTI and an NNRTI creates a
>new HAART regimen (the third condition above), whereas adding another
>NNRTI does not. I would specifically pick the PI in row 8 because it has
>an earlier MID_DATE value than the PI in row 9.
>
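The tie-break described above (within an indeterminate block, prefer a drug
that would turn the current regimen into HAART, then the earliest MID_DATE)
can be sketched as follows. This is a minimal illustration under stated
assumptions: the counts entering the block (one NRTI, one NNRTI, no PI) are
hard-coded from the walk-through rather than computed, and CANDIDATES, ROW,
OLD_HAART, NEW_HAART and BECOMES_HAART are hypothetical names.

```sas
/* Hypothetical sketch: rank the three indeterminate starts (rows 7-9). */
/* Assumption: the counts entering the block are hard-coded below.      */
DATA CANDIDATES;
  INPUT ROW DRUG_CLASS $ MID_DATE :MMDDYY10.;
  /* Regimen in force before rows 7-9: one NRTI, one NNRTI, no PI */
  NRTI0 = 1;
  NNRTI0 = 1;
  PI0 = 0;
  /* Counts if this candidate were sequenced next (all are starts) */
  NRTI  = NRTI0  + (DRUG_CLASS = 'NRTI');
  NNRTI = NNRTI0 + (DRUG_CLASS = 'NNRTI');
  PI    = PI0    + (DRUG_CLASS = 'PI');
  /* Paul's HAART definition, before and after the candidate */
  OLD_HAART = (NRTI0 >= 3 AND NNRTI0 = 0 AND PI0 = 0) OR
              (NRTI0 >= 2 AND (NNRTI0 >= 1 OR PI0 >= 1)) OR
              (NRTI0 = 1 AND NNRTI0 >= 1 AND PI0 >= 1);
  NEW_HAART = (NRTI >= 3 AND NNRTI = 0 AND PI = 0) OR
              (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
              (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
  /* 1 if adding this drug would create a new HAART regimen */
  BECOMES_HAART = (NEW_HAART = 1 AND OLD_HAART = 0);
  FORMAT MID_DATE MMDDYY10.;
  DATALINES;
7 NNRTI 1/29/1999
8 PI 3/19/1999
9 PI 4/19/1999
;
RUN;

/* Prefer candidates that create HAART, then the earliest MID_DATE */
PROC SORT DATA=CANDIDATES;
  BY DESCENDING BECOMES_HAART MID_DATE;
RUN;
```

After sorting, row 8 (the PI with MID_DATE 03/19/1999) should come first,
followed by row 9 and then row 7. This only chooses the first drug; to
sequence a whole block, the counts would have to be updated after each pick
and the remaining candidates re-ranked.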
>Is there any way to get SAS to recognize that the last 3 drugs are
>indeterminate and then to sequence the drugs based on the criteria that
>I've just described?
>
>Thanks,
>
>Paul
>
>Paul J. Miller, Ph.D.
>Research Scientist and Statistician
>Ontario HIV Treatment Network
>1300 Yonge St., Suite 308
>Toronto, Ontario M4T 1X3
>Phone: (416) 642-6486 ext 232
>Fax: (416) 640-4245
>
>DATA CHANGES;
>  INPUT SITE_ID DRUG_CLASS $ DATE :MMDDYY. MIN_DATE :MMDDYY.
>        MAX_DATE :MMDDYY. MID_DATE :MMDDYY. CHANGE;
>  FORMAT DATE MIN_DATE MAX_DATE MID_DATE MMDDYY8.;
>  DATALINES;
>1 PI    4/21/1998 4/21/1998 4/21/1998 4/21/1998  1
>1 NNRTI 4/21/1998 4/21/1998 4/21/1998 4/21/1998  1
>1 NRTI  4/21/1998 4/21/1998 4/21/1998 4/21/1998  1
>1 NNRTI 4/21/1998 4/21/1998 4/21/1998 4/21/1998  1
>1 PI    12/7/1998 12/7/1998 12/7/1998 12/7/1998 -1
>1 NNRTI 12/7/1998 12/7/1998 12/7/1998 12/7/1998 -1
>1 NNRTI 1/29/1999 1/29/1999 1/29/1999 1/29/1999  1
>1 PI    .         1/1/1999  6/5/1999  3/19/1999  1
>1 PI    .         1/1/1999  8/5/1999  4/19/1999  1
>;
>RUN;
>
>/* ROLL UP TO 1 OBSERVATION PER ID PER DAY AND COMPUTE HAART */
>
>DATA CUMULATIVE (DROP=DRUG_CLASS CHANGE STOP_DATE
>                 RENAME=(DATE=START_DATE MIN_DATE=MIN_START MAX_DATE=MAX_START))
>     STOP_DATES (KEEP=SITE_ID REGIMEN STOP_DATE MIN_DATE MAX_DATE
>                 RENAME=(MIN_DATE=MIN_STOP MAX_DATE=MAX_STOP));
>  RETAIN SITE_ID REGIMEN;
>  SET CHANGES;
>  BY SITE_ID MID_DATE;   /* requires CHANGES sorted by SITE_ID MID_DATE */
>
>  /* Reset the regimen counter and drug counts for each patient */
>  IF FIRST.SITE_ID THEN DO;
>    REGIMEN = 0;
>    NRTI = 0;
>    NNRTI = 0;
>    PI = 0;
>  END;
>
>  /* Sum statements retain the running number of drugs in each class */
>  IF DRUG_CLASS = 'NRTI' THEN NRTI + CHANGE;
>  ELSE IF DRUG_CLASS = 'NNRTI' THEN NNRTI + CHANGE;
>  ELSE IF DRUG_CLASS = 'PI' THEN PI + CHANGE;
>
>  /* After the last change on a given MID_DATE, close the previous */
>  /* regimen and open a new one                                    */
>  IF LAST.MID_DATE THEN DO;
>    STOP_DATE = DATE;
>    IF REGIMEN THEN OUTPUT STOP_DATES;
>    REGIMEN + 1;
>    ALLDRUGS = NNRTI + NRTI + PI;
>    HAART = (NRTI >= 3 AND NNRTI = 0 AND PI = 0) OR
>            (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
>            (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
>    OUTPUT CUMULATIVE;
>  END;
>
>  FORMAT STOP_DATE MMDDYY10.;
>RUN;
>
>DATA REGIMENS (DROP=REGIMEN MID_DATE);
>  /* RETAIN fixes the column order of REGIMENS */
>  RETAIN SITE_ID START_DATE STOP_DATE MIN_START MAX_START MIN_STOP MAX_STOP
>         DURATION MIN_DURATION MAX_DURATION;
>  MERGE CUMULATIVE STOP_DATES;
>  BY SITE_ID REGIMEN;
>
>  /* Clear the durations so values are not carried over from the */
>  /* previous regimen when no branch below sets them             */
>  CALL MISSING(DURATION, MIN_DURATION, MAX_DURATION);
>
>  /* Both dates known: the duration is exact */
>  IF START_DATE NE . AND STOP_DATE NE . THEN DO;
>    DURATION = STOP_DATE - START_DATE;
>    MIN_DURATION = DURATION;
>    MAX_DURATION = DURATION;
>  END;
>
>  /* Start known, stop uncertain: the stop cannot precede the start */
>  ELSE IF START_DATE NE . AND STOP_DATE = . THEN DO;
>    IF MIN_STOP < START_DATE AND MIN_STOP NE . THEN MIN_STOP = START_DATE;
>    DURATION = .;
>    MIN_DURATION = MIN_STOP - START_DATE;
>    MAX_DURATION = MAX_STOP - START_DATE;
>  END;
>
>  /* Stop known, start uncertain: the start cannot follow the stop */
>  ELSE IF START_DATE = . AND STOP_DATE NE . THEN DO;
>    IF MAX_START > STOP_DATE THEN MAX_START = STOP_DATE;
>    DURATION = .;
>    MIN_DURATION = STOP_DATE - MAX_START;
>    MAX_DURATION = STOP_DATE - MIN_START;
>  END;
>
>  /* Both dates uncertain: bound the duration from the two windows */
>  ELSE IF START_DATE = . AND STOP_DATE = . THEN DO;
>    DURATION = .;
>    IF MIN_STOP = . AND MAX_STOP = . THEN DO;
>      MIN_DURATION = .;
>      MAX_DURATION = .;
>    END;
>    IF MAX_START > MAX_STOP AND MAX_STOP NE . THEN MAX_START = MAX_STOP;
>    /* Start and stop windows overlap, so the minimum duration is 0 */
>    IF MAX_START > MIN_STOP AND MIN_STOP NE . THEN DO;
>      MIN_DURATION = 0;
>      MAX_DURATION = MAX_STOP - MIN_START;
>    END;
>    IF MAX_START <= MIN_STOP THEN DO;
>      MIN_DURATION = MIN_STOP - MAX_START;
>      MAX_DURATION = MAX_STOP - MIN_START;
>    END;
>  END;
>
>  /* Drop periods in which the patient was on no antiretrovirals */
>  IF ALLDRUGS;
>RUN;
>
>PROC DELETE DATA=CHANGES CUMULATIVE STOP_DATES;
>RUN;