|
On Wed, 21 Nov 2007 06:29:09 +1000, Craig Hansen <c.hansen@UQ.EDU.AU> wrote:
>Hi Everyone
>
>I have a dataset of 60,000 pathology reports and I need to pull out
>patients that had a biopsy for their first visit and then an excision on
>the same skin lesion for their subsequent visit. For the remainder
>patients that were not pulled out of the dataset I need them to be in
>another dataset. In the example data below, take note how patient with
>patid=2 has two biopsies on their first visit (03MAR06) but only one of
>the biopsies had a subsequent excision (10MAR06).
>
>I would appreciate any ideas on how I can do this. Thanks in advance.
>
>Here is an example of the data
>
>patid reportid date type site
>1 1 02FEB06 BIOPSY ARM
>1 1 06FEB06 BIOPSY ARM
>2 2 03MAR06 BIOPSY LEG
>2 2 03MAR06 BIOPSY HEAD
>2 3 10MAR06 EXCISION LEG
>3 4 11MAR06 EXCISION ARM
>4 5 06APR06 BIOPSY HEAD
>4 6 10APR06 EXCISION HEAD
>
>Here is how I would like the two final datasets
>
>Dataset one (patients that did not have a biopsy and subsequent
>excision)
>
>Patid reportid date type site
>1 1 02FEB06 BIOPSY ARM
>1 1 06FEB06 BIOPSY ARM
>2 2 03MAR06 BIOPSY HEAD
>3 4 11MAR06 EXCISION ARM
>
>Dataset two (patients that did have a biopsy and subsequent excision)
>
>Patid reportid date type site
>2 2 03MAR06 BIOPSY LEG
>2 3 10MAR06 EXCISION LEG
>4 5 06APR06 BIOPSY HEAD
>4 6 10APR06 EXCISION HEAD
data example;
input patid reportid @8 date date7. @16 type$ site$;
cards;
1 1 02FEB06 BIOPSY ARM
1 1 06FEB06 BIOPSY ARM
2 2 03MAR06 BIOPSY LEG
2 2 03MAR06 BIOPSY HEAD
2 3 10MAR06 EXCISION LEG
3 4 11MAR06 EXCISION ARM
4 5 06APR06 BIOPSY HEAD
4 6 10APR06 EXCISION HEAD
;
First, sort so that each SITE can be considered separately:
proc sort data=example out=sites;
by patid site date;
run;
Now a self-interleave can first categorize the cases and then split the
table accordingly:
data one two;
set sites(in=preview)
sites;
by patid site;
retain biopsy_date both_types ;
drop biopsy_date both_types ;
if first.site then call missing (biopsy_date,both_types);
if preview then do;
if type='BIOPSY' then biopsy_date = date;
if type='EXCISION' and date > biopsy_date > . then both_types = 1;
end;
else do;
if both_types then output two;
else output one;
end;
run;
|