LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2011, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 14 Jan 2011 18:17:12 -0500
Reply-To:     Arthur Tabachneck <art297@ROGERS.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Arthur Tabachneck <art297@ROGERS.COM>
Subject:      Re: Find Records in a Cluster by Date
Comments: To: Steve James <spj1@CDC.GOV>
Content-Type: text/plain; charset=ISO-8859-1

James,

The following probably a very inefficient way to do what you want but, before trying to streamline it (if that is even necessary), does the following do what you need? It assumes no overlap:

proc sort data=have; by Patient Type Date; run;

proc transpose data=have out=wide; by Patient Type; var Date; run;

data temp (keep=Patient Type Date ClusterID); set wide; format Date mmddyy10.; array Dates(*) COL:; array ClusterIDs(50); ClusterIDNumber=0; do i=1 to 50; if i gt dim(Dates) then i=50; else if missing(Dates(i)) then i=50; else do; if i eq 1 then do; ClusterIDNumber+1; ClusterIDs(i)=ClusterIDNumber; end; else do; if Dates(i)-Dates(i-1) gt 23 then do; ClusterIDNumber+1; ClusterIDs(i)=ClusterIDNumber; end; else ClusterIDs(i)=ClusterIDNumber; end; Date=Dates(i); ClusterID=ClusterIDs(i); output; end; end; run;

proc sql noprint; create table want as select *, count(*) as ClusterSize from temp group by patient,type,clusterID ; quit;

HTH, Art -------- On Fri, 14 Jan 2011 17:07:50 -0500, James, Steve (CDC/OID/OD) <spj1@CDC.GOV> wrote:

>Thanks, > >We anticipate up to several million patients with a top end of 20-30 million vaccination records. There could be 40-50 records per patient with as many as 14 different vaccine types, although we anticipate to average about 10 records per patients. > >Steve > >-----Original Message----- >From: Arthur Tabachneck [mailto:art297@ROGERS.COM] >Sent: Friday, January 14, 2011 4:58 PM >To: SAS-L@LISTSERV.UGA.EDU; James, Steve (CDC/OID/OD) >Subject: Re: Find Records in a Cluster by Date > >Steve, > >I haven't kept up with this thread, thus might be asking a couple of >questions that you've already answered. > >One, how many patients will be in the data? > >And, two, approximately what is the maximum number of vaccine records that >might exist for a patient? > >Art >-------- >On Fri, 14 Jan 2011 16:52:47 -0500, Steve James <spj1@CDC.GOV> wrote: > >>SAS-L >> >>Barry Swartz pointed out some discrepancies in my previous post, so let me >>modify the test data a little and clarify what I want. >> >>We’re looking for possible duplicate vaccine records and the business rule >>is that we look for records within a date of 23 days of each other. We >>define the clusters and then do the deduplication process within a >>cluster. It’s theoretically possible a record could be in more than one >>cluster, but I don’t know if I have any at this point. >> >>Also a patient can have multiple vaccine types and each type can have >>multiple clusters. >> >>Hopefully that clarifies things. >> >>I’ve modified the data so that there are no overlapping clusters. >> >>Have: >> >>Patient Type Date >>1 1 01/01/10 >>1 1 01/04/10 >>1 1 01/15/10 >>1 1 04/28/10 >>1 1 05/04/10 >>2 1 06/14/10 >> >>Want >> >>Patient Type Date ClusterID ClusterSize >>1 1 01/01/10 1 3 >>1 1 01/04/10 1 3 >>1 1 01/15/10 1 3 >>1 1 04/28/10 2 2 >>1 1 05/04/10 2 2 >>2 1 06/14/10 1 1 >> >> >>Thanks for your help, >> >>Steve >> >>_____________________________________________ >>From: James, Steve (CDC/OID/OD) >>Sent: Friday, January 14, 2011 2:49 PM >>To: SAS-L@LISTSERV.UGA.EDU >>Subject: Find Records in a Cluster by Date >> >> >>Dear SAS-L, >> >>I’m having a hard time getting my head around this one, and have had very >>little luck searching the SAS-L archives. I’m sure someone out there can >>whip out something in a second. And if not, that justifies my struggling >>with it. >> >>I’ve got vaccination records ordered by patient and vaccine type and I >>want to define a cluster for each record that is within 23 days of every >>other record for the same patient and vaccine type. I would like some >>sort of identifier that identifies each cluster and the size of the >>cluster. I’m anticipating clusters of from 1-4 records but there >>potentially could be larger ones. >> >>I figure that some records may be in multiple clusters, e.g. 1/1/10, >>1/15/10, and 1/31/10 – not sure exactly what I want to do with that but >>it’s something that could happen and if I can at least identify them >>that’d be a plus. >> >>Below is a rough example of what I’m hoping to accomplish. >> >>Have: >> >>Patient Type Date >>1 1 01/01/10 >>1 1 01/04/10 >>1 1 01/15/10 >>1 1 01/28/10 >>1 1 02/04/10 >>2 1 02/14/10 >> >>Want >> >>Patient Type Date ClusterID ClusterSize >>1 1 01/01/10 1 3 >>1 1 01/04/10 1 3 >>1 1 01/15/10 1 3 >>1 1 01/28/10 2 2 >>1 1 02/04/10 2 2 >>2 1 02/14/10 1 1 >> >> >>Thanks for your help, >> >>Steve


Back to: Top of message | Previous page | Main SAS-L page