Date: Tue, 28 Dec 2004 15:40:32 -0500
Reply-To: Mike Rhoads <RHOADSM1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Rhoads <RHOADSM1@WESTAT.COM>
Subject: Re: Big Table / Little Table Update
Content-Type: text/plain
Steve,
It seems to me that this should be doable, assuming that your big table is
(or can be) indexed by patient ID. As I am understanding your situation,
the vast majority of patients in your big table are not in your little
table, so that their records would not have to be changed during the update
process.
Conceptually, I'd do the following:
1. Add your records from Little to Big. (Don't worry about the flag for
now.)
2. Create a table, IDsInLittle, with one column (ID) and one record for
each distinct ID in Little. This will drive the rest of the process.
3. Create a table, MostRecentVisit, with columns for ID and Date, to get
the most recent date for each of the patients in IDsInLittle, by joining
IDsInLittle with Big. Hopefully the SQL optimizer will only process the
records from Big that match the IDs in IDsInLittle, so that this will run
quickly.
4. Use an SQL update statement with a CASE construct to set Flag for each
record in Big whose ID is in MostRecentVisit to 0 or 1, depending on whether
the date in Big matches the one in MostRecentVisit.
HTH!
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of James,
Steve
Sent: Tuesday, December 28, 2004 11:09 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Big Table / Little Table Update
SAS-L
We have a problem that I'd like your help on. We have a huge data set
which needs to be updated daily with patient records. Each record has a
patient ID and date of visit. Thus, there are multiple occurrences of
both patient ID and date of visit. What we want to do is to add the
incoming records to the existing data and then set a flag to designate
which records are from the most recent visit. For example:
Big_Table
ID Date Flag
1 12/1 1
1 12/1 1
1 11/31 0
2 12/1 0
2 12/3 1
etc....
Incoming Table
ID Date
1 12/2
1 12/3
2 12/2
2 12/4
Will result in the following
1 12/1 0
1 12/1 0
1 11/31 0
1 12/2 0
1 12/3 1
2 12/1 0
2 12/3 0
2 12/2 0
2 12/4 1
etc...
The main idea is not to have to re-create the big table each time it's
updated since it takes hours to do. Any thoughts on how to do this will
be most appreciated.
Steve James
Centers for Disease Control and Prevention
sjames@cdc.gov