LISTSERV at the University of Georgia
Date:         Sat, 30 Oct 2010 19:19:59 -0400
Reply-To:     Arthur Tabachneck <art297@NETSCAPE.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Arthur Tabachneck <art297@NETSCAPE.NET>
Subject:      Re: Collapsing date and coverage records with no gaps
Comments: To: Daniel Nordlund <djnordlund@FRONTIER.COM>

Dan,

I still have to investigate further based on a suggestion that Howard sent me offline, but I have run some tests already.

Given 4 years' worth of data for about 500,000 patients (i.e., around 18,000,000 records):

My modification of your code took around 35 minutes.

Richard's adaptation of Mike's approach took around 10 minutes.

Art
-------
On Sat, 30 Oct 2010 13:07:39 -0700, Daniel Nordlund <djnordlund@FRONTIER.COM> wrote:

>> -----Original Message-----
>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>> Sterling Paramore
>> Sent: Saturday, October 30, 2010 9:26 AM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: Collapsing date and coverage records with no gaps
>>
>> I thought about converting coverage spans into distinct days of coverage
>> like these solutions, but my enrollment data is already pretty large. I
>> just estimated that if I did that, I'd end up having to process 2.2 billion
>> records, rather than the 1.5 million that I have (500,000 members X 3
>> coverage types X 4 years). I look forward to trying your solution when I
>> get back to work Monday.
>>
>
>Sterling,
>
>I have not benchmarked my approach against any other options, so I am not going to make any claims for whether it is better in some sense than the others that have been suggested. But let me correct an apparent misunderstanding. No new records need to be created. If you have 1.5 million records, you will only need to read your 1.5 million records once. The array used for holding four years of eligibility will only take 365 days * 4 years * 3 characters (coverage type), or about 4380 bytes of memory. You could handle 20 years of eligibility in less than 22,000 bytes. Art's suggestion of using SQL to get the earliest and latest dates in your file would allow you to tailor the size of the coverage array to fit a particular span of dates. However, I am not sure that making an extra pass through a large file is worth the time. All you will be saving is a small amount of memory and a little array processing time. So I would just make the array longer than necessary.
>
>Whatever approach you choose, best of luck in wrestling your eligibility data to the ground.
>
>Dan
>
>Daniel Nordlund
>Bothell, WA USA
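
For readers following the thread, the day-array idea Dan describes can be sketched roughly as below. This is only an illustration in Python, not Dan's SAS code, and the details are assumptions: the origin date, the three coverage-type codes, the record layout (member, coverage type, start date, end date) and the function name collapse are all hypothetical. It keeps one flag per day per coverage type for the current member (1461 days * 3 types = 4383 bytes, in line with the "about 4380 bytes" above), reads each input record once, and then scans the flags to write out spans with no gaps.

```python
from datetime import date, timedelta
from itertools import groupby

ORIGIN = date(2007, 1, 1)        # assumed earliest possible coverage date
N_DAYS = 365 * 4 + 1             # four years, including one leap day
TYPES = ("MED", "RX", "DEN")     # hypothetical coverage-type codes


def collapse(records):
    """Collapse coverage spans per member and coverage type.

    records: iterable of (member, ctype, start, end) tuples, already
    grouped (e.g. sorted) by member. Returns a list of merged spans.
    """
    out = []
    for member, rows in groupby(records, key=lambda r: r[0]):
        # One byte per day per coverage type, reused for each member.
        grid = bytearray(N_DAYS * len(TYPES))
        for _, ctype, start, end in rows:
            lo = (start - ORIGIN).days
            hi = (end - ORIGIN).days
            if lo < 0 or hi >= N_DAYS or hi < lo:
                raise ValueError("span outside the array or reversed")
            base = TYPES.index(ctype) * N_DAYS
            # Flag every covered day; overlapping spans just re-flag days.
            grid[base + lo:base + hi + 1] = b"\x01" * (hi - lo + 1)
        for t, ctype in enumerate(TYPES):
            row = grid[t * N_DAYS:(t + 1) * N_DAYS]
            day = 0
            while day < N_DAYS:
                if row[day]:
                    first = day
                    # Walk to the end of this unbroken run of covered days.
                    while day < N_DAYS and row[day]:
                        day += 1
                    out.append((member, ctype,
                                ORIGIN + timedelta(days=first),
                                ORIGIN + timedelta(days=day - 1)))
                else:
                    day += 1
    return out
```

Adjacent spans (one ending 31 Jan, the next starting 1 Feb) and overlapping spans both come out as a single record, because only the per-day flags matter. As Dan notes, making the array longer than necessary costs very little, so N_DAYS could simply be padded rather than derived from an extra pass over the data.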

