Date: Sat, 30 Oct 2010 19:19:59 -0400
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: Collapsing date and coverage records with no gaps
Dan,
I still have to investigate further based on a suggestion that Howard sent
me offline, but I have run some tests already.
Given 4 years' worth of data for about 500,000 patients (i.e., around
18,000,000 records):
my modification of your code took around 35 minutes.
Richard's adaptation of Mike's approach took around 10 minutes.
Art
-------
On Sat, 30 Oct 2010 13:07:39 -0700, Daniel Nordlund
<djnordlund@FRONTIER.COM> wrote:
>> -----Original Message-----
>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>> Sterling Paramore
>> Sent: Saturday, October 30, 2010 9:26 AM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: Collapsing date and coverage records with no gaps
>>
>> I thought about converting coverage spans into distinct days of coverage
>> like these solutions, but my enrollment data is already pretty large. I
>> just estimated that if I did that, I'd end up having to process 2.2 billion
>> records, rather than the 1.5 million that I have (500,000 members X 3
>> coverage types X 4 years). I look forward to trying your solution when I
>> get back to work Monday.
>>
>
>Sterling,
>
>I have not benchmarked my approach against any other options, so I am not
>going to make any claims for whether it is better in some sense than the
>others that have been suggested. But let me correct an apparent
>misunderstanding. No new records need to be created. If you have 1.5
>million records, you will only need to read your 1.5 million records
>once. The array used for holding four years of eligibility will only take
>365 days * 4 years * 3 characters (coverage type), or about 4380 bytes of
>memory. You could handle 20 years of eligibility in less than 22,000
>bytes. Art's suggestion of using SQL to get the earliest and latest dates
>in your file would allow you to tailor the size of the coverage array to
>fit a particular span of dates. However, I am not sure that making an
>extra pass through a large file is worth the time. All you will be saving
>is a small amount of memory and a little array processing time. So I
>would just make the array longer than necessary.
>
>Whatever approach you choose, best of luck in wrestling your eligibility
>data to the ground.
>
>Dan
>
>Daniel Nordlund
>Bothell, WA USA
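
For readers following the thread, the day-indexed array idea Dan describes
can be sketched roughly as below. This is only one reading of the approach,
written in Python rather than SAS purely for illustration. The function name
collapse_spans, the origin date, and the sample spans are hypothetical and do
not come from anyone's posted code. The sketch keeps one flag per day for a
single member and coverage type, whereas Dan's memory estimate assumes a
3-character coverage value per day.

```python
from datetime import date, timedelta


def collapse_spans(spans, origin, n_days):
    """Collapse overlapping or adjacent (start, end) date spans.

    Hypothetical helper: handles one member and one coverage type,
    using a fixed-size array with one flag per calendar day.
    """
    covered = bytearray(n_days)  # 0 = no coverage on that day
    for start, end in spans:
        # Clip each span to the window the array represents.
        lo = max((start - origin).days, 0)
        hi = min((end - origin).days, n_days - 1)
        for i in range(lo, hi + 1):
            covered[i] = 1

    # Scan the array once and emit each run of consecutive covered days.
    collapsed = []
    run_start = None
    for i, flag in enumerate(covered):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            collapsed.append((origin + timedelta(days=run_start),
                              origin + timedelta(days=i - 1)))
            run_start = None
    if run_start is not None:
        collapsed.append((origin + timedelta(days=run_start),
                          origin + timedelta(days=n_days - 1)))
    return collapsed


# Assumed sample data: the array is made a little longer than four years.
origin = date(2007, 1, 1)
n_days = 366 * 4
spans = [
    (date(2007, 1, 1), date(2007, 3, 31)),
    (date(2007, 4, 1), date(2007, 6, 30)),   # adjacent to the first span
    (date(2007, 9, 1), date(2007, 12, 31)),  # starts after a gap
]
result = collapse_spans(spans, origin, n_days)
```

The array size depends only on the date range, not on the number of input
records, which matches Dan's point that no new records need to be created.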