Date: Wed, 14 May 2003 12:52:09 -0600
Reply-To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Subject: Re: Merging a large and small dataset
Content-Type: text/plain; charset=us-ascii
The easiest way would be create formats for the two 5,000 record files,
and apply the formats to the appropriate fields in the big file (you
might want to create new fields whose only purpose is to be formatted,
so you can easily see both the original value and the formatted value).
Another option would be to index the two smaller files; you could then
use the SET smalldataset KEY=recordkey statement to look up the values
on the fly in a data step.
--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA
>>> "Robert Pope" <eschpope99@NETSCAPE.NET> 05/14/2003 11:36 AM >>>
I have a 4,000,000+ record file with policyholder information. Then I
have two 5,000 record files: one associates plancodes with the
reporting product line, the other associates treaty codes with
reinsurance companies.
I want to add the appropriate reporting product line and reinsurance
company name to each record in my policyholder file, while maintaining
the existing sort (non-)order.
The obvious solution is to add an _N_ variable, followed by a
SORT/MERGE for product followed by a second SORT/MERGE for reinsurer,
followed by a SORT by _N_. But those 3 sorts add an excessive amount
of run time to the program (IIRC 1 hour per sort). Is there a way to
avoid having to SORT the main dataset, perhaps with something like
Excel's VLookup function?
I would almost be tempted to hard-code the two small files into the
main data step, except they are continually updated.
Thanks,
Rob Pope
|