Date: Mon, 28 Sep 2009 10:39:33 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: A big flat file is not fitting into a disk. How to work with
it?
In-Reply-To: A<218bc8a3-9c7c-4b7c-a020-754095e8baa3@d34g2000vbm.googlegroups.com>
Content-Type: text/plain; charset="US-ASCII"
Tetyana -
For a little more information on Paul D's format method - it was
recently presented at WUSS as "Proc Format, a Speedy Alternative to
Sort/Sort Merge" by Jenine Milum. It is a good solution for searching an
oversized data set in one pass.
http://www.wuss.org/proceedings09/09WUSSProceedings/html/source/sections/app.html
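As a rough sketch of the format idea (not the code from the paper): load the small key file into a character format, then read the big flat file once and write out only the records whose key is not in the format. All names here are made up for illustration - SMALL.KEYS, the variable KEY, the filerefs BIGIN and BIGOUT, and the key sitting in columns 1-10 are assumptions you would replace with your own layout.

```sas
/* Assumed: SMALL.KEYS holds the 10,000 keys in a character var KEY. */
/* De-dupe first, since duplicate START values make PROC FORMAT fail. */
proc sort data=small.keys(keep=key) out=keys nodupkey;
   by key;
run;

/* Build a CNTLIN data set: every known key maps to 'Y',  */
/* and the OTHER range (HLO='O') maps to 'N'.             */
data cntlin;
   set keys end=last;
   retain fmtname 'SMALLKEY' type 'C';
   length hlo $1;
   start = key;
   label = 'Y';
   hlo   = ' ';
   output;
   if last then do;
      hlo   = 'O';
      label = 'N';
      output;
   end;
   keep fmtname type start label hlo;
run;

proc format cntlin=cntlin;
run;

/* One pass over the big file, no sort and no SAS data set created. */
/* BIGIN and BIGOUT are hypothetical filerefs; the key position and */
/* length (@1, 10 bytes) are assumptions.                           */
data _null_;
   infile bigin;
   file bigout;
   input @1 key $char10.;
   if put(key, $smallkey.) = 'N' then put _infile_;
run;
```

Because the last step is a DATA _NULL_ that copies _INFILE_ straight to the output file, the only extra disk space needed is for the non-matching records themselves.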
On your output data - not sure if it will also be too large - but here
are a few pointers. Run a few tests first using "options obs=100000;" or
something suitable, and restore it for the final run with "options
obs=max;".
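A test run along those lines might look like the sketch below. The fileref BIGIN, the data set name and the input layout are placeholders, not anything from your job.

```sas
options obs=100000;          /* stop after the first 100,000 records */

data work.test;
   infile bigin;             /* hypothetical fileref for the big file */
   input @1 key $char10.;    /* assumed layout */
run;

options obs=max;             /* restore before the full run */
```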
Use the tests to try out "options compress=yes;" and to shorten
needlessly long vars. A proc contents and some proc freqs might be
helpful to examine the vars. For example, if you have a large number of
single-digit numbers stored as 8-byte numerics, translating them to
1-byte character will reduce your file by up to 7/8. Text fields are
often the biggest culprits - look for free-form text fields - they can
be 1000 bytes or more of unneeded space - drop 'em if you don't need
them.
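Here is a small sketch of those checks on a test extract. WORK.TEST, the numeric var FLAG and the free-form text var COMMENTS are invented names for illustration only.

```sas
options compress=yes;        /* compress data sets created from here on */

proc contents data=work.test;    /* check var types and lengths */
run;

proc freq data=work.test;        /* confirm FLAG really is single digit */
   tables flag / missing;
run;

/* Drop the free-form text and store the digit in 1 byte instead of 8. */
data work.slim;
   set work.test(drop=comments rename=(flag=flag_num));
   length flag $1;
   flag = put(flag_num, 1.);     /* a missing value becomes '.' */
   drop flag_num;
run;
```

Compare the proc contents output for WORK.TEST and WORK.SLIM to see whether the savings are worth it before the full run.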
Also look into your mainframe allocations - you can grab a lot of space
on the mainframe if you need it - something like this grabs up to 59
volumes on z/OS with 4000+16*4000 cylinders:
LIBNAME BIG "YOUR.SAS.FILE.REF" DISP=NEW UNIT=(SYSDA,59)
SPACE=(CYL,(4000,4000)) LABEL=RETPD=7;
or
FILENAME BIG "YOUR.FLAT.FILE.REF" DISP=NEW UNIT=(SYSDA,59)
SPACE=(CYL,(4000,4000)) LRECL=1024 RECFM=FB LABEL=RETPD=7;
You'll need to figure out an appropriate LRECL and RECFM etc.
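To get a feel for how many cylinders a flat file needs, a back-of-envelope estimate like the one below may help. It is only a sketch: the record count comes from the original question and the LRECL from the FILENAME example above, while the 3390 device type and half-track blocking (BLKSIZE=27648) are my assumptions, so plug in your own numbers.

```sas
data _null_;
   records   = 2.2e9;    /* record count from the original question  */
   lrecl     = 1024;     /* from the FILENAME example; yours differs */
   blksize   = 27648;    /* assumed: 27 records per half-track block */
   blks_trk  = 2;        /* assumed: 2 such blocks per 3390 track    */
   trks_cyl  = 15;       /* 3390 geometry: 15 tracks per cylinder    */
   recs_cyl  = (blksize / lrecl) * blks_trk * trks_cyl;
   cyls_need = ceil(records / recs_cyl);
   put cyls_need= comma12.;      /* estimate is written to the log */
run;
```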
Remember to delete the file ASAP after you are finished or you might get
a nasty phone call from your DBA.
Hope that helps -
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Tetyana
Sent: Saturday, September 26, 2009 9:53 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: A big flat file is not fitting into a disk. How to work with
it?
Hi All,
My boss asked me for the answer and the code for the next question.
Can you please help? I copied his question completely. I don't have
any idea how to do it and I used SAS on a mainframe (UNIX) only once a
long time ago.
If I had big flat file not fitting into a disk, say 2.2 billion
records, unsorted and has 50,000 different keys and I wanted to create
another file by merging the big file with a much smaller file of
10,000 keys and I wanted only those records in the big file that DO
NOT MATCH the keys in the smaller file, how would I do it in the
mainframe. Remember that I can not allocate that much disk space even
if it's temporary.
Best Regards,
Tetyana