Date: Mon, 28 Sep 2009 18:14:24 -0400
Reply-To: msz03@albany.edu
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Zdeb <msz03@ALBANY.EDU>
Subject: Re: A big flat file is not fitting into a disk. How to work with
it?
Content-Type: text/plain;charset=iso-8859-1
hi ... the basic idea of using a format to grab records from a file
goes way back ... to 1996 (maybe further, probably to whenever the
CNTLIN= option was added to PROC FORMAT ... whenever that was)
http://www.lexjansen.com/sugi/sugi21/po/220-21.pdf
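
a rough sketch of that format approach applied to the question below,
not tested against the real data. every name here is made up for
illustration: a SMALLKEYS data set with a character variable KEY, the
filerefs BIGFLAT and KEEPOUT (assumed to be allocated already), the
$DROPKEY format, and a 10-byte key in columns 1-10 of each record.

```sas
/* Assumed: SMALLKEYS holds the 10,000 keys in a character variable KEY. */
/* Duplicate keys would make PROC FORMAT fail, so remove them first.     */
proc sort data=smallkeys (keep=key) out=keys nodupkey;
   by key;
run;

/* Build the CNTLIN= data set: listed keys map to 'Y', all others to 'N'. */
data ctrl;
   retain fmtname 'dropkey' type 'C';
   length label $1 hlo $1;
   set keys (rename=(key=start)) end=last;
   label = 'Y';
   hlo   = ' ';
   output;
   if last then do;
      hlo   = 'O';            /* the OTHER range catches every unlisted key */
      label = 'N';
      output;
   end;
run;

proc format cntlin=ctrl;
run;

/* One pass over the big file: copy only records whose key is NOT listed. */
data _null_;
   infile bigflat truncover;  /* hypothetical fileref for the big file    */
   file keepout;              /* hypothetical fileref for the output file */
   input key $char10.;        /* assumed key position: columns 1-10       */
   if put(key, $dropkey.) = 'N' then put _infile_;
run;
```

the big file is read once, unsorted, and only the non-matching records
are written out.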
--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
> Tetyana -
>
> For a little more information on Paul D's format method - it was
> recently presented at WUSS as "Proc Format, a Speedy Alternative to
> Sort/Sort Merge" by Jenine Milum. Good solution for searching an
> oversized data set in one pass.
>
> http://www.wuss.org/proceedings09/09WUSSProceedings/html/source/sections/app.html
>
> On your output data - not sure if it will also be too large - but here
> are a few pointers - run a few tests first using "options obs=100000;"
> or something suitable, and restore on the final run with "options
> obs=max;".
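
a small sketch of that test-then-restore pattern. the fileref BIGFLAT is
hypothetical. note that the OBS= system option limits every input read
while it is in effect, so the INFILE OBS= option is shown as a way to cap
only the big file.

```sas
options obs=100000;      /* test runs: stop each input after 100,000 records */

/* ... run the extract steps here, then check the log and the output ... */

options obs=max;         /* final run: process everything */

/* Alternative: limit only the big flat file and leave other inputs alone. */
data _null_;
   infile bigflat obs=100000;   /* hypothetical fileref */
   input;
run;
```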
>
> Use the tests to try out "options compress=yes;" and shortening unneeded
> long vars. A proc contents and some proc freqs might be helpful to
> examine the vars. For example if you have a large number of single digit
> numbers stored as numeric, translating them to 1 byte character will
> cut the space they take by 7/8 (8 bytes down to 1). Text fields are
> often the biggest culprits - look for free form text fields - they can
> be 1000 bytes or more of unneeded space - drop 'em if you don't need
> them.
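
a sketch of those space-saving steps, with made-up names throughout: a
data set EXTRACT, single-digit numeric variables FLAG1 and FLAG2 (assumed
to hold only 0-9 or missing), and a free-form text variable NOTES.

```sas
options compress=yes;     /* compress data sets created from here on */

/* See which variables exist and how many bytes each one takes. */
proc contents data=extract;
run;

/* Confirm the numeric flags really hold only single digits. */
proc freq data=extract;
   tables flag1 flag2 / missing;
run;

data extract_small;
   set extract (drop=notes);      /* drop the unneeded free-form text  */
   length flag1_c flag2_c $1;     /* 1 byte each instead of 8          */
   flag1_c = put(flag1, 1.);      /* a missing value becomes '.'       */
   flag2_c = put(flag2, 1.);
   drop flag1 flag2;
run;
```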
>
> Also look into your mainframe allocations - you can grab a lot of space
> on the mainframe if you need it - something like this grabs up to 59
> volumes on z/OS with 4000+16*4000 cylinders:
>
> LIBNAME BIG "YOUR.SAS.FILE.REF" DISP=NEW UNIT=(SYSDA,59)
> SPACE=(CYL,(4000,4000)) LABEL=RETPD=7;
>
> or
>
> FILENAME BIG "YOUR.FLAT.FILE.REF" DISP=NEW UNIT=(SYSDA,59)
> SPACE=(CYL,(4000,4000)) LRECL=1024 RECFM=FB LABEL=RETPD=7;
>
> You'll need to figure out an appropriate LRECL and RECFM etc.
>
> Remember to delete this asap after you are finished or you might get a
> nasty phone call from your DBA.
>
> Hope that helps -
>
> Paul Choate
> DDS Data Extraction
> (916) 654-2160
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Tetyana
> Sent: Saturday, September 26, 2009 9:53 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: A big flat file is not fitting into a disk. How to work with
> it?
>
> Hi All,
>
> My boss asked me for the answer and the code for the following question.
> Can you please help? I copied his question completely. I don't have
> any idea how to do it and I used SAS on a mainframe (UNIX) only once a
> long time ago.
>
> If I had big flat file not fitting into a disk, say 2.2 billion
> records, unsorted and has 50,000 different keys and I wanted to create
> another file by merging the big file with a much smaller file of
> 10,000 keys and I wanted only those records in the big file that DO
> NOT MATCH the keys in the smaller file, how would I do it in the
> mainframe. Remember that I can not allocate that much disk space even
> if it's temporary.
>
> Best Regards,
> Tetyana
>