LISTSERV at the University of Georgia
Date:   Mon, 28 Sep 2009 12:23:32 -0700
Reply-To:   "Richard A. DeVenezia" <rdevenezia@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Richard A. DeVenezia" <rdevenezia@GMAIL.COM>
Organization:   http://groups.google.com
Subject:   Re: A big flat file is not fitting into a disk. How to work with it?
Comments:   To: sas-l@uga.edu
Content-Type:   text/plain; charset=ISO-8859-1

On Sep 27, 12:53 am, Tetyana <teryoshi...@prodigy.net> wrote:
> Hi All,
>
> My boss asked me for the answer and the code for the next question.
> Can you please help? I copied his question completely. I don't have
> any idea how to do it and I used SAS on a mainframe (UNIX) only once a
> long time ago.
>
> If I had big flat file not fitting into a disk, say 2.2 billion
> records, unsorted and has 50,000 different keys and I wanted to create
> another file by merging the big file with a much smaller file of
> 10,000 keys and I wanted only those records in the big file that DO
> NOT MATCH the keys in the smaller file, how would I do it in the
> mainframe. Remember that I can not allocate that much disk space even
> if it's temporary.
>
> Best Regards,
> Tetyana

Step one, DATA Step pass over SMALL:
- INPUT your key values from SMALL
- use a DATA Step hash to capture the distinct keys
- output to dataset EXCLUSION_KEYS (only 10,000 rows)

Step two, DATA Step pass over TAPE:
- populate hash X from EXCLUSION_KEYS
- INPUT the key value from each TAPE record, holding the record with a trailing @
- if X.find() ne 0; *TAPE keys not in EXCLUSION_KEYS;
- INPUT the remainder of the record
- OUTPUT to dataset FILTERED (or PUT to TAPE if the filtered record count is going to be too big)

do stuff with FILTERED
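The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not tested production code: the filerefs SMALL and TAPE, the toy record layout (a key followed by one value, x), and the temporary files that stand in for the real data are all hypothetical. Only the key is parsed before the hash lookup, and EXCLUSION_KEYS is written with the hash object's OUTPUT method.

```sas
/* Hypothetical toy stand-ins for SMALL and TAPE so the sketch runs.
   On the mainframe these would be filerefs for the real files,
   e.g. DD names in the job's JCL. */
filename SMALL temp;
filename TAPE temp;

data _null_;
  file SMALL;
  put '3' / '7' / '3';          /* exclusion keys; 3 appears twice */
  file TAPE dlm=',';
  do key = 1 to 10;
    x = key * 10;
    put key x;                  /* records like 1,10 */
  end;
run;

/* Step one: one pass over SMALL, distinct keys into EXCLUSION_KEYS */
data _null_;
  length key 8;
  declare hash K ();
  K.defineKey('key');
  K.defineDone();
  infile SMALL end=end_of_small;
  do while (not end_of_small);
    input key;
    rc = K.replace();           /* a repeated key just overwrites its entry */
  end;
  rc = K.output(dataset: 'EXCLUSION_KEYS');
  stop;
run;

/* Step two: one pass over TAPE, keeping only keys NOT in EXCLUSION_KEYS */
data FILTERED;
  if _n_ = 1 then do;
    declare hash X (dataset: 'EXCLUSION_KEYS');
    X.defineKey('key');
    X.defineDone();
  end;
  infile TAPE dlm=',';
  input key @;                  /* read just the key and hold the record */
  if X.find() ne 0;             /* subsetting IF: drop keys found in X */
  input x;                      /* read the remainder of the record */
run;
```

Only the exclusion keys are held in memory, and each TAPE record is read once and either kept or discarded, so the big file never has to be sorted or staged on disk; only the surviving records are written.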

Here is a sample you can fiddle with:
----------------
%let path = %sysfunc(PATHNAME(work));   /* directory of the WORK library */

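data _null_;
  /* test BIG file: 20,000 records of key,v1-v5; keys roughly descending, some repeated */
  file "&path.\big.txt" dlm=',';
  array v(5) (1:5);   /* v1-v5 = 1,2,3,4,5 */
  do _n_ = 1 to 20000;
    key = 20000 - _n_ + int(5*ranuni(1234));
    put key v1-v5;
  end;
run;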

data _null_;
  /* test SMALL file: about 20% of the keys 1 to 20000, one per line */
  file "&path.\small.txt" dlm=',';
  do key = 1 to 20000;
    if ranuni(1234) < 0.20 then PUT key;
  end;
run;

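data BIG_NOT_SMALL;
  /* load the exclusion keys from SMALL into hash X */
  declare hash X ();
  X.defineKey('key');
  X.defineDone();
  infile "&path.\small.txt" end=end_of_small;
  do while (not end_of_small);
    input key;
    X.replace();   /* a repeated key just overwrites its entry */
  end;
  _count = X.num_items;
  put 'number of keys in X:' _count;

  /* stream BIG once, keeping records whose key is not in X */
  infile "&path.\big.txt" dlm=',' end=end_of_big;
  do while (not end_of_big);
    input key @;                  /* read the key, hold the record */
    if X.find() = 0 then input;   /* excluded key: release the record */
    else do;
      input v1-v5;                /* read the rest and keep it */
      OUTPUT;
    end;
  end;
  stop;   /* both files were read inside explicit loops */
  drop _:;
run;
----------------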

Note that the output is not sorted; records come out in the order they are read from the big file.
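For the case where the filtered record count is too big for a dataset, the PUT option from step two can be sketched against the sample files. This is a hedged sketch: it assumes the sample above has already run (it reuses &path, small.txt and big.txt), and the output name big_not_small.txt is a placeholder; on the mainframe the FILE statement would point at the tape fileref instead. PUT _INFILE_ copies each surviving raw record unchanged, so only the key is ever parsed.

```sas
/* Variant of the sample: PUT surviving records to a flat file.
   Assumes the sample has run; big_not_small.txt is a placeholder name. */
data _null_;
  declare hash X ();
  X.defineKey('key');
  X.defineDone();

  infile "&path.\small.txt" end=end_of_small;
  do while (not end_of_small);
    input key;
    X.replace();
  end;

  infile "&path.\big.txt" dlm=',' end=end_of_big;
  file "&path.\big_not_small.txt";      /* hypothetical output file */
  do while (not end_of_big);
    input key @;                        /* read the key, hold the record */
    if X.find() ne 0 then put _infile_; /* copy the raw record unchanged */
    input;                              /* release the held record */
  end;

  stop;
run;
```

The output file should hold the same records as BIG_NOT_SMALL, in the same order, but as raw text lines rather than a dataset.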

--
Richard A. DeVenezia
http://www.devenezia.com

