| Date: | Thu, 23 Nov 2006 14:56:58 -0800 |
| Reply-To: | Sekhar <ckalisetty@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sekhar <ckalisetty@GMAIL.COM> |
| Organization: | http://groups.google.com |
| Subject: | Re: Sorting a huge huge dataset |
|
| In-Reply-To: | <2fc7f3340611231203p4c21dd74kf6bfe87fece5a861@mail.gmail.com> |
| Content-Type: | text/plain; charset="us-ascii" |
|---|
I tried key-indexing, but I am having the same problem I mentioned in my
earlier email: the array subscript is out of range. I get this error
not only for the account number but for other variables as well.
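
For reference, here is a minimal sketch of key-indexing with a range guard.
It is not the code Muthia posted. It assumes prod_acc is a positive integer
that fits inside bounds you choose yourself; the macro variables key_lo and
key_hi are hypothetical placeholders. An "array subscript out of range"
error is what SAS reports when a key falls outside the declared bounds or
is missing, so the sketch tests the key before using it as a subscript.

```sas
/* Sketch only: key_lo and key_hi are assumed bounds, not values from this  */
/* thread. Replace them with the real minimum and maximum of prod_acc.      */
%let key_lo = 1;
%let key_hi = 35000000;

data test;
   /* One cell per possible key value; the cell holds the open date. */
   array dt[&key_lo:&key_hi] _temporary_;

   /* Load the small dataset, ignoring keys that cannot be used as a subscript. */
   do until (eof_small);
      set work.work2 (keep=prod_acc open_dt) end=eof_small;
      if &key_lo <= prod_acc <= &key_hi and prod_acc = int(prod_acc)
         then dt[prod_acc] = open_dt;
   end;

   /* Look up each record of the big dataset with the same guard. */
   do until (eof_big);
      set work.combine end=eof_big;
      open_dt = .;
      if &key_lo <= prod_acc <= &key_hi and prod_acc = int(prod_acc)
         then open_dt = dt[prod_acc];
      /* Keep only records that found a non-missing open date. */
      if not missing(open_dt) then output;
   end;
   stop;
run;
```

The sketch carries only open_dt; losbal would need a second array of the
same size. If the account numbers are too large or too sparse to serve as
array subscripts, key-indexing in this direct form is probably not a good
fit and the hash approach is the safer choice.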
Muthia Kachirayan wrote:
> You are trying the V9 hash solution now, and someone will soon help you. The
> KEY_INDEXING solution I gave before, if it worked, should have given you the
> speediest output for the big files. Whether or not it worked for you,
> it would be good to know what problems you faced. Can you give us some
> feedback on the use of KEY_INDEXING?
>
> Muthia Kachirayan
>
> On 11/23/06, Sekhar <ckalisetty@gmail.com> wrote:
> >
> > OK. I fixed the memory problem by using the -memsize max option. Now I have
> > a different problem. I want to keep observations only when they are in
> > both datasets. That's why I used "if rc=0 then output". But by doing so
> > I got only 30 obs in the output dataset. How can I implement the
> > condition that the output dataset has only observations with a
> > non-missing open date? Thanks in advance.
> >
> > On Nov 17, 4:01 pm, EvilPettingZo...@AOL.COM (Ken Borowiak) wrote:
> > > On Fri, 17 Nov 2006 12:43:07 -0800,Sekhar<ckalise...@GMAIL.COM> wrote:
> > > >Hi
> > > >Looks like I have memory problem.
> > > >Here is my code.
> > >
> > > >***** first declare a hash table for the small dataset work2 ******;
> > > >data test;
> > > >declare hash h_small ();
> > > >***** define this hash table ******;
> > > >length prod_acc open_dt losbal 8.;
> > > >rc= h_small.DefineKey ("prod_acc");
> > > >rc= h_small.DefineData ("open_dt","losbal");
> > > >rc= h_small.DefineDone ();
> > > >**** fill this hash table *******;
> > > >do until(eof_small);
> > > >set work.work2 end=eof_small;
> > > >rc = h_small.add ();
> > > >end;
> > > >**** access the hash table *****;
> > > >do until(eof_big);
> > > >set work.combine end=eof_big;
> > > >open_dt=.; losbal=.;
> > > >rc=h_small.find ();
> > > >output;
> > > >end;
> > > >run;
> > >
> > > >When I run this with 10,000,000 obs I get the following error message.
> > >
> > > >ERROR: Hash object added 8126448 items when memory failure occurred.
> > > >FATAL: Insufficient memory to execute data step program. Aborted during
> > > >the EXECUTION phase.
> > >
> > > >But when I run the following code to check memory, I get no error.
> > > >%let n = 35000000;
> > > >data _null_;
> > > >array k[&n] _temporary_;
> > > >run;
> > >
> > > >Should I look for more memory?
> > >
> > > You can use your memory resources a bit more selectively if you hash a
> > > record pointer (rather than 2 satellite fields) and use the POINT= option
> > > on the SET statement to retrieve the satellite records.
> > > See the latter portion of Paul Dorfman and Lessia Shajenko's SUGI 31
> > > paper "Crafting Your Index: ".
> > >
> > > HTH,
> > > Ken
> >
|
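
Ken's suggestion in the quoted thread (hash a record pointer and fetch the
satellite fields with POINT=) can be sketched roughly as follows. This is an
illustration under assumptions, not code from the paper he cites: the
variable name rid is hypothetical, and the dataset and variable names are
taken from the DATA step quoted above. The same step also shows one way to
keep only records found in both datasets with a non-missing open date.

```sas
data test;
   declare hash h_small ();
   rc = h_small.defineKey ("prod_acc");
   rc = h_small.defineData ("rid");      /* store one pointer, not two fields */
   rc = h_small.defineDone ();

   /* Load key -> observation number of work.work2. */
   do rid = 1 by 1 until (eof_small);
      set work.work2 (keep=prod_acc) end=eof_small;
      rc = h_small.add ();
   end;

   /* For each big record, look up the pointer and read the satellite fields. */
   do until (eof_big);
      set work.combine end=eof_big;
      if h_small.find () = 0 then do;    /* key exists in both datasets */
         set work.work2 (keep=open_dt losbal) point=rid;
         if not missing (open_dt) then output;
      end;
   end;
   stop;
   drop rc;
run;
```

The hash then holds the key plus a single numeric pointer per item, so the
saving over storing open_dt and losbal is modest, and each match costs a
direct read from work.work2. Whether that trade is worthwhile depends on how
many satellite fields you need and how many keys match.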