Date: Mon, 26 Apr 1999 11:44:25 -0700
Reply-To: "William W. Viergever" <wwvierg@IBM.NET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "William W. Viergever" <wwvierg@IBM.NET>
Subject: Re: How to trick Proc Freq?
In-Reply-To: <19990426073109.14637.qmail@hotmail.com>
Content-Type: text/plain; charset="us-ascii"
David:
Having hit this post "first-in-line" in today's e-mail, I'm betting we're gonna have some FEEDBACK on this one!!!!!!!!!!!!!!!!!!!!!!!!!! <vbg>.
BTW, Hail to the Guru's!!
My $0.02: you don't don't have to run their code if you *don't* like it *OR* understand it; but you needn't be so rude as to trash them. You're unsolicited smugness is not becoming.
Regards
P.S. SAS Dog since 1975
At 12:31 AM 4/26/99 -0700, david pider wrote:
>Alex Martchenko wrote:
>>
>> Hi,
>>
>> I have a credit policy SAS data set with almost 100 million
>>observations. It has a numeric variable FLDR_ID that can be any
>>integer number from -500,000 to +500,000. So on the average there are
>>about 100 observations with the same FLDR_ID on the file. The data
>>set is sorted by a different variable (account number) and so it's
>>unsorted by FLDR_ID. I'm trying to create a data set with
>>frequencies, cumulative frequencies, percents and cumulative percents
>>for all values of FLDR_ID. I'm trying to use proc freq like that:
>>
>> proc freq data=folders(keep=fldr_id);
>> tables fldr_id / missing out=fldr_fq;
>>
>>But I have 2 problems: 1) proc freq runs out of memory and abends
>>the job 2) if I run it on a much smaller subset so it runs OK, it
>>doesn't create cumulative variables in the OUT= file. I could live
>>with problem 2 because I can read the output data set with frequency
>>and then calculate cumulatives in a data step.
>> Real pain is problem 1. I run it in batch on OS/390, 6.09E. A job
>>can only use 50M of memory (that's all system people say we can have
>>for a single SAS job). But proc freq wants more, so if I specify
>>MEMSIZE=100M it doesn't matter because after 50M limit is exceeded,
>>the job abends anyway.
>
>If you read SAS manuals you wouldn't be in trouble. About PROC FREQ,
>the manual says 'the maximum number of levels allowed for any one
>variable is 32,767. If you have a variable with more than 32,767
>levels, use the SUMMARY procedure'.
>
>>I know I can sort it by FLDR_ID and then use first. and last.
>>variables to do what I need. But in our system sorting 100 million
>>records is a big problem itself. Does anyone have suggestions? I will
>>greatly appreciate any input.
>
>Here is one: if some of self appointed 'gurus' offer you a 'datastep
>algorithm' claiming it is 'more efficient' that SAS procs do yourself
>a favor and use 'delete' button. Trust my 15 years in SAS, none of
>thses 'solutions' run faster or 'more efficiently' than properly
>applied SAS procs. I've just fired one 'expert' as soon as I saw that
>he tried to get smart and summarized in datastep instead of proc
>summary. If I can't ban them 'gurus' from SAS profession, at least I
>can keep them away from my shop. And it's getting worse. Now this
>'datastep approach' have even sneaked into SUGI. I almost fainted at
>seeing a paper by one of them 'gurus' (who I bet have no idea of real
>world programing) trying to replace proc sort with his 'quick sort'
>monstrosity. I wonder why he didn't call it 'blast sort' or something?
>
>BTW, try SQL too. Personally, I don't like it (it shouldn't of been in
>SAS in the first place) but at least it's a SAS proc and at least will
>beat any 'datastep algorithm' hands down.
>
>D. Pider
>Technical Manager
>15 years of SAS experience
>
>
>
>
>
>_______________________________________________________________
>Get Free Email and Do More On The Web. Visit http://www.msn.com
----------------------------------------------------------------------------
William W. Viergever Voice : (916) 483-8398
Viergever & Associates Fax : (916) 486-1488
A SAS Institute Quality Partner (USA) E-mail : wwvierg@ibm.net
Sacramento, CA 95825
----------------------------------------------------------------------------
|