Date: Fri, 6 Jun 2003 15:01:46 -0500
Reply-To: pudding_man@lycos.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Puddin' Man <pudding_man@LYCOS.COM>
Organization: Lycos Mail (http://www.mail.lycos.com:80)
Subject: Re: One pass to find max and maxcnt using data step?
Content-Type: text/plain; charset=us-ascii
I think Paul is aware of the fact that I am a 'fan'
of direct-addressing techniques in general and key-
indexing in particular ...
The V9 hash array approach is very interesting and
instructive. I have no problem with Paul's post.
I think, however, that parsimony is a valid issue
here. Ya's request is sooooooooo simply satisfied
with a couple scalar vars as to make direct-addressing
kinda sorta 'overkill', somewhat akin to writing
macros where none are needed.
I would be inclined toward something like:
data yy(drop = dos);
max_dos = .; maxcnt = 0;
do until (last.id);
set xx; by id;
if dos < max_dos then continue;
maxcnt = 1 + (max_dos = dos) * maxcnt;
max_dos = dos;
end;
run;
Very simple code which, I think, would run fine with
V6-V9 and perhaps burn a few less cpu cycles.
Prost,
Puddin'
***********************************************************
*** Puddin' Man *** Pudding_Man@lycos.com ********
***********************************************************;
"If I don't go crazy
I will surely lose my mind."
-from "Mean Black Spider Blues", Robt Lockwood Jr, around 1940.
--------- Original Message ---------
DATE: Thu, 5 Jun 2003 17:53:05
From: Paul Dorfman <paul_dorfman@HOTMAIL.COM>
To: SAS-L@LISTSERV.UGA.EDU
Cc:
>Oh man,
>
>You must be pulling my sas-leg once again. You know perfectly well that when
>key-indexing becomes a drag because of the key range, you should use a hash
>table. Of course, with hand-coded hashing, *some* assumption has to be made.
>For example, I am taking the liberty to assume that the maximum size of any
>ID-group does not exceed 100 (is it reasonable? Change that if need be), and
>so size the linearly-probed hash table at 203 (lowest prime greater than
>100*2). The factor 1e13 is selected just to cover all possible fractional
>territory. If it by chance does not, and some eksy after the multiplication
>remain fractional, it will not affect the algorithm:
>
>%let h = 203 ;
>
>data squeezed ( keep = id max_dos max_cnt ) ;
> array h ( 0 : &h ) _temporary_ ;
> array f ( 0 : &h ) _temporary_ ;
>
> do until ( last.id ) ;
> set xx ;
> by id ;
> do j = mod (dos * 1e13, &h) by 1 until ( h(j) = dos ) ;
> if j = &h then j = 0 ;
> if h(j) = . then h(j) = dos ;
> end ;
> f(j) ++ 1 ;
> end ;
>
> do j = 0 to &h - 1 ;
> if h(j) > max_dos then do ;
> max_dos = h(j) ;
> max_cnt = f(j) ;
> end ;
> h(j) = . ;
> f(j) = . ;
> end ;
>run ;
>
>Now isit not curious that I concocted a similar, albeit somewhat more
>involved, problem for our Seattle presentation of "Hashing: Generations" to
>show how in V9, the hash iterator object ('hi' below) could be used to
>extract a satellite with the largest hash table key? In terms of the present
>problem, it would look like this:
>
>data squeezed ( keep = id max_dos max_cnt ) ;
> length dos cnt 8 ;
>
> dcl hash hh ( hashexp: 10, ordered: 1 ) ;
> dcl hiter hi ( 'hh' ) ;
> hh.DefineKey ( 'dos' ) ;
> hh.DefineData ( 'dos', 'cnt') ;
> hh.DefineDone () ;
>
> do until ( last.id ) ;
> set xx ;
> by id ;
> rc = hh.find () ;
> if rc = 0 then cnt ++ 1 ;
> else cnt = 1 ;
> rc = hh.replace () ;
> end ;
>
> rc = hi.last () ;
>
> max_dos = dos ;
> max_cnt = cnt ;
>run ;
>
>(By the same token, the hi.first() method would extract the frequency with
>the minimum DOS). Interestingly, the same approach can be used if the
>problem were inverted, and, instead of finding the frequency corresponding
>to the largest DOS in the group, you needed to obtain the DOS corresponding
>to the largest frequency. The only principal change we would need in the
>latter case would be to key the table by CNT instead of by DOS, which thus
>becomes the satellite:
>
>data squeezed ( keep = id max_dos max_cnt ) ;
> length dos cnt 8 ;
>
> dcl hash ff ( hashexp: 10, ordered: 1 ) ;
> dcl hiter fi ( 'ff' ) ;
> ff.DefineKey ( 'cnt' ) ;
> ff.DefineData ( 'cnt', 'dos') ;
> ff.DefineDone () ;
>
> do until ( last.id ) ;
> do cnt = 1 by 1 until ( last.dos ) ;
> set xx ;
> by id dos notsorted ;
> end ;
> ff.replace () ;
> end ;
>
> fi.last () ;
>
> max_dos = dos ;
> max_cnt = cnt ;
>run ;
>
>Sorry for the post length and certain things not everyone can test yet for
>the lack of V9. But at least I have tested them on my machine, and they work
>as planned.
>
>Kind regards,
>-----------------------
>Paul M. Dorfman
>Jacksonville, FL
>-----------------------
>
>
>>From: "Huang, Ya" <yhuang@AMYLIN.COM>
>>
>>Thanks Paul,
>>
>>I thought about using temporary array, but gave up
>>because dos's value can be anything, decimal or integer,
>>it could be very hard to transform it to integer
>>so that it can be used as array index. Suppose
>>dos takes 0.001,0.1,1,10,100, to transform them all
>>into a integer, the factor need to be 1000, and the
>>array size would be 100,000, while only 5 cell of
>>this huge array will be really used for frequency
>>counting, it seems very low efficient.
>>Maybe I can use format, but it doesn't seem to worth it.
>>
>>Kind regards,
>>
>>Ya
>>
>>
>>-----Original Message-----
>>From: Paul Dorfman [mailto:paul_dorfman@HOTMAIL.COM]
>>Sent: Wednesday, June 04, 2003 10:01 PM
>>To: SAS-L@LISTSERV.UGA.EDU
>>Subject: Re: One pass to find max and maxcnt using data step?
>>
>>
>>Ya,
>>
>>Man, you must be pulling our sas-leg:
>>
>>%let factor = 1000 ;
>>
>>data squeezed ( keep = id max_dos max_cnt ) ;
>> array grp ( 0 : &factor ) _temporary_ ;
>>
>> do _n_ = lbound (grp) to hbound (grp) ;
>> grp (_n_) = . ;
>> end ;
>>
>> do until ( last.id ) ;
>> set xx ;
>> by id ;
>> grp (dos * &factor) ++ 1 ;
>> end ;
>>
>> do _n_ = hbound (grp) by -1 until ( grp(_n_) ) ;
>> end ;
>>
>> max_dos = _n_ / &factor ;
>> max_cnt = grp(_n_) ;
>>run ;
>>
>>Kind regards,
>>---------------------------------
>>Paul M. Dorfman
>>Jacksonville, FL
>>---------------------------------
>>
>>
>>
>> >From: "Huang, Ya" <yhuang@AMYLIN.COM>
>> >Reply-To: "Huang, Ya" <yhuang@AMYLIN.COM>
>> >To: SAS-L@LISTSERV.UGA.EDU
>> >Subject: One pass to find max and maxcnt using data step?
>> >Date: Wed, 4 Jun 2003 15:47:00 -0700
>> >
>> >Hi there,
>> >
>> >data xx;
>> >input id dos;
>> >cards;
>> >1 0.1
>> >1 0.1
>> >1 0.01
>> >1 0.01
>> >1 0.01
>> >1 0.01
>> >1 0.3
>> >1 0.3
>> >1 0.3
>> >2 0.2
>> >2 0.2
>> >2 0.6
>> >2 0.05
>> >2 0.05
>> >2 0.05
>> >2 0.05
>> >2 0.6
>> >2 0.6
>> >2 0.6
>> >2 0.6
>> >2 0.6
>> >;
>> >
>> >For the above data, I need to find, for each id, the max of dos,
>> >and the number of dos at the max level, so the expected result
>> >is as below:
>> >
>> >id max_dos maxcnt
>> >1 0.3 3
>> >2 0.6 6
>> >
>> >It is very easy to get the max_dos, I just wonder if it is
>> >possible to get the count of max_dos in the same step?
>> >
>> >Any comments?
>> >
>> >Thanks
>> >
>> >Ya
>>
____________________________________________________________
Get advanced SPAM filtering on Webmail or POP Mail ... Get Lycos Mail!
http://login.mail.lycos.com/r/referral?aid=27005
|