LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2003, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 6 Jun 2003 18:42:05 -0400
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject:      Re: One pass to find max and maxcnt using data step?
Comments: To: pudding_man@lycos.com
In-Reply-To:  <ADHPMNKHNGJEIFAA@mailcity.com>
Content-Type: text/plain; charset="us-ascii"

Puddin',

Agreed with the solution beautiful in its simplicity. Two points. One concerns the parsimony: Once parsimonious, let us be parsimonious all the way and eliminate one more LOC:

data yy (drop = dos) ; do maxcnt = 0 by 0 until (last.id) ; set xx ; by id ; if dos < max_dos then continue ; maxcnt = 1 + (max_dos = dos) * maxcnt ; max_dos = dos ; end ; run ;

Secondly, the solution is implicitly based on the fact that only the count corresponding to the largest (or smallest) dose must be obtained, which is why it is unnecessary to employ some method to store the rest of the counts corresponding to other unique DOS values. But if it were needed to obtain the frequencies corresponding to N top (bottom) values, it would not suffice. In the pre-V9 hash schemes, the problem is then reduced to finding N top (bottom) values among M > N unordered values. In V9+, it is much simpler, since the hash iterator allows to extract the keys from the table (and their satellite information) into their Data step host variables in order. That is, instead of the simplest

rc = hi.last () ;

one would code

rc = hi.last () ;

do j = 1 to N until ( rc = 0 ) ; max_dos = dos ; max_cnt = cnt ; output ; rc = hi.prev () ; end ;

Kind regards, ------------------------ Paul M. Dorfman Jacksonville, FL ------------------------

> -----Original Message----- > From: Puddin' Man [mailto:pudding_man@lycos.com] > Sent: Friday, June 06, 2003 4:02 PM > To: SAS-L@LISTSERV.UGA.EDU; sashole@bellsouth.net > Subject: Re: One pass to find max and maxcnt using data step? > > > I think Paul is aware of the fact that I am a 'fan' > of direct-addressing techniques in general and key- > indexing in particular ... > > The V9 hash array approach is very interesting and > instructive. I have no problem with Paul's post. > > I think, however, that parsimony is a valid issue > here. Ya's request is sooooooooo simply satisfied > with a couple scalar vars as to make direct-addressing > kinda sorta 'overkill', somewhat akin to writing > macros where none are needed. > > I would be inclined toward something like: > > data yy(drop = dos); > max_dos = .; maxcnt = 0; > > do until (last.id); > set xx; by id; > if dos < max_dos then continue; > maxcnt = 1 + (max_dos = dos) * maxcnt; > max_dos = dos; > end; > > run; > > Very simple code which, I think, would run fine with > V6-V9 and perhaps burn a few less cpu cycles. > > Prost, > Puddin' > > *********************************************************** > *** Puddin' Man *** Pudding_Man@lycos.com ******** > ***********************************************************; > > > "If I don't go crazy > I will surely lose my mind." > -from "Mean Black Spider Blues", Robt Lockwood Jr, around 1940. > > --------- Original Message --------- > > DATE: Thu, 5 Jun 2003 17:53:05 > From: Paul Dorfman <paul_dorfman@HOTMAIL.COM> > To: SAS-L@LISTSERV.UGA.EDU > Cc: > > >Oh man, > > > >You must be pulling my sas-leg once again. You know > perfectly well that > >when key-indexing becomes a drag because of the key range, > you should > >use a hash table. Of course, with hand-coded hashing, *some* > assumption > >has to be made. For example, I am taking the liberty to > assume that the > >maximum size of any ID-group does not exceed 100 (is it reasonable? > >Change that if need be), and so size the linearly-probed > hash table at > >203 (lowest prime greater than 100*2). The factor 1e13 is > selected just > >to cover all possible fractional territory. If it by chance > does not, > >and some eksy after the multiplication remain fractional, it > will not > >affect the algorithm: > > > >%let h = 203 ; > > > >data squeezed ( keep = id max_dos max_cnt ) ; > > array h ( 0 : &h ) _temporary_ ; > > array f ( 0 : &h ) _temporary_ ; > > > > do until ( last.id ) ; > > set xx ; > > by id ; > > do j = mod (dos * 1e13, &h) by 1 until ( h(j) = dos ) ; > > if j = &h then j = 0 ; > > if h(j) = . then h(j) = dos ; > > end ; > > f(j) ++ 1 ; > > end ; > > > > do j = 0 to &h - 1 ; > > if h(j) > max_dos then do ; > > max_dos = h(j) ; > > max_cnt = f(j) ; > > end ; > > h(j) = . ; > > f(j) = . ; > > end ; > >run ; > > > >Now isit not curious that I concocted a similar, albeit > somewhat more > >involved, problem for our Seattle presentation of "Hashing: > >Generations" to show how in V9, the hash iterator object > ('hi' below) > >could be used to extract a satellite with the largest hash > table key? > >In terms of the present problem, it would look like this: > > > >data squeezed ( keep = id max_dos max_cnt ) ; > > length dos cnt 8 ; > > > > dcl hash hh ( hashexp: 10, ordered: 1 ) ; > > dcl hiter hi ( 'hh' ) ; > > hh.DefineKey ( 'dos' ) ; > > hh.DefineData ( 'dos', 'cnt') ; > > hh.DefineDone () ; > > > > do until ( last.id ) ; > > set xx ; > > by id ; > > rc = hh.find () ; > > if rc = 0 then cnt ++ 1 ; > > else cnt = 1 ; > > rc = hh.replace () ; > > end ; > > > > rc = hi.last () ; > > > > max_dos = dos ; > > max_cnt = cnt ; > >run ; > > > >(By the same token, the hi.first() method would extract the > frequency > >with the minimum DOS). Interestingly, the same approach can > be used if > >the problem were inverted, and, instead of finding the frequency > >corresponding to the largest DOS in the group, you needed to > obtain the > >DOS corresponding to the largest frequency. The only > principal change > >we would need in the latter case would be to key the table by CNT > >instead of by DOS, which thus becomes the satellite: > > > >data squeezed ( keep = id max_dos max_cnt ) ; > > length dos cnt 8 ; > > > > dcl hash ff ( hashexp: 10, ordered: 1 ) ; > > dcl hiter fi ( 'ff' ) ; > > ff.DefineKey ( 'cnt' ) ; > > ff.DefineData ( 'cnt', 'dos') ; > > ff.DefineDone () ; > > > > do until ( last.id ) ; > > do cnt = 1 by 1 until ( last.dos ) ; > > set xx ; > > by id dos notsorted ; > > end ; > > ff.replace () ; > > end ; > > > > fi.last () ; > > > > max_dos = dos ; > > max_cnt = cnt ; > >run ; > > > >Sorry for the post length and certain things not everyone > can test yet > >for the lack of V9. But at least I have tested them on my > machine, and > >they work as planned. > > > >Kind regards, > >----------------------- > >Paul M. Dorfman > >Jacksonville, FL > >----------------------- > > > > > >>From: "Huang, Ya" <yhuang@AMYLIN.COM> > >> > >>Thanks Paul, > >> > >>I thought about using temporary array, but gave up > >>because dos's value can be anything, decimal or integer, > >>it could be very hard to transform it to integer > >>so that it can be used as array index. Suppose > >>dos takes 0.001,0.1,1,10,100, to transform them all > >>into a integer, the factor need to be 1000, and the > >>array size would be 100,000, while only 5 cell of > >>this huge array will be really used for frequency > >>counting, it seems very low efficient. > >>Maybe I can use format, but it doesn't seem to worth it. > >> > >>Kind regards, > >> > >>Ya > >> > >> > >>-----Original Message----- > >>From: Paul Dorfman [mailto:paul_dorfman@HOTMAIL.COM] > >>Sent: Wednesday, June 04, 2003 10:01 PM > >>To: SAS-L@LISTSERV.UGA.EDU > >>Subject: Re: One pass to find max and maxcnt using data step? > >> > >> > >>Ya, > >> > >>Man, you must be pulling our sas-leg: > >> > >>%let factor = 1000 ; > >> > >>data squeezed ( keep = id max_dos max_cnt ) ; > >> array grp ( 0 : &factor ) _temporary_ ; > >> > >> do _n_ = lbound (grp) to hbound (grp) ; > >> grp (_n_) = . ; > >> end ; > >> > >> do until ( last.id ) ; > >> set xx ; > >> by id ; > >> grp (dos * &factor) ++ 1 ; > >> end ; > >> > >> do _n_ = hbound (grp) by -1 until ( grp(_n_) ) ; > >> end ; > >> > >> max_dos = _n_ / &factor ; > >> max_cnt = grp(_n_) ; > >>run ; > >> > >>Kind regards, > >>--------------------------------- > >>Paul M. Dorfman > >>Jacksonville, FL > >>--------------------------------- > >> > >> > >> > >> >From: "Huang, Ya" <yhuang@AMYLIN.COM> > >> >Reply-To: "Huang, Ya" <yhuang@AMYLIN.COM> > >> >To: SAS-L@LISTSERV.UGA.EDU > >> >Subject: One pass to find max and maxcnt using data step? > >> >Date: Wed, 4 Jun 2003 15:47:00 -0700 > >> > > >> >Hi there, > >> > > >> >data xx; > >> >input id dos; > >> >cards; > >> >1 0.1 > >> >1 0.1 > >> >1 0.01 > >> >1 0.01 > >> >1 0.01 > >> >1 0.01 > >> >1 0.3 > >> >1 0.3 > >> >1 0.3 > >> >2 0.2 > >> >2 0.2 > >> >2 0.6 > >> >2 0.05 > >> >2 0.05 > >> >2 0.05 > >> >2 0.05 > >> >2 0.6 > >> >2 0.6 > >> >2 0.6 > >> >2 0.6 > >> >2 0.6 > >> >; > >> > > >> >For the above data, I need to find, for each id, the max > of dos, and > >> >the number of dos at the max level, so the expected result is as > >> >below: > >> > > >> >id max_dos maxcnt > >> >1 0.3 3 > >> >2 0.6 6 > >> > > >> >It is very easy to get the max_dos, I just wonder if it > is possible > >> >to get the count of max_dos in the same step? > >> > > >> >Any comments? > >> > > >> >Thanks > >> > > >> >Ya > >> > > > > ____________________________________________________________ > Get advanced SPAM filtering on Webmail or POP Mail ... Get > Lycos Mail! http://login.mail.lycos.com/r/referral?aid=27005 >


Back to: Top of message | Previous page | Main SAS-L page