LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2003, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 6 Jun 2003 15:01:46 -0500
Reply-To:     pudding_man@lycos.com
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Puddin' Man <pudding_man@LYCOS.COM>
Organization: Lycos Mail  (http://www.mail.lycos.com:80)
Subject:      Re: One pass to find max and maxcnt using data step?
Comments: To: sashole@bellsouth.net
Content-Type: text/plain; charset=us-ascii

I think Paul is aware of the fact that I am a 'fan' of direct-addressing techniques in general and key- indexing in particular ...

The V9 hash array approach is very interesting and instructive. I have no problem with Paul's post.

I think, however, that parsimony is a valid issue here. Ya's request is sooooooooo simply satisfied with a couple scalar vars as to make direct-addressing kinda sorta 'overkill', somewhat akin to writing macros where none are needed.

I would be inclined toward something like:

data yy(drop = dos); max_dos = .; maxcnt = 0;

do until (last.id); set xx; by id; if dos < max_dos then continue; maxcnt = 1 + (max_dos = dos) * maxcnt; max_dos = dos; end;

run;

Very simple code which, I think, would run fine with V6-V9 and perhaps burn a few less cpu cycles.

Prost, Puddin'

*********************************************************** *** Puddin' Man *** Pudding_Man@lycos.com ******** ***********************************************************;

"If I don't go crazy I will surely lose my mind." -from "Mean Black Spider Blues", Robt Lockwood Jr, around 1940.

--------- Original Message ---------

DATE: Thu, 5 Jun 2003 17:53:05 From: Paul Dorfman <paul_dorfman@HOTMAIL.COM> To: SAS-L@LISTSERV.UGA.EDU Cc:

>Oh man, > >You must be pulling my sas-leg once again. You know perfectly well that when >key-indexing becomes a drag because of the key range, you should use a hash >table. Of course, with hand-coded hashing, *some* assumption has to be made. >For example, I am taking the liberty to assume that the maximum size of any >ID-group does not exceed 100 (is it reasonable? Change that if need be), and >so size the linearly-probed hash table at 203 (lowest prime greater than >100*2). The factor 1e13 is selected just to cover all possible fractional >territory. If it by chance does not, and some eksy after the multiplication >remain fractional, it will not affect the algorithm: > >%let h = 203 ; > >data squeezed ( keep = id max_dos max_cnt ) ; > array h ( 0 : &h ) _temporary_ ; > array f ( 0 : &h ) _temporary_ ; > > do until ( last.id ) ; > set xx ; > by id ; > do j = mod (dos * 1e13, &h) by 1 until ( h(j) = dos ) ; > if j = &h then j = 0 ; > if h(j) = . then h(j) = dos ; > end ; > f(j) ++ 1 ; > end ; > > do j = 0 to &h - 1 ; > if h(j) > max_dos then do ; > max_dos = h(j) ; > max_cnt = f(j) ; > end ; > h(j) = . ; > f(j) = . ; > end ; >run ; > >Now isit not curious that I concocted a similar, albeit somewhat more >involved, problem for our Seattle presentation of "Hashing: Generations" to >show how in V9, the hash iterator object ('hi' below) could be used to >extract a satellite with the largest hash table key? In terms of the present >problem, it would look like this: > >data squeezed ( keep = id max_dos max_cnt ) ; > length dos cnt 8 ; > > dcl hash hh ( hashexp: 10, ordered: 1 ) ; > dcl hiter hi ( 'hh' ) ; > hh.DefineKey ( 'dos' ) ; > hh.DefineData ( 'dos', 'cnt') ; > hh.DefineDone () ; > > do until ( last.id ) ; > set xx ; > by id ; > rc = hh.find () ; > if rc = 0 then cnt ++ 1 ; > else cnt = 1 ; > rc = hh.replace () ; > end ; > > rc = hi.last () ; > > max_dos = dos ; > max_cnt = cnt ; >run ; > >(By the same token, the hi.first() method would extract the frequency with >the minimum DOS). Interestingly, the same approach can be used if the >problem were inverted, and, instead of finding the frequency corresponding >to the largest DOS in the group, you needed to obtain the DOS corresponding >to the largest frequency. The only principal change we would need in the >latter case would be to key the table by CNT instead of by DOS, which thus >becomes the satellite: > >data squeezed ( keep = id max_dos max_cnt ) ; > length dos cnt 8 ; > > dcl hash ff ( hashexp: 10, ordered: 1 ) ; > dcl hiter fi ( 'ff' ) ; > ff.DefineKey ( 'cnt' ) ; > ff.DefineData ( 'cnt', 'dos') ; > ff.DefineDone () ; > > do until ( last.id ) ; > do cnt = 1 by 1 until ( last.dos ) ; > set xx ; > by id dos notsorted ; > end ; > ff.replace () ; > end ; > > fi.last () ; > > max_dos = dos ; > max_cnt = cnt ; >run ; > >Sorry for the post length and certain things not everyone can test yet for >the lack of V9. But at least I have tested them on my machine, and they work >as planned. > >Kind regards, >----------------------- >Paul M. Dorfman >Jacksonville, FL >----------------------- > > >>From: "Huang, Ya" <yhuang@AMYLIN.COM> >> >>Thanks Paul, >> >>I thought about using temporary array, but gave up >>because dos's value can be anything, decimal or integer, >>it could be very hard to transform it to integer >>so that it can be used as array index. Suppose >>dos takes 0.001,0.1,1,10,100, to transform them all >>into a integer, the factor need to be 1000, and the >>array size would be 100,000, while only 5 cell of >>this huge array will be really used for frequency >>counting, it seems very low efficient. >>Maybe I can use format, but it doesn't seem to worth it. >> >>Kind regards, >> >>Ya >> >> >>-----Original Message----- >>From: Paul Dorfman [mailto:paul_dorfman@HOTMAIL.COM] >>Sent: Wednesday, June 04, 2003 10:01 PM >>To: SAS-L@LISTSERV.UGA.EDU >>Subject: Re: One pass to find max and maxcnt using data step? >> >> >>Ya, >> >>Man, you must be pulling our sas-leg: >> >>%let factor = 1000 ; >> >>data squeezed ( keep = id max_dos max_cnt ) ; >> array grp ( 0 : &factor ) _temporary_ ; >> >> do _n_ = lbound (grp) to hbound (grp) ; >> grp (_n_) = . ; >> end ; >> >> do until ( last.id ) ; >> set xx ; >> by id ; >> grp (dos * &factor) ++ 1 ; >> end ; >> >> do _n_ = hbound (grp) by -1 until ( grp(_n_) ) ; >> end ; >> >> max_dos = _n_ / &factor ; >> max_cnt = grp(_n_) ; >>run ; >> >>Kind regards, >>--------------------------------- >>Paul M. Dorfman >>Jacksonville, FL >>--------------------------------- >> >> >> >> >From: "Huang, Ya" <yhuang@AMYLIN.COM> >> >Reply-To: "Huang, Ya" <yhuang@AMYLIN.COM> >> >To: SAS-L@LISTSERV.UGA.EDU >> >Subject: One pass to find max and maxcnt using data step? >> >Date: Wed, 4 Jun 2003 15:47:00 -0700 >> > >> >Hi there, >> > >> >data xx; >> >input id dos; >> >cards; >> >1 0.1 >> >1 0.1 >> >1 0.01 >> >1 0.01 >> >1 0.01 >> >1 0.01 >> >1 0.3 >> >1 0.3 >> >1 0.3 >> >2 0.2 >> >2 0.2 >> >2 0.6 >> >2 0.05 >> >2 0.05 >> >2 0.05 >> >2 0.05 >> >2 0.6 >> >2 0.6 >> >2 0.6 >> >2 0.6 >> >2 0.6 >> >; >> > >> >For the above data, I need to find, for each id, the max of dos, >> >and the number of dos at the max level, so the expected result >> >is as below: >> > >> >id max_dos maxcnt >> >1 0.3 3 >> >2 0.6 6 >> > >> >It is very easy to get the max_dos, I just wonder if it is >> >possible to get the count of max_dos in the same step? >> > >> >Any comments? >> > >> >Thanks >> > >> >Ya >>

____________________________________________________________ Get advanced SPAM filtering on Webmail or POP Mail ... Get Lycos Mail! http://login.mail.lycos.com/r/referral?aid=27005


Back to: Top of message | Previous page | Main SAS-L page