LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2004, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 8 Jul 2004 12:04:12 -0400
Reply-To:     Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject:      Re: selecting smallest rec

Nicole did not provide any guidance regarding the possibility of ties.

If there are ties, the SQL code will return all of the rows in question, whereas the DATA step will return just one (the first or last encountered, depending on whether the IF condition is a strict inequality).

Another layer of SQL can be applied to break ties. Something like:

create table just1 as select * from (select * from a having min(col1) = col1) having name = max(name);

This assumes that NAME values are unique.

Revamping the DATA step code to yield muliple observations in the case of ties would be more intricate.

On Thu, 8 Jul 2004 00:34:03 -0400, Paul M. Dorfman <sashole@BELLSOUTH.NET> wrote:

>Nicole, > >You have got sage advices regarding SQL, but beware that when you do this in >the Kevin-corrected form, > >having col1=min(col1), > >SQL will get busy remerging the statistic it has computed with the original >file, even though the output is to contain a single record. The SQL >optimizer here is, well... not optimized enough so far. Instead of (1) >merrily going record by record computing MIN, memorizing the respective >satellite variables in the record, and finally outputting the corresponding >values at end, SQL (2) goes through the file once, computes the statistic, >then does the second pass to match the minimal value with the original file. >SUMMARY will run 20 times faster, but it cannot grab the satellites right >with the ID statement. Another choice, the Data step, is much faster still, >as evidenced from > >21 data a ; >22 do name = 1 to 1e7 ; >23 col1 = ceil (ranuni (1) * 1e10) ; >24 output ; >25 end ; >26 run ; >NOTE: The data set WORK.A has 10000000 observations and 2 variables. >NOTE: DATA statement used (Total process time): > real time 6.98 seconds >27 >28 proc sql ; >29 create table minsql as select * from a having min(col1) = col1 ; >NOTE: The query requires remerging summary statistics back with the original >data. >NOTE: Table WORK.MINSQL created, with 1 rows and 2 columns. > >30 quit ; >NOTE: PROCEDURE SQL used (Total process time): > real time 12.45 seconds >31 >32 data min_ds ( drop = _: ) ; >33 col1 = constant ('big') ; >34 do until (eof) ; >35 set a (rename = (col1 = _col1 name = _name)) end = eof ; >36 if _col1 => col1 then continue ; >37 col1 = _col1 ; >38 name = _name ; >39 end ; >40 run ; >NOTE: There were 10000000 observations read from the data set WORK.A. >NOTE: The data set WORK.MIN_DS has 1 observations and 2 variables. >NOTE: DATA statement used (Total process time): > real time 2.93 seconds > cpu time 2.92 seconds > >Of course, if you have more than one satellite (NAME in this case), you will >have to code more in RENAME and re-assignment statements, even though it can >be easily automated. SQL saves that part, so if you do not really care about >performance, SQL is the way to go. > >Kind regards, >---------------- >Paul M. Dorfman >Jacksonville, FL >---------------- > > > > > >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On >> Behalf Of Nicole Bibb >> Sent: Wednesday, July 07, 2004 8:23 PM >> To: SAS-L@LISTSERV.UGA.EDU >> Subject: selecting smallest rec >> >> I am trying to select a record based on the smallest value of >> a variable. >> >> >> proc sql errorstop; >> create table goodscr as >> select * >> from champion2 >> having min(col1); >> quit; >> >> The data looks like >> >> name col1 >> green 37 >> blue 125 >> brown 325 >> >> I want to keep only the record with the smallest value from col1. >> >> The sql is ignoring my having clause. >>


Back to: Top of message | Previous page | Main SAS-L page