LISTSERV at the University of Georgia
Date:   Mon, 8 Sep 2008 13:19:17 -0700
Reply-To:   "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:   Re: Keep the last observation only. How?
In-Reply-To:   A<941871A13165C2418EC144ACB212BDB04E16BD@dshsmxoly1504g.dshs.wa.lcl>
Content-Type:   text/plain; charset="us-ascii"

Hey Dan - As Howard so kindly pointed out, SAS views cut out some processing, but at a cost.

In my experience views are usually not a good way to speed up dataset processing; not much can be gained in typical tasks. In the test below your method beats mine by far in CPU time, but the wall-clock times were similar, and both yours and mine are much slower than Howard's.

data start(drop=_:);
  do _i=1 to 1e6;
    fundid=int(ranuni(345)*1000);
    date=int(ranuni(123)*19000);
    output;
  end;
  format date date7.;
run;

proc sort;
  by fundid date;
run;

/* real time 2.54 seconds*/
/* cpu time 1.87 seconds*/
data wanted (drop=month);
  set start (rename=(date=month));
  by fundid month groupformat;
  date=month;
  if last.month;
  format month monyy. date yymmdd.;
run;

/* real time 5.22 seconds*/
/* cpu time 5.19 seconds*/
data startmo / view=startmo;
  set start;
  month = put(date, yymmn6.);
run;
data wanted(drop=month);
  set startmo;
  by fundid month;
  if last.month;
run;

/* real time 5.53 seconds*/
/* cpu time 2.68 seconds*/
data wanted;
  set start;
  year = year(date);
  month = month(date);
run;
data wanted;
  set wanted;
  by fundid year month;
  if last.month;
run;

I think yours could be sped up somewhat on the I/O side by creating a single six-byte character variable instead of two numeric variables, and by dropping the dummy month variable.

/* real time 3.71 seconds*/
/* cpu time 2.40 seconds*/
data wanted;
  set start;
  month = put(date,yymmn6.);
run;
data wanted(drop=month);
  set wanted;
  by fundid month;
  if last.month;
run;
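
For anyone who wants to check the BY ... GROUPFORMAT step by hand rather than by timing, here is a minimal sketch on five made-up rows. The dataset names tiny and tiny_last and the dates are assumptions for illustration only and are not part of the test above; the step itself mirrors the first timed step.

```sas
/* Hypothetical toy data: the names tiny and tiny_last are illustrative only */
data tiny;
  input fundid date :date9.;
  format date date9.;
  datalines;
1 03JAN2008
1 28JAN2008
1 15FEB2008
2 10JAN2008
2 11JAN2008
;
run;

/* The rows above are already in fundid/date order, so no PROC SORT is needed */
data tiny_last(drop=month);
  set tiny(rename=(date=month));
  by fundid month groupformat;  /* BY groups follow the formatted (MONYY.) value */
  date = month;                 /* carry the full date under its original name */
  if last.month;                /* keep the last row of each fundid/month group */
  format month monyy. date yymmdd10.;
run;

proc print data=tiny_last;
run;
```

With this input the step should keep three rows: 28JAN2008 and 15FEB2008 for fundid 1, and 11JAN2008 for fundid 2.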

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nordlund, Dan (DSHS/RDA)
Sent: Monday, September 08, 2008 10:52 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Keep the last observation only. How?

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Choate, Paul@DDS
> Sent: Monday, September 08, 2008 10:00 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Keep the last observation only. How?
>
> Howard - your method gets my vote - nice datastep trick! Thanks.
>
> A similar method is to use a view, in the end it passes over the data
> twice so the I/O is higher than yours, but it doesn't change the date
> format.

Paul,

I may need to be enlightened, but I think in your example below there is only one pass over the data. What am I missing?

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

>
> data startmo / view=startmo;
> set start;
> month = put(date,yymm.);
> run;
>
> /*NOTE: DATA STEP view saved on file WORK.STARTMO.*/
>
> data wanted(drop=month);
> set startmo;
> by fundid month;
> if last.month;
> run;
>
> /*NOTE: There were 14 observations read from the data set
> WORK.START.*/
> /*NOTE: There were 14 observations read from the data set
> WORK.STARTMO.*/
> /*NOTE: The data set WORK.WANTED has 8 observations and 3 variables.*/
>
> Paul Choate
> DDS Data Extraction
> (916) 654-2160

