| Date: | Mon, 8 Sep 2008 13:19:17 -0700 |
| Reply-To: | "Choate, Paul@DDS" <pchoate@DDS.CA.GOV> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "Choate, Paul@DDS" <pchoate@DDS.CA.GOV> |
| Subject: | Re: Keep the last observation only. How? |
| In-Reply-To: | A<941871A13165C2418EC144ACB212BDB04E16BD@dshsmxoly1504g.dshs.wa.lcl> |
| Content-Type: | text/plain; charset="us-ascii" |
Hey Dan - As Howard so kindly pointed out, SAS views cut out some
processing (no intermediate data set is written), but at a cost.
In my experience views are usually not a good way to speed up dataset
processing; not that much can be gained in typical processing tasks. In
the test below your method beats mine in CPU time by far, but wall clock
was similar, and both yours and mine are much slower than Howard's.
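/* Build 1e6 test rows with random fund IDs and dates. */
data start(drop=_:);
  do _i=1 to 1e6;
    fundid=int(ranuni(345)*1000);
    date=int(ranuni(123)*19000);
    output;
  end;
  format date date7.;
run;
/* Sort so the BY-group steps below can use LAST. processing. */
proc sort data=start;
  by fundid date;
run;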
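/* Timing for the step below (Howard's GROUPFORMAT method): */
/* real time 2.54 seconds */
/* cpu time 1.87 seconds */
data wanted (drop=month);
  set start (rename=(date=month));
  /* GROUPFORMAT forms BY groups on the formatted (MONYY.) value */
  by fundid month groupformat;
  date=month;
  if last.month;
  format month monyy. date yymmdd.;
run;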
/* Timing for the two steps below (my view method): */
/* real time 5.22 seconds */
/* cpu time 5.19 seconds */
data startmo / view=startmo;
  set start;
  month = put(date, yymmn6.);
run;
data wanted(drop=month);
  set startmo;
  by fundid month;
  if last.month;
run;
/* Timing for the two steps below (your method): */
/* real time 5.53 seconds */
/* cpu time 2.68 seconds */
data wanted;
  set start;
  year = year(date);
  month = month(date);
run;
data wanted;
  set wanted;
  by fundid year month;
  if last.month;
run;
I think yours might be sped up some in the I/O department by creating
only one six-byte character variable instead of two numeric vars, and by
dropping that month dummy var from the output.
/* Timing for the two steps below (your method, one character month var): */
/* real time 3.71 seconds */
/* cpu time 2.40 seconds */
data wanted;
  set start;
  month = put(date,yymmn6.);
run;
data wanted(drop=month);
  set wanted;
  by fundid month;
  if last.month;
run;
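As a rough illustration of the cost of views mentioned at the top, here
is a minimal sketch, assuming the sorted START data set from the test
above. MONTH_COUNTS is a hypothetical output name, and the log behavior
described in the comments is what one would expect, not something
measured in this test.

```sas
/* Sketch only: assumes the sorted START data set already exists.  */
/* A DATA step view stores the program, not the rows.               */
data startmo / view=startmo;
  set start;
  month = put(date, yymmn6.);
run;

/* Every step that reads the view re-executes it, so START should   */
/* be read once per consumer (check the log for a WORK.START note   */
/* after each of the two steps below).                              */
proc means data=startmo n;
  var date;
run;

/* MONTH_COUNTS is a hypothetical name used only for illustration.  */
proc freq data=startmo;
  tables month / noprint out=month_counts;
run;
```

If the derived variable is needed by more than one later step, writing
it to a real data set once may cost less than re-running the view each
time.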
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Nordlund, Dan (DSHS/RDA)
Sent: Monday, September 08, 2008 10:52 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Keep the last observation only. How?
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Choate, Paul@DDS
> Sent: Monday, September 08, 2008 10:00 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Keep the last observation only. How?
>
> Howard - your method gets my vote - nice datastep trick! Thanks.
>
> A similar method is to use a view, in the end it passes over the data
> twice so the I/O is higher than yours, but it doesn't change the date
> format.
Paul,
I may need to be enlightened, but I think in your example below there is
only one pass over the data. What am I missing?
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
>
> data startmo / view=startmo;
> set start;
> month = put(date,yymm.);
> run;
>
> /*NOTE: DATA STEP view saved on file WORK.STARTMO.*/
>
> data wanted(drop=month);
> set startmo;
> by fundid month;
> if last.month;
> run;
>
> /*NOTE: There were 14 observations read from the data set
> WORK.START.*/
> /*NOTE: There were 14 observations read from the data set
> WORK.STARTMO.*/
> /*NOTE: The data set WORK.WANTED has 8 observations and 3 variables.*/
>
> Paul Choate
> DDS Data Extraction
> (916) 654-2160