Date: Wed, 3 Feb 2010 11:57:14 -0600
Reply-To: "Feyerharm, Robert W." <RobertF@HEALTH.OK.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Feyerharm, Robert W." <RobertF@HEALTH.OK.GOV>
Subject: Re: lag calculation macro
In-Reply-To: <b7a7fa631002030920k1d5e8a50vd79652a746f61887@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"
Thanks Joe! It's tempting to use macros or arrays in SAS when a simpler
code works far more efficiently. Too bad there isn't a SAS guide that
covers common pitfalls in SAS programming such as this.
From: Joe Matise [mailto:snoopy369@gmail.com]
Sent: Wednesday, February 03, 2010 11:21 AM
To: Feyerharm, Robert W.
Cc: SAS-L@listserv.uga.edu
Subject: Re: lag calculation macro
Art's suggestion is the right one IMO for your specific problem.
As a general critique, most of the time when a "utility" macro contains
a full datastep, it's the wrong answer. Utility macros should generally
be less than one datastep; ie, called inside a datastep. Otherwise
you're just using the macro to do data step looping, which is much more
cheaply handled with data step functions and statements.
For example, you could do
data want;
set test;
n=_n_;
if missing(x) then do;
do until (not missing(x));
n=n-1;
set test(keep=x) point=n;
end;
end;
run;
which is nearly identical to your macro in its approach, but uses data
step looping instead of the macro, and will be substantially more
efficient.
A utility macro to accomplish this would be something like this:
%macro rollback(var);
%do i = 1 %to 8;
&var=ifn(&var,&var,0,lag&i(&var));
%end;
%mend rollback;
data want;
set test;
%rollback(x);
run;
Note how it is called from inside a data step. I don't recommend this
particular solution as there are better ways [in particular, this
executes many statements needlessly and has to be hardcoded with the
maximum possible number of lag values] but it illustrates my point.
-Joe
On Wed, Feb 3, 2010 at 10:47 AM, Robert Feyerharm
<robertf@health.ok.gov> wrote:
Has anyone else struggled with the lag function in SAS? The lag function
often doesn't work in an intuitive manner in conditional statements, or
if
missing values are involved.
Anyways, I wrote a short macro code to handle lag calculations which I'd
like to share. It works fairly well, although I may be reinventing the
wheel. A colleague was working with a large dataset containing missing
values, and wanted to impute the last known value into the missing
values
for a particular field. For example, given the following test dataset,
the
macro fills in the missing values of x with the last known value of x
(1,2, or 3):
data test;
input x y;
datalines;
1 1
. 2
. 3
. 4
. 5
2 6
. 7
. 8
3 9
. 10
. 11
. 12
. 13
. 14
;
run;
/* Let num = no. of records in target dataset. */
%let num=14;
%macro lagvar(num);
%do i = 1 %to #
data test;
set test;
z=lag(x);
if x=. then x=z;
run;
%end;
data test;
set test;
drop z;
run;
%mend;
%lagvar(&num)
I'm curious if there is a shorter program that can handle the same
operations without resorting to a macro?
Robert Feyerharm
Oklahoma State Department of Health