Date: Tue, 16 Aug 2011 14:06:29 -0400
Reply-To: Jim Groeneveld <jim.1stat@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jim Groeneveld <jim.1stat@YAHOO.COM>
Subject: Re: Timing of lag() function
Hi Ted,
In addition to what Joe already told you about RETAINing new, non-dataset
variables, I would still like to stress that you should NEVER call the LAG
function inside conditionally executed code. LAG returns values from a
queue that is updated only when the function actually executes, so inside
conditional code it does not necessarily return the value from the
immediately preceding record. Instead, always calculate the LAGged value
unconditionally, store it in a new variable, and use that variable in your
conditional code.
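
A minimal sketch of the difference, using a made-up four-row dataset (the
names demo, v, lag_always and lag_cond are only for illustration and are
not part of your code):

```sas
data demo;
  input v;

  /* Unconditional call: runs on every observation, so it returns
     the value of v from the immediately preceding record. */
  lag_always = lag(v);

  /* Conditional call: runs only when v is even, so its queue holds
     the value of v from the last time THIS call was executed. */
  if mod(v, 2) = 0 then lag_cond = lag(v);

  datalines;
1
2
3
4
;
run;

/* lag_always: .  1  2  3                                   */
/* lag_cond:   .  .  .  2   (2 at the 4th record, not 3)    */
```

In the quoted step, x, y and z already follow the unconditional pattern;
only w = lag(new_eff_date) sits inside the ELSE DO block.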
Regards - Jim.
--
Jim Groeneveld, Netherlands
Statistician/SAS consultant
http://jim.groeneveld.eu.tf
On Mon, 15 Aug 2011 18:10:44 -0400, Kirby, Ted <ted.kirby@LEWIN.COM> wrote:
>With the following dataset:
>
>data coverage3;
>input individual_id :$8. Eff_Date :date9. end_date :date9. Cust_ID :$9. count_index;
>format Eff_Date end_date date9.;
>datalines;
>39030981 01Jan2009 30Apr2009 000192961 1
>39030981 01May2009 31May2009 000192961 2
>39030981 01Jun2009 30Sep2009 000192961 3
>39030981 01Oct2009 31Dec2009 000192961 4
>39121557 10Oct2008 30Nov2008 000189496 1
>;
>run;
>
>and the following code:
>
>proc sort data=coverage3; by individual_id Eff_date; run;
>
>/* The data are sorted in the INPUT data, but run the PROC SORT so that
>SAS knows it is sorted and we can use the BY statement below. */
>
>data coverage3_eff;
>set coverage3;
>by individual_id;
>
>x = lag(eff_date);
>y = lag(end_date);
>z = lag(cust_id);
>
>if first.individual_id then new_eff_date = eff_date;
>else do;
> w = lag(new_eff_date);
> if eff_date - y >= 90 then new_eff_date = eff_date;
> if eff_date - y < 90 and cust_id ^= z then new_eff_date = eff_date;
> if eff_date - y < 90 and cust_id = z and count_index <= 2 then new_eff_date = x;
> if eff_date - y < 90 and cust_id = z and count_index > 2 then new_eff_date = w;
>end;
>
>format new_eff_date x y w date9.;
>run;
>
>Why is the variable "w" missing for all observations? The
>"new_eff_date" variable was assigned a value with the first run through
>the data statement (with the "if first.individual_id ..."
>statement), so I would have thought that subsequent observations would
>have had a value for "w" (especially the 2nd observation).
>
>This happens even if "w" is defined outside of the conditional IF in the
>same block of code as the variables "x" "y" and "z" are defined.
>
>If I add a "RETAIN new_eff_date;" statement to the code above then "w"
>has a value for the 3rd and 4th observations, but not the 2nd or 5th
>observation. This is fine for the 5th observation, since it is the
>beginning of the new "individual_id" block within the data. However, I
>want there to be a value for "w" in the 2nd observation.
>
>In all the variations of the code above, all of the "lag" variables "x"
>"y" and "z" have non-missing values. Only "w" has missing values. How
>can I get the "w" in the 2nd observation to have the value of the
>"new_eff_date" from the first observation?
>
>