On May 3, 11:17 am, art...@NETSCAPE.NET (Arthur Tabachneck) wrote:
> Sorry,
>
> I never provided my original test code and, in the interim, discovered the
> answer.
>
> First, the test code was:
>
> data test1;
> input date permno x;
> lx=lag(x);
> put _all_;
> lx=lag(x);
> put _all_;
> datalines;
> . 4 2
> 1 4 .
> 2 4 1
> ;
>
> data test2;
> input date permno x;
> do i=1 to 2;
> lx=lag(x);
> put _all_;
> end;
> datalines;
> . 4 2
> 1 4 .
> 2 4 1
> ;
>
> I found the reason for the observed difference in the documentation,
> namely: "Each occurrence of a LAGn function in a program generates its own
> queue of values."
>
> Art
> -------
> On Sun, 3 May 2009 10:16:54 -0400, Arthur Tabachneck <art...@NETSCAPE.NET>
> wrote:
>
> >List,
>
> >I was going to answer Barry's question by directing him toward Dan's nice
> >response from a couple of years ago, namely:
>
> >http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0511e&L=sas-l&D=0&P=19479
>
> >Yes, of course the problem is that lag is queue-specific rather than
> >observation-specific.
>
> >However, before posting my response, I wrote a short program in order to
> >show what was happening and realized that I don't understand all of the
> >specifics.
>
> >Can anyone provide an explanation for the following differences:
>
> >145 data test1;
> >146 input date permno x;
> >147 lx=lag(x);
> >148 put _all_;
> >149 lx=lag(x);
> >150 put _all_;
> >151 datalines;
>
> >date=. permno=4 x=2 lx=. _ERROR_=0 _N_=1
> >date=. permno=4 x=2 lx=. _ERROR_=0 _N_=1
> >date=1 permno=4 x=. lx=2 _ERROR_=0 _N_=2
> >date=1 permno=4 x=. lx=2 _ERROR_=0 _N_=2
> >date=2 permno=4 x=1 lx=. _ERROR_=0 _N_=3
> >date=2 permno=4 x=1 lx=. _ERROR_=0 _N_=3
> >NOTE: The data set WORK.TEST has 3 observations and 4 variables.
> >NOTE: DATA statement used (Total process time):
> > real time 0.01 seconds
> > cpu time 0.01 seconds
>
> >155 ;
>
> >156 data test2;
> >157 input date permno x;
> >158 do i=1 to 2;
> >159 lx=lag(x);
> >160 put _all_;
> >161 end;
> >162 datalines;
>
> >date=. permno=4 x=2 i=1 lx=. _ERROR_=0 _N_=1
> >date=. permno=4 x=2 i=2 lx=2 _ERROR_=0 _N_=1
> >date=1 permno=4 x=. i=1 lx=2 _ERROR_=0 _N_=2
> >date=1 permno=4 x=. i=2 lx=. _ERROR_=0 _N_=2
> >date=2 permno=4 x=1 i=1 lx=. _ERROR_=0 _N_=3
> >date=2 permno=4 x=1 i=2 lx=1 _ERROR_=0 _N_=3
> >NOTE: The data set WORK.TEST has 3 observations and 5 variables.
> >NOTE: DATA statement used (Total process time):
> > real time 0.01 seconds
> > cpu time 0.01 seconds
>
> >166 ;
>
> >Art
> >---------
> >On Sat, 2 May 2009 15:35:59 -0700, barry.brian.barr...@GMAIL.COM wrote:
>
> >>I was trying to understand the code by looking at one and then data
> >>set two.
>
> >>The real code is to produce data set three. I am not really sure how
> >>the code is working.
> >>When I look at the output of lx, ly, and lz for data set three, it
> >>doesn't make sense that all the columns are filled up, since there
> >>are missing values in the corresponding adjacent lx, ly, and lz row
> >>variables. Let me know how this code works. It looks like magic to me,
> >>LOL. I have programmed in SAS for a year. Let me know what's the best
> >>way to learn SAS even more. Are there good reference books, a must
> >>buy. I just used the online SAS reference guide and google.
> >>I know that the SAS Institute publishes Tutorial books as well.
>
> >>Below is the code ( just asking what is the logic, trying to figure it
> >>out for days. There are some things in SAS that just doesn't make
> >>sense how it works LOL but it just does LOL.):
>
> >>/****CODE BEGINS******/
>
> >>libname home '/home/mit/bcubeb3';
> >>data home.one;
> >>input date permno x y z w;
> >>datalines;
> >>. 4 . . . .
> >>1 4 . . . .
> >>2 4 10 3 3990 .
> >>3 4 . . . 3680
> >>4 4 . . . .
> >>5 4 . . . .
> >>6 4 . . . .
> >>7 4 . . . .
> >>8 4 . . . .
> >>9 4 . . . .
> >>10 4 . . . 3680
> >>11 4 . . . .
> >>12 4 . . . .
> >>13 4 . . . .
> >>14 4 . . . .
> >>15 4 . . . .
> >>16 4 . . . 3793
> >>17 4 . . . .
> >>18 4 . . . .
> >>19 4 . . . .
> >>20 5 . . . 3843
> >>21 5 . . . .
> >>22 5 20 2 4000 .
> >>23 5 . . . .
> >>24 5 . . . .
> >>;
> >>run;
>
> >>data home.two;
> >>set home.one;
> >>by permno date;
> >>i=0;
> >>do while(i<1);
> >>lx=lag(x);
> >>ly=lag(y);
> >>lz=lag(z);
> >>lw=lag(w);
> >>if permno=lag(permno) and x=. then x=lx;
> >>if permno=lag(permno) and y=. then y=ly;
> >>if permno=lag(permno) and z=. then z=lz;
> >>if permno=lag(permno) and w=. then w=lw;
> >>i+1;
> >>end;
> >>run;
>
> >>data home.three;
> >>set home.one;
> >>by permno date;
> >>i=0;
> >>do while(i<2);
> >>lx=lag(x);
> >>ly=lag(y);
> >>lz=lag(z);
> >>lw=lag(w);
> >>if permno=lag(permno) and x=. then x=lx;
> >>if permno=lag(permno) and y=. then y=ly;
> >>if permno=lag(permno) and z=. then z=lz;
> >>if permno=lag(permno) and w=. then w=lw;
> >>i+1;
> >>end;
> >>run;
I read the link you sent me. As I understand it, the lag function
works off a queue, specifically a FIFO queue.
Initially, the data as you have it is:
date permno x
. 4 2
1 4 .
2 4 1
Then the do i=1 to 2 loop executes the lag function.
For i=1:
date permno x lx
. 4 2 .
1 4 . 2
2 4 1 .
3 N/A N/A 1
(The row for date 3 is there for illustration purposes only.)
When you apply the lag function again, lx shifts up by one, as below.
For i=2:
date permno x lx
. 4 2 2
1 4 . .
2 4 1 1
I guessed that with do i=1 to 3 you would get the following:
date permno x lx
. 4 2 .
1 4 . 1
2 4 1 .
but instead, for i=3, you get:
date permno x lx
. 4 2 2
1 4 . .
2 4 1 1
I thought it was following a queue, which to me just means that there
is a shift.
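The i=1 to 3 experiment above can be rerun with a short step like the following. This is only a sketch: it assumes the same three x values, writes no data set (_null_), and the PUT layout is an arbitrary choice.

```sas
/* Sketch: a single LAG occurrence executed three times per observation. */
data _null_;
  input x;
  do i = 1 to 3;
    /* Each execution returns the value stored by the previous
       execution of this same call, then stores the current x. */
    lx = lag(x);
    put _n_= i= x= lx=;
  end;
datalines;
2
.
1
;
run;
```

The single lag(x) call owns one queue, and that queue advances every time the call executes, not once per observation. So the i=2 and i=3 passes simply read back the x that the pass before them stored, which is consistent with the i=3 column coming out as 2, ., 1 rather than as a second shift.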
How does that apply to my case? I don't see how the last data set
(THREE) ends up with all observations filled by this code. I was
thinking about FIFO, but it still didn't make sense to me.
For my case in my initial post:
Dataset ONE:
date permno x y z w
. 4 . . . .
1 4 . . . .
2 4 10 3 3990 .
3 4 . . . 3680
4 4 . . . .
5 4 . . . .
6 4 . . . .
7 4 . . . .
8 4 . . . .
9 4 . . . .
10 4 . . . 3680
11 4 . . . .
12 4 . . . .
13 4 . . . .
14 4 . . . .
15 4 . . . .
16 4 . . . .
17 4 . . . .
18 4 . . . .
19 4 . . . .
20 4 . . . .
21 5 . . . .
22 5 20 2 4000 .
23 5 . . . .
24 5 . . . .
Dataset TWO:
date permno x y z w i lx ly lz lw
. 4 . . . . 1 . . . .
1 4 . . . . 1 . . . .
2 4 10 3 3990 . 1 . . . .
3 4 10 3 3990 3680 1 10 3 3990 .
4 4 . . . 3680 1 . . . 3680
5 4 . . . . 1 . . . .
6 4 . . . . 1 . . . .
7 4 . . . . 1 . . . .
8 4 . . . . 1 . . . .
9 4 . . . . 1 . . . .
10 4 . . . 3680 1 . . . .
11 4 . . . 3680 1 . . . 3680
12 4 . . . . 1 . . . .
13 4 . . . . 1 . . . .
14 4 . . . . 1 . . . .
15 4 . . . . 1 . . . .
16 4 . . . . 1 . . . .
17 4 . . . . 1 . . . .
18 4 . . . . 1 . . . .
19 4 . . . . 1 . . . .
20 5 . . . . 1 . . . .
21 5 . . . . 1 . . . .
22 5 20 2 4000 . 1 . . . .
23 5 20 2 4000 . 1 20 2 4000 .
24 5 . . . . 1 . . . .
Dataset THREE:
date permno x y z w i lx ly lz lw
. 4 . . . . 2 . . . .
1 4 . . . . 2 . . . .
2 4 10 3 3990 . 2 10 3 3990 .
3 4 10 3 3990 3680 2 . . . 3680
4 4 10 3 3990 3680 2 . . . .
5 4 10 3 3990 3680 2 . . . .
6 4 10 3 3990 3680 2 . . . .
7 4 10 3 3990 3680 2 . . . .
8 4 10 3 3990 3680 2 . . . .
9 4 10 3 3990 3680 2 . . . .
10 4 10 3 3990 3680 2 . . . 3680
11 4 10 3 3990 3680 2 . . . .
12 4 10 3 3990 3680 2 . . . .
13 4 10 3 3990 3680 2 . . . .
14 4 10 3 3990 3680 2 . . . .
15 4 10 3 3990 3680 2 . . . .
16 4 10 3 3990 3680 2 . . . .
17 4 10 3 3990 3680 2 . . . .
18 4 10 3 3990 3680 2 . . . .
19 4 10 3 3990 3680 2 . . . .
20 5 . . . . 2 . . . .
21 5 . . . . 2 . . . .
22 5 20 2 4000 . 2 20 2 4000 .
23 5 20 2 4000 . 2 . . . .
24 5 20 2 4000 . 2 . . . .
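One way to see why THREE fills every row while TWO fills only one is to trace a cut-down version of the same loop. The step below is a sketch, not the original program: it keeps only x, uses five made-up rows, replaces do while(i<2) with an equivalent do i=1 to 2, and stores lag(permno) in a helper variable lp that the original does not have.

```sas
/* Sketch: the data set THREE logic reduced to one variable. */
data _null_;
  input date permno x;
  do i = 1 to 2;
    lx = lag(x);        /* value stored by the previous execution; stores current x */
    lp = lag(permno);   /* hypothetical helper: previous permno seen by this call */
    if permno = lp and x = . then x = lx;   /* fill a missing x within the same permno */
    put date= permno= i= x= lx=;
  end;
datalines;
1 4 .
2 4 10
3 4 .
4 4 .
5 5 .
;
run;
```

On the first pass of each observation, lag(x) returns what the second pass of the previous observation stored, and by then that x has already been filled. The filled value is therefore handed forward row after row until permno changes. The second pass then reads back the still-missing x stored by the first pass, which is why the final lx is missing on most rows even though x is filled. With a single pass per observation, as in TWO, the value stored is always the unfilled x, so the fill reaches only one row past each non-missing value.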