Date: Mon, 9 Jul 2007 12:30:07 -0400
Subject: Re: How to assign the 2nd obs value to the 1st obs

On Sun, 8 Jul 2007 13:04:26 -0500, Suhong Tong <sophidt@HOTMAIL.COM> wrote:

>Hi Howard and Muthia,
>
>First, Thank you for helping me on my data problem. Second, I apologize
>that I provided a misleading output. Instead putting the wrong output I
>created, I really should have put a result that I want. Below is what I
>want:
>
>For the data presented below, PE=78765 - 78767, 94725 - 94728 are sequential
>numbers, thus these two sets are considered as counseling calls. I believe
>DIF(PE) only take care of 94726 - 94728,
>IFN( first.pr, 1....) can take care of PE=78765, but not something stuck in
>the middle(not the first.), like 94725, if I understood what you suggested
>correctly.

Your example was not sufficiently general.

>I am thinking if SAS has some function can do things opposite to LAG
>function, meaning move 1 observation up, then my problem can be taken care
>of.

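There is no opposite-of-lag function; in fact one is not possible, because the DATA step processes observations in order and a function cannot return a value from an observation that has not been read yet.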

There are techniques for looking ahead in data, but they tend to be clumsy.
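For illustration, one commonly used look-ahead idiom is to merge a data set with a copy of itself that starts one observation later. The following is only a sketch under my own assumptions: the data set names DEMO and LOOKAHEAD and the PREV_ and NEXT_ variables are made up for the example, and DEMO is a small subset of the rows used below.

```sas
/* Small made-up subset of the sample data, so the sketch runs on its own. */
data demo;
  input PR PP PE;
  cards;
26117 12644 78765
26117 12644 78766
26117 12644 79398
32640 15436 166924
32640 15436 94725
32640 15436 94726
;

/* One-to-one MERGE (no BY statement): rows are paired by position, so each
   observation also carries the values of the observation that follows it. */
data lookahead;
  merge demo
        demo(firstobs=2 rename=(PR=NEXT_PR PP=NEXT_PP PE=NEXT_PE));

  /* LAG is called unconditionally so its queue is updated on every row. */
  PREV_PR = lag(PR);
  PREV_PP = lag(PP);
  PREV_PE = lag(PE);

  /* Flag a row if it continues a run from the previous row, or starts a
     run that the next row continues, within the same PR/PP group. */
  IND_PEDIFF = (PR = PREV_PR and PP = PREV_PP and PE - 1 = PREV_PE)
            or (PR = NEXT_PR and PP = NEXT_PP and PE + 1 = NEXT_PE);

  drop PREV_: NEXT_: ;
run;
```

The clumsiness shows: every BY variable has to be carried along in both directions and compared by hand, since FIRST. and LAST. are not available without a BY statement.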

Here is a way which exploits BY groups.

data given;
  input PR PP PE;
  cards;
26117 12644 78765
26117 12644 78766
26117 12644 78767
26117 12644 79398
26117 12644 80601
26117 12644 81343
26117 12644 81503
26117 12644 83429
32640 15436 107309
32640 15436 114404
32640 15436 163072
32640 15436 166924
32640 15436 94725
32640 15436 94726
32640 15436 94727
32640 15436 94728
;

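data almost / view=almost;
  set given;
  by PR PP;
  /* IFN evaluates all of its arguments, so DIF(PE) is executed on every */
  /* observation, including the first one in each PR/PP group.           */
  IND_PEDIFF = ifn(first.PP, 0, dif(PE) = 1);
run;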

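data result;
  set almost;
  /* NOTSORTED: IND_PEDIFF alternates between 0 and 1 within a PR/PP group. */
  by PR PP IND_PEDIFF notsorted;
  /* The last 0 before a string of 1s is the first observation of that string. */
  if last.IND_PEDIFF and not last.PP then IND_PEDIFF = 1;
run;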

The first step is not quite right because it misses the first observation in each string of consecutive PE values. The second step fixes that. By using a view, only one pass through the data is required.

You should build a more extensive test data set to confirm that this will work in various situations.
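As a starting point, here is a sketch of what such a test might look like. The rows and the EXPECTED column are my own hand-worked values based on the rule as I understand it (a PE is flagged if it is one more or one less than a neighbouring PE in the same PR/PP group); the CHECK_ data set names are made up. It covers a run at the start of a group, a run at the end of a group, a single-observation group whose PE happens to follow the previous group's last PE, and a group that is nothing but a run.

```sas
/* Hand-built edge cases; EXPECTED was worked out by hand, not by SAS. */
data check_input;
  input PR PP PE EXPECTED;
  cards;
1 1 10 1
1 1 11 1
1 1 20 0
1 1 30 1
1 1 31 1
1 2 32 0
2 1 5 0
3 1 7 1
3 1 8 1
;

/* Same two steps as above, pointed at the test data. */
data check_almost / view=check_almost;
  set check_input;
  by PR PP;
  IND_PEDIFF = ifn(first.PP, 0, dif(PE) = 1);
run;

data check_result;
  set check_almost;
  by PR PP IND_PEDIFF notsorted;
  if last.IND_PEDIFF and not last.PP then IND_PEDIFF = 1;
  /* Write any disagreement with the hand-worked flag to the log. */
  if IND_PEDIFF ne EXPECTED then
    put 'MISMATCH: ' PR= PP= PE= IND_PEDIFF= EXPECTED=;
run;
```

If no MISMATCH lines appear in the log, the two steps agree with the hand-worked flags for these cases. Your real data may contain other patterns (duplicate PE values, missing values) that this small set does not exercise.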

It's unusual to assign a new value to a BY variable, but I think it's OK here.

pr pp pe IND_PEDIFF
26117 12644 78765 1
26117 12644 78766 1
26117 12644 78767 1
26117 12644 79398 0
26117 12644 80601 0
26117 12644 81343 0
26117 12644 81503 0
26117 12644 83429 0
32640 15436 107309 0
32640 15436 114404 0
32640 15436 163072 0
32640 15436 166924 0
32640 15436 94725 1
32640 15436 94726 1
32640 15436 94727 1
32640 15436 94728 1

