Date: Mon, 9 Jul 2007 12:30:07 -0400
Reply-To: "Howard Schreier"
Sender: "SAS(r) Discussion"
From: "Howard Schreier"
Subject: Re: How to assign the 2nd obs value to the 1st obs

On Sun, 8 Jul 2007 13:04:26 -0500, Suhong Tong <sophidt@HOTMAIL.COM> wrote:

>Hi Howard and Muthia,
>
>First, Thank you for helping me on my data problem. Second, I apologize
>that I provided a misleading output. Instead putting the wrong output I
>created, I really should have put a result that I want. Below is what I
>want:
>
>For the data presented below, PE=78765 - 78767, 94725 - 94728 are sequential
>numbers, thus these two sets are considered as counseling calls. I believe
>DIF(PE) only take care of 94726 - 94728,
>IFN( first.pr, 1....) can take care of PE=78765, but not something stuck in
>the middle(not the first.), like 94725, if I understood what you suggested
>correctly.

Your example was not sufficiently general.

>
>I am thinking if SAS has some function can do things opposite to LAG
>function, meaning move 1 observation up, then my problem can be taken care
>of.

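There is no opposite-of-LAG function. One is in fact not possible, because the DATA step reads observations one at a time, and a function cannot return a value from an observation which has not yet been read.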

There are techniques for looking ahead in data, but they tend to be clumsy.
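
To illustrate what such a look-ahead can look like, here is a minimal sketch of one common idiom: a one-to-one MERGE of the data set with a copy of itself which starts at the second observation. It assumes a data set GIVEN with variables PR, PP and PE, sorted as in this thread. The data set name LOOKAHEAD and the next_*, prev_*, follows and precedes names are made up for the illustration.

```sas
/* Sketch only: pair each row of GIVEN with the row that follows it.   */
/* LOOKAHEAD, next_*, prev_*, follows, precedes are hypothetical names. */
data lookahead;
    merge given
          given(firstobs=2
                rename=(PR=next_PR PP=next_PP PE=next_PE));
    /* No BY statement, so row n is paired with row n+1. On the final  */
    /* observation the next_* variables are missing.                   */
    prev_PR = lag(PR);
    prev_PP = lag(PP);
    prev_PE = lag(PE);
    /* Flag a row if it continues a run or starts one, within PR-PP.   */
    follows  = (PR = prev_PR and PP = prev_PP and prev_PE = PE - 1);
    precedes = (PR = next_PR and PP = next_PP and next_PE = PE + 1);
    IND_PEDIFF = (follows or precedes);
    drop next_: prev_: follows precedes;
run;
```

The clumsiness shows: the data set is opened twice, every variable needed from the next row must be renamed, and the BY-group boundaries have to be checked by hand.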

Here is a way which exploits BY groups.

data given;
  input PR PP PE;
  cards;
26117 12644 78765
26117 12644 78766
26117 12644 78767
26117 12644 79398
26117 12644 80601
26117 12644 81343
26117 12644 81503
26117 12644 83429
32640 15436 107309
32640 15436 114404
32640 15436 163072
32640 15436 166924
32640 15436 94725
32640 15436 94726
32640 15436 94727
32640 15436 94728
;

data almost / view=almost;
  set given;
  by PR PP;
  IND_PEDIFF = ifn(first.PP,0,dif(PE)=1);
  run;

data result;
  set almost;
  by PR PP IND_PEDIFF notsorted;
  if last.IND_PEDIFF and not last.PP then IND_PEDIFF = 1;
  run;

The first step is not quite right because it misses the first observation in each string of consecutive PE values. The second step fixes that. By using a view, only one pass through the data is required.

You should build a more extensive test data set to confirm that this will work in various situations.
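
As a starting point, here is a sketch of such a test. The data are invented, and the names GIVEN2, ALMOST2, CHECK, EXPECTED and MISMATCH are hypothetical. EXPECTED holds the flag I worked out by hand for each row. The cases covered are two runs back to back, a run at the end of a BY group, a one-observation BY group, and a BY group whose first PE happens to follow the last PE of the previous group.

```sas
/* Invented test data; EXPECTED is the hand-derived flag for each row. */
data given2;
    input PR PP PE EXPECTED;
    cards;
1 1 10 1
1 1 11 1
1 1 20 1
1 1 21 1
1 1 50 0
2 1 5 0
3 1 7 1
3 1 8 1
3 1 9 1
3 2 10 0
3 2 12 1
3 2 13 1
3 2 14 1
;

/* Same two steps as above, applied to the test data. */
data almost2 / view=almost2;
    set given2;
    by PR PP;
    IND_PEDIFF = ifn(first.PP, 0, dif(PE)=1);
run;

data check;
    set almost2;
    by PR PP IND_PEDIFF notsorted;
    if last.IND_PEDIFF and not last.PP then IND_PEDIFF = 1;
    MISMATCH = (IND_PEDIFF ne EXPECTED);
run;

/* Lists only the rows where the derived flag disagrees with EXPECTED. */
proc print data=check;
    where MISMATCH = 1;
run;
```

If the PROC PRINT lists no observations, the code agrees with the hand-derived flags for these cases. Other situations, such as missing PE values, would need rows of their own.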

It's unusual to assign a new value to a BY variable, but I think it's OK here.
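
As far as I understand it, the assignment is safe because FIRST. and LAST. are set when the observation is read, before the IF statement runs. If reassigning a BY variable is still uncomfortable, here is a sketch of a variant which leaves it alone. It reads the view ALMOST defined above. The names RESULT_ALT and IND_RAW are made up for the illustration.

```sas
/* Sketch: keep the first-pass flag under another name (IND_RAW), use  */
/* it as the BY variable, and derive IND_PEDIFF as a new variable.     */
data result_alt;
    set almost(rename=(IND_PEDIFF=IND_RAW));
    by PR PP IND_RAW notsorted;
    /* 1 if already flagged, or if this row ends a stretch of equal    */
    /* IND_RAW values without ending the PR-PP group.                  */
    IND_PEDIFF = IND_RAW or (last.IND_RAW and not last.PP);
    drop IND_RAW;
run;
```

This should give the same flags as the RESULT step, at the cost of one rename.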

>
>Thanks,
>
>Sohpia
>
>pr    pp    pe     IND_PEDIFF
>26117 12644 78765  1
>26117 12644 78766  1
>26117 12644 78767  1
>26117 12644 79398  0
>26117 12644 80601  0
>26117 12644 81343  0
>26117 12644 81503  0
>26117 12644 83429  0
>32640 15436 107309 0
>32640 15436 114404 0
>32640 15436 163072 0
>32640 15436 166924 0
>32640 15436 94725  1
>32640 15436 94726  1
>32640 15436 94727  1
>32640 15436 94728  1
>
>>From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
>>Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
>>To: SAS-L@LISTSERV.UGA.EDU
>>Subject: Re: How to assign the 2nd obs value to the 1st obs
>>Date: Sat, 7 Jul 2007 22:56:16 -0400
>>
>>On Fri, 6 Jul 2007 18:57:17 -0400, Sophia Tong <sophiDT@HOTMAIL.COM> wrote:
>>
>> >Dear list,
>> >
>> >I am trying to determine whether a call is a real counseling call or random
>> >incoming call by looking at the PE numbers. If under the same PR-PP, PE
>> >come as a sequential numbers then that set of sequential numbers consider as
>> >a set of counseling calls.
>> >
>> >What I did is let PE2=lag(PE), then PE_DIFF=PE-PE2, if PE_DIFF=1 then that
>> >call identified as a conseling call. As you see below, the first obs with
>> >PE_DIFF=-264, this one should be identified as a real call. Another one
>> >where PE_DIFF=-72199, that one also should be identified as a real call.
>> >
>> >My question is how to assign 1s to IND_PEDIFF after SAS see the first 1s in
>> >PE_DIFF?
>> >Mine way of flagging in IND_PEDIFF does not cover all the cases. Please help.
>> >
>> >Thanks in advance.
>> >
>> >Sophia
>> >
>> >The SAS System    08:03 Friday, July 6, 2007  30
>> >
>> >PR    PP    PE     PE2    PE_DIFF IND_PEDIFF CALLS
>> >
>> >26117 12644 78765  79029     -264 1          1
>> >26117 12644 78766  78765        1 1          2
>> >26117 12644 78767  78766        1 1          3
>> >26117 12644 79398  78767      631 0          4
>> >26117 12644 80601  79398     1203 0          5
>> >26117 12644 81343  80601      742 0          6
>> >26117 12644 81503  81343      160 0          7
>> >26117 12644 83429  81503     1926 0          8
>> >32640 15436 107309 94633    12676 1          1
>> >32640 15436 114404 107309    7095 0          2
>> >32640 15436 163072 114404   48668 0          3
>> >32640 15436 166924 163072    3852 0          4
>> >32640 15436 94725  166924  -72199 0          5
>> >32640 15436 94726  94725        1 1          6
>> >32640 15436 94727  94726        1 1          7
>> >32640 15436 94728  94727        1 1          8
>>
>>When presenting a table like this, it helps very much to reduce the
>>whitespace in order to avoid wrapping. See
>>
>>http://sascommunity.org/wiki/Preparing_Sample_Data_for_SAS-L
>>
>>Now, to the question. Try something like
>>
>>   data result;
>>   set given;
>>   by pr pp;
>>   IND_PEDIFF = ifn ( first.pr, 1, dif ( pe )=1 );
>>   run;
>>
>>Notice that the indicator can be derived in one statement, with no
>>intermediate variables. The DIF function combines the roles of the LAG
>>function and the subtraction operator. When compared to the value 1, the
>>result is either true (1) or false (0). The IFN function overrides that
>>result for the first observation in each BY group.
