LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 2007, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 26 Apr 2007 14:42:03 -0400
Reply-To:     Chang Chung <chang_y_chung@HOTMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Chang Chung <chang_y_chung@HOTMAIL.COM>
Subject:      Re: How to do a reverse filling (missing observations)?
Comments: To: Toby Dunn <tobydunn@HOTMAIL.COM>

On Thu, 26 Apr 2007 17:37:02 +0000, toby dunn <tobydunn@HOTMAIL.COM> wrote:

>Hmmm that seems to me like processing the data three time which is atleast 1 >times too many. > >>From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM> >>A small change of Barry's code and the use of double do-loops gives a clean >>output. The FIRST do-loop carries the YEAR from the last record of the >>group(cnt) and enters into the SECOND do-loop and propagates the YEAR for >>each record of the group(cnt). Again the control goes to top of the data >>step to process the records of the next group and so on. >> >>Muthia Kachirayan >> >>data a; >> infile datalines; >> input @1 prodid 2. @4 custid 2. @7 year 4.; >> retain cnt 1; >> output; >> if year > . then cnt + 1; >>datalines; >>1 1 >>2 1 >>3 1 1998 >>2 1 >>3 1 >>4 1 1999 >>4 2 >>5 2 >>6 2 1998 >>; >>run; >> >>data want(drop = cnt); >> do until (last.cnt); >> set a; >> by cnt; >> end; >> do until (last.cnt); >> set a(drop = year); >> by cnt; >> output; >> end; >>run;

Hi,

Muthia's code is cute. I like the clean look of the second data step, which utilizes a variant of the "Double DoW" technique. The classic "Double Dow" usually calculates some kinds of by-group summary stats in the first DoW loop and then attaches the stats to each and every observation in the by-group using the second DoW loop.

Here we don't need to calculate the summary stats since the "summary stats" are just the last (within the by-group) observation values! Now, you might say, "Wait a minute! We only want to attach/replace the year value and nothing else!" Well, don't worry. The second set statement over-writes everything with the original values, except the year variable because it is dropped in the second loop's set statement option. Overall it is quite elegant piece of code!

Practically though, I am bothered by the fact that the input dataset does not have a record id -- especially since it is obvious that the order of observations means something. I would rather create a record id first, unless it takes too much in terms of either/both time and storage.

If the dataset is small enough, then we cannot do better than just run the whole data step backward only once updating in-place. Well, I don't find myself updating a dataset in-place often -- it is too risky. So at least to me, this remains as an exercise -- it is good to know that it is possible, just in case I need it someday. Cheers, Chang

data a; infile datalines missover; input @1 prodid 2. @4 custid 2. @7 year 4.; datalines; 1 1 2 1 3 1 1998 2 1 3 1 4 1 1999 4 2 5 2 6 2 1998 ; run;

data a; do i=nobs to 1 by -1; modify a nobs=nobs point=i; if missing(year) then do; year = sYear; replace; end; else sYear = year; end; stop; run;

proc print data=a; run; /* on lst Obs prodid custid year

1 1 1 1998 2 2 1 1998 3 3 1 1998 4 2 1 1999 5 3 1 1999 6 4 1 1999 7 4 2 1998 8 5 2 1998 9 6 2 1998 */


Back to: Top of message | Previous page | Main SAS-L page