LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 1998, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 28 Feb 1998 14:29:10 -0600
Reply-To:     Jack Hamilton <jack_hamilton@HCCOMPARE.COM>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Jack Hamilton <jack_hamilton@HCCOMPARE.COM>
Subject:      Re: HELP: Merge bug?
Comments: To:
Content-Type: text/plain; charset=US-ASCII

jrentsch@CAPITA.COM wrote:

>This seems like a bug to me, but maybe I just don't understand why >this works the way it does... Here is a simple example: > >data x; > a=1; b=1; output; > a=1; b=3; output; > a=1; b=1; output; > run; > >data y; > a=1; c=3; output; > run; > >data z; > merge x y; > by a; > if b eq 3 then c=99; > run; > >proc print; > run; > >Here is the output: > >OBS A B C >1 1 1 3 >2 1 3 99 >3 1 1 99 > >Now why does observation 3 have C=99 ?? Shouldn't it be 3? If this is >not a bug, I can't see why anyone would want it to work this way.

No. That is the expected value. All variables in any SET, MERGE, or UPDATE statement are automatically retained - that is, they keep their values until reset with an assignment statement (or certain functions) or another observation from a dataset containing that variable is read in.

In your case, A, B, and C are all set by the first execution of the MERGE statement. C is not reset by the second execution of the MERGE, but is reset by the IF statement. In the third execution, C is not reset by a value from an input dataset, and it is also not reset by the IF statement. So it keeps the last value it had from the previous execution, which is 99.

In the second execution, A and B are given new values from X, and > >Interestingly, if I change the last assignment line in the "data x" >step to: > > a=2; b=1; output; >then the variables are handled correctly (to my way of thinking).

In this case, no value is read in from Y until the third observation, so you get values for C of 3, 99, and missing. The first two values are the ones you might expect. It's that last missing value that would have surprised me, given the behavior in the earlier example, but "The values of the variables in the program data vector are set to missing when the value of a BY group changes" (LR, p. 312).

I'll take this opportunity to remind long-time SAS users of two options that can go in the BY statement, GROUPFORMAT and NOTSORTED (LR, p 311).

GROUPFORMAT tells SAS to use the formatted values of the BY values when deciding whether the BY group has changed, rather than the default of using unformatted values. Most procedures use the formatted value by default, and you can't use GROUPFORMAT in a PROC.

NOTSORTED tells SAS not to check whether the BY values are sorted, but to use them as is. One time when you might use this is in PROC PRINT, when you want observations with a particular value of a variable to be printed together, but you don't want the report lines to be sorted by that variable.

Back to: Top of message | Previous page | Main SAS-L page