Date: Sat, 28 Feb 1998 14:29:10 -0600
Reply-To: Jack Hamilton <jack_hamilton@HCCOMPARE.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Jack Hamilton <jack_hamilton@HCCOMPARE.COM>
Subject: Re: HELP: Merge bug?
Content-Type: text/plain; charset=US-ASCII
jrentsch@CAPITA.COM wrote:
>This seems like a bug to me, but maybe I just don't understand why
>this works the way it does... Here is a simple example:
>
>data x;
> a=1; b=1; output;
> a=1; b=3; output;
> a=1; b=1; output;
> run;
>
>data y;
> a=1; c=3; output;
> run;
>
>data z;
> merge x y;
> by a;
> if b eq 3 then c=99;
> run;
>
>proc print;
> run;
>
>Here is the output:
>
>OBS A B C
>1 1 1 3
>2 1 3 99
>3 1 1 99
>
>Now why does observation 3 have C=99 ?? Shouldn't it be 3? If this is
>not a bug, I can't see why anyone would want it to work this way.
No. That is the expected value. All variables in any SET, MERGE,
or UPDATE statement are automatically retained - that is, they keep
their values until reset with an assignment statement (or certain
functions) or another observation from a dataset containing that
variable is read in.
In your case, A, B, and C are all set by the first execution of the
MERGE statement. C is not reset by the second execution of the MERGE,
but is reset by the IF statement. In the third execution, C is not
reset by a value from an input dataset, and it is also not reset by
the IF statement. So it keeps the last value it had from the previous
execution, which is 99.
In the second execution, A and B are given new values from X, and
>
>Interestingly, if I change the last assignment line in the "data x"
>step to:
>
> a=2; b=1; output;
>then the variables are handled correctly (to my way of thinking).
In this case, no value is read in from Y until the third observation,
so you get values for C of 3, 99, and missing. The first two values
are the ones you might expect. It's that last missing value that
would have surprised me, given the behavior in the earlier example,
but "The values of the variables in the program data vector are set
to missing when the value of a BY group changes" (LR, p. 312).
I'll take this opportunity to remind long-time SAS users of two
options that can go in the BY statement, GROUPFORMAT and NOTSORTED
(LR, p 311).
GROUPFORMAT tells SAS to use the formatted values of the BY values when
deciding whether the BY group has changed, rather than the default of
using unformatted values. Most procedures use the formatted value by
default, and you can't use GROUPFORMAT in a PROC.
NOTSORTED tells SAS not to check whether the BY values are sorted, but
to use them as is. One time when you might use this is in PROC PRINT,
when you want observations with a particular value of a variable to
be printed together, but you don't want the report lines to be sorted by
that variable.
|