Date: Mon, 7 Jun 1999 09:52:21 -0400
Reply-To: WHITLOI1 <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: WHITLOI1 <WHITLOI1@WESTAT.COM>
Subject: Re[2]: Copying One Observation within a BY Group
Content-Type: text/plain; charset=US-ASCII
Howard, Bob, Paul, and Stuart,
I am not sure what all the fuss is about. Bob's technique with Howard's AND
is correct when X3 and X4 are non-missing for the same observation. If Paul's
assumption that they could be non-missing for different observations is
correct then just use Bob's technique twisted a little harder.
data w ;
merge a ( keep = x1 x2 )
a ( keep = x1 x3 where = ( x3 ^= . ) )
a ( keep = x1 x4 where = ( x4 ^= . ) )
;
by x1 ;
if not first.x1 then
do ;
x3 = . ;
x4 = . ;
end ;
run ;
If you want X3 and X4 to come from different observations, they clearly have
to come from different subsets of A.
By the way, thanks Howard for a great presentation our WesSug meeting last
month.
Ian Whitlock <whitloi1@westat.com>
____________________Reply Separator____________________
Subject: Re: Copying One Observation within a BY Group
Author: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Date: 6/7/1999 8:53 AM
It looks to me like Bob's solution is pretty close. I have changed the "OR" to
"AND" and also added the conditional assignment of missing values which Paul
suggested.
My code
DATA R;
MERGE A (DROP=X3 X4)
A (DROP=X2 WHERE=(X3 > . and X4 > .));
BY X1;
if not first.x1 then do;
x3 = .;
x4 = .;
end;
run;
This reflects my interpretation of the original problem statement. It's usually
a good idea to articulate assumptions about data and to analyze the consequences
if they are violated. Here, the key point point is that the non-missing values
of X3 and X4 which are to be "floated" up to the first observation are found
together in a single observation, and that there is exactly one such observation
in each BY group. If the X3 and X4 values are in different observations (as in
Paul's test data), my code will lose them. If there is more than one pair in a
BY group, the first one will be used and others ignored.
I'm not sure why Paul calls X4 a "satellite". Stuart calls for symmetric
treatment of X3 and X4; Bob's code provides that, as does mine. X2 is the model
for variables to be preserved in their original observations. If there are
others, they can be treated as X2 is treated. Note that this technique can
change the order of variables in the data set (example: if there is an X5,
treated like X2, the PDV sequence will be X1 X2 X5 X3 X4).
> From: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
> Subject: Re: Copying One Observation within a BY Group
>
> Robert Virgile <virgile@MEDIAONE.NET> wrote:
>
> >The original question came from Stuart Wugalter, with an answer by Paul
> >Dorfman. Here is an easy approach:
> >
> >data repeated;
> >merge old (drop=var3 var4)
> > old (drop=x2 where=(var3 > . or var4 > .));
> >by x1;
> >
> >The intent is to take the one record per X1 that has data for x3 and x4,
> >and copy those values to all matching X1 values.
>
> Robert,
>
> This is an impressively short solution, but it will not work. (The original
> question and my solution can be found below the sig line.) Suppose the test
> dataset is the same I have used, i.e.
>
> DATA A;
> INPUT X1 X2 $ X3 X4;
> CARDS;
> 1 X 5 8
> 1 Y . .
> 1 Z . .
> 2 A . 9
> 2 B 7 .
> 2 C . .
> 2 D . .
> 4 P . 6
> 4 Q . .
> 4 R 3 .
> 7 I . .
> 7 J . .
> 9 G . .
> 9 H 4 .
> ;
>
> Assuming that you really meant x3 and x4 rather than var3 and var4, your code
in
> the form
>
> DATA R;
> MERGE A (DROP=X3 X4)
> A (DROP=X2 WHERE=(X3 > . OR X4 > .));
> BY X1;
>
> will produce the following output:
>
> OBS X1 X2 X3 X4
> 1 1 X 5 8
> 2 1 Y 5 8
> 3 1 Z 5 8
> 4 2 A . 9
> 5 2 B 7 .
> 6 2 C 7 .
> 7 2 D 7 .
> 8 4 P . 6
> 9 4 Q 3 .
> 10 4 R 3 .
> 11 7 I . .
> 12 7 J . .
> 13 9 G 4 .
> 14 9 H 4 .
>
> Note that it does not achieve the goal with respect to x3 and alters the
values
> for x4. Besides, the nature of the task should not involve testing of x4 at
all
> - it is just a satellite and must stay where it was. We can probably do better
> by eliminating x4 > . altogether, but, first, it will do more than asked,
namely
> fill ALL the x3 values in a by-group with that group's nonmissing value, and,
> second, invariably screw up x4 (and whatever else satellites there might be):
>
> OBS X1 X2 X3 X4
>
> 1 1 X 5 8
> 2 1 Y 5 8
> 3 1 Z 5 8
> 4 2 A 7 .
> 5 2 B 7 .
> 6 2 C 7 .
> 7 2 D 7 .
> 8 4 P 3 .
> 9 4 Q 3 .
> 10 4 R 3 .
> 11 7 I . .
> 12 7 J . .
> 13 9 G 4 .
> 14 9 H 4 .
>
> Of course, not-first x3 values could be set to missing with an extra IF.
[snip]
> Kind regards,
>
> =================
> Paul M. Dorfman
> Jacksonville, FL
> =================
>
> "Stuart E. Wugalter" wrote:
> >
> > I swear someone asked this about 6 months ago, but I'll be @#$%'ed if I
> > can find it in the newsgroup archive. Below is my sample dataset. There
> > are four variables, say x1-x4, where x1 identifies an observation. Note
> > that variables x3 & x4 are missing for all but 1 line per observation. I
> > want to be able to check each observation to make sure that x3 & x4
> > exist in the first line of each observation and, if not, then copy the
> > values from one of the other lines. In the example below, observation
> > x1=1 is okay, but observation x2=2 needs the values of x3 & x4 copied to
> > the first line of observation 2 (i.e. where x1=2 and x2=a).
> >
> > I know one solution would be to create another dataset with x1, x3 & x4
> > (where x3 ne .) and merge back into this dataset. Is there a more
> > efficient solution? TIA
> >
> > Stuart E. Wugalter
> > SAS Consultant
> >
> > 1 x .0001 47
> > 1 y . .
> > 1 z . .
> > 2 a . .
> > 2 b .0527 47
> > 2 c . .
> > 2 d . .
(signed) Howard_Schreier@ita.doc.gov
|