Date: Mon, 7 Jun 1999 08:53:33 -0400
Reply-To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject: Re: Copying One Observation within a BY Group
Content-Type: text/plain; charset=US-ASCII
It looks to me like Bob's solution is pretty close. I have changed the "OR" to "AND" and also added the conditional assignment of missing values which Paul suggested.
My code
DATA R;
MERGE A (DROP=X3 X4)
A (DROP=X2 WHERE=(X3 > . and X4 > .));
BY X1;
if not first.x1 then do;
x3 = .;
x4 = .;
end;
run;
This reflects my interpretation of the original problem statement. It's usually a good idea to articulate assumptions about data and to analyze the consequences if they are violated. Here, the key point point is that the non-missing values of X3 and X4 which are to be "floated" up to the first observation are found together in a single observation, and that there is exactly one such observation in each BY group. If the X3 and X4 values are in different observations (as in Paul's test data), my code will lose them. If there is more than one pair in a BY group, the first one will be used and others ignored.
I'm not sure why Paul calls X4 a "satellite". Stuart calls for symmetric treatment of X3 and X4; Bob's code provides that, as does mine. X2 is the model for variables to be preserved in their original observations. If there are others, they can be treated as X2 is treated. Note that this technique can change the order of variables in the data set (example: if there is an X5, treated like X2, the PDV sequence will be X1 X2 X5 X3 X4).
> From: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
> Subject: Re: Copying One Observation within a BY Group
>
> Robert Virgile <virgile@MEDIAONE.NET> wrote:
>
> >The original question came from Stuart Wugalter, with an answer by Paul
> >Dorfman. Here is an easy approach:
> >
> >data repeated;
> >merge old (drop=var3 var4)
> > old (drop=x2 where=(var3 > . or var4 > .));
> >by x1;
> >
> >The intent is to take the one record per X1 that has data for x3 and x4,
> >and copy those values to all matching X1 values.
>
> Robert,
>
> This is an impressively short solution, but it will not work. (The original
> question and my solution can be found below the sig line.) Suppose the test
> dataset is the same I have used, i.e.
>
> DATA A;
> INPUT X1 X2 $ X3 X4;
> CARDS;
> 1 X 5 8
> 1 Y . .
> 1 Z . .
> 2 A . 9
> 2 B 7 .
> 2 C . .
> 2 D . .
> 4 P . 6
> 4 Q . .
> 4 R 3 .
> 7 I . .
> 7 J . .
> 9 G . .
> 9 H 4 .
> ;
>
> Assuming that you really meant x3 and x4 rather than var3 and var4, your code in
> the form
>
> DATA R;
> MERGE A (DROP=X3 X4)
> A (DROP=X2 WHERE=(X3 > . OR X4 > .));
> BY X1;
>
> will produce the following output:
>
> OBS X1 X2 X3 X4
> 1 1 X 5 8
> 2 1 Y 5 8
> 3 1 Z 5 8
> 4 2 A . 9
> 5 2 B 7 .
> 6 2 C 7 .
> 7 2 D 7 .
> 8 4 P . 6
> 9 4 Q 3 .
> 10 4 R 3 .
> 11 7 I . .
> 12 7 J . .
> 13 9 G 4 .
> 14 9 H 4 .
>
> Note that it does not achieve the goal with respect to x3 and alters the values
> for x4. Besides, the nature of the task should not involve testing of x4 at all
> - it is just a satellite and must stay where it was. We can probably do better
> by eliminating x4 > . altogether, but, first, it will do more than asked, namely
> fill ALL the x3 values in a by-group with that group's nonmissing value, and,
> second, invariably screw up x4 (and whatever else satellites there might be):
>
> OBS X1 X2 X3 X4
>
> 1 1 X 5 8
> 2 1 Y 5 8
> 3 1 Z 5 8
> 4 2 A 7 .
> 5 2 B 7 .
> 6 2 C 7 .
> 7 2 D 7 .
> 8 4 P 3 .
> 9 4 Q 3 .
> 10 4 R 3 .
> 11 7 I . .
> 12 7 J . .
> 13 9 G 4 .
> 14 9 H 4 .
>
> Of course, not-first x3 values could be set to missing with an extra IF.
[snip]
> Kind regards,
>
> =================
> Paul M. Dorfman
> Jacksonville, FL
> =================
>
> "Stuart E. Wugalter" wrote:
> >
> > I swear someone asked this about 6 months ago, but I'll be @#$%'ed if I
> > can find it in the newsgroup archive. Below is my sample dataset. There
> > are four variables, say x1-x4, where x1 identifies an observation. Note
> > that variables x3 & x4 are missing for all but 1 line per observation. I
> > want to be able to check each observation to make sure that x3 & x4
> > exist in the first line of each observation and, if not, then copy the
> > values from one of the other lines. In the example below, observation
> > x1=1 is okay, but observation x2=2 needs the values of x3 & x4 copied to
> > the first line of observation 2 (i.e. where x1=2 and x2=a).
> >
> > I know one solution would be to create another dataset with x1, x3 & x4
> > (where x3 ne .) and merge back into this dataset. Is there a more
> > efficient solution? TIA
> >
> > Stuart E. Wugalter
> > SAS Consultant
> >
> > 1 x .0001 47
> > 1 y . .
> > 1 z . .
> > 2 a . .
> > 2 b .0527 47
> > 2 c . .
> > 2 d . .
(signed) Howard_Schreier@ita.doc.gov
|