Date: Fri, 4 Jun 1999 15:01:42 -0400
Reply-To: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Subject: Re: Copying One Observation within a BY Group
Content-Type: text/plain; charset=koi8-r
Robert Virgile <virgile@MEDIAONE.NET> wrote:
>The original question came from Stuart Wugalter, with an answer by Paul
>Dorfman. Here is an easy approach:
>
>data repeated;
>merge old (drop=var3 var4)
> old (drop=x2 where=(var3 > . or var4 > .));
>by x1;
>
>The intent is to take the one record per X1 that has data for x3 and x4,
>and copy those values to all matching X1 values.
Robert,
This is an impressively short solution, but it will not work. (The original
question and my solution can be found below the sig line.) Suppose the test
dataset is the same I have used, i.e.
DATA A;
INPUT X1 X2 $ X3 X4;
CARDS;
1 X 5 8
1 Y . .
1 Z . .
2 A . 9
2 B 7 .
2 C . .
2 D . .
4 P . 6
4 Q . .
4 R 3 .
7 I . .
7 J . .
9 G . .
9 H 4 .
;
Assuming that you really meant x3 and x4 rather than var3 and var4, your code in
the form
DATA R;
MERGE A (DROP=X3 X4)
A (DROP=X2 WHERE=(X3 > . OR X4 > .));
BY X1;
will produce the following output:
OBS X1 X2 X3 X4
1 1 X 5 8
2 1 Y 5 8
3 1 Z 5 8
4 2 A . 9
5 2 B 7 .
6 2 C 7 .
7 2 D 7 .
8 4 P . 6
9 4 Q 3 .
10 4 R 3 .
11 7 I . .
12 7 J . .
13 9 G 4 .
14 9 H 4 .
Note that it does not achieve the goal with respect to x3 and alters the values
for x4. Besides, the nature of the task should not involve testing of x4 at all
- it is just a satellite and must stay where it was. We can probably do better
by eliminating x4 > . altogether, but, first, it will do more than asked, namely
fill ALL the x3 values in a by-group with that group's nonmissing value, and,
second, invariably screw up x4 (and whatever else satellites there might be):
OBS X1 X2 X3 X4
1 1 X 5 8
2 1 Y 5 8
3 1 Z 5 8
4 2 A 7 .
5 2 B 7 .
6 2 C 7 .
7 2 D 7 .
8 4 P 3 .
9 4 Q 3 .
10 4 R 3 .
11 7 I . .
12 7 J . .
13 9 G 4 .
14 9 H 4 .
Of course, not-first x3 values could be set to missing with an extra IF.
Nevertheless, it appears to me that even though POINT= (or array) logic is more
wordy and may look more complex on the surface, it is, in fact, more flexible
(one can choose between COPY and MOVE types), and it is more straightforward
algorithmically.
Kind regards,
================
Paul M. Dorfman
Jacksonville, FL
================
=======================================================================
Stuart,
One way of doing it is to use 'look-ahead' capabilities of direct access:
DATA A;
INPUT X1 X2 $ X3 X4;
CARDS;
1 X 5 8
1 Y . .
1 Z . .
2 A . 9
2 B 7 .
2 C . .
2 D . .
4 P . 6
4 Q . .
4 R 3 .
7 I . .
7 J . .
9 G . .
9 H 4 .
;
RUN;
DATA B (DROP=X);
SET A NOBS=NOBS;
BY X1;
IF FIRST.X1 THEN DO PTR=_N_+1 TO NOBS;
IF X3 > . OR PTR > NOBS THEN LEAVE;
SET A (KEEP=X1 X3 RENAME=(X1=X)) POINT=PTR;
IF X NE X1 THEN LEAVE;
END;
ELSE X3 = .;
RUN;
Here is the printout of the dataset B:
OBS X1 X2 X3 X4
1 1 X 5 8
2 1 Y . .
3 1 Z . .
4 2 A 7 9
5 2 B . .
6 2 C . .
7 2 D . .
8 4 P 3 6
9 4 Q . .
10 4 R . .
11 7 I . .
12 7 J . .
13 9 G 4 .
14 9 H . .
I assumed that you actually wanted to MOVE the non-missing value to the first
record in the BY-group. However, if you meant COPY literally, then remove the
instruction ELSE X = . from the code. If your file is really large, and
performance is at stake more than the brevity, _temporary_ arrays should be used
instead to implement the logic of 'postponed output', rather than 'look-ahead'.
Kind regards,
=================
Paul M. Dorfman
Jacksonville, FL
=================
"Stuart E. Wugalter" wrote:
>
> I swear someone asked this about 6 months ago, but I'll be @#$%'ed if I
> can find it in the newsgroup archive. Below is my sample dataset. There
> are four variables, say x1-x4, where x1 identifies an observation. Note
> that variables x3 & x4 are missing for all but 1 line per observation. I
> want to be able to check each observation to make sure that x3 & x4
> exist in the first line of each observation and, if not, then copy the
> values from one of the other lines. In the example below, observation
> x1=1 is okay, but observation x2=2 needs the values of x3 & x4 copied to
> the first line of observation 2 (i.e. where x1=2 and x2=a).
>
> I know one solution would be to create another dataset with x1, x3 & x4
> (where x3 ne .) and merge back into this dataset. Is there a more
> efficient solution? TIA
>
> Stuart E. Wugalter
> SAS Consultant
>
> 1 x .0001 47
> 1 y . .
> 1 z . .
> 2 a . .
> 2 b .0527 47
> 2 c . .
> 2 d . .
> .
> .
> .
=======================================================================