Date: Sat, 26 Apr 2003 23:05:19 GMT
Reply-To: Arthur Tabachneck <atabachneck@ROGERS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <atabachneck@ROGERS.COM>
Subject: Re: Confusing results from a simple merge
Richard,
I don't question your logic, but why after four observations? Yes,
definitely, four is the magic number in this situation (I've tried it
merging Curt's sample in descending order and it addresses Curt's problem,
but still fails upon the fifth observation).
Art
------------
"Richard A. DeVenezia" <radevenz@ix.netcom.com> wrote in message
news:b8dumq$91jsk$1@ID-168040.news.dfncis.de...
> Curt:
>
> The confusing results are due to the repeated by groups and 'special
logic'
> coincendence.
>
> There are 100 by-groups (id1=1..20 id2=1..5)
> In table A, each by-group has 7 observations (id3=1..7)
> In table B, each by-gruop has 4 observations (id4=1..4)
>
> When you merge dy id1 id2, the first 4 observations of each tables by
group
> are lined up.
> During the first 4 observations of each by group id3 ^= 5 is met, and thus
> b2 is set to missing.
> After the 4th observation from table A all values from table B are
> effectively 'locked' until explicitly assigned or the next by group
starts.
> During the 5th observation from table A's by group the id3^=5 is false and
> the if falls through and b2 remains missing (since it is 'locked').
> During the 6th and 7th observations from table A's by group id3^=5 is
again
> true and b2 is explicitly set to missing.
>
> Moral: BEWARE REPEATED OBSERVATIONS IN BY GROUPS.
>
> Here is a data step to show more explicitly what happened
> data ab; * Merge A and B here;
>
> data foo;
> merge a b ;
> by id1 id2;
> if id3 ^= 5 then do;
> b2was=b2;
> b2wassetmissing=1;
> b2=.;
> if id3=5 then put 'ERROR!';
> end;
> run;
>
>
>
> --
> Richard A. DeVenezia, http://www.devenezia.com
>
>
> "Curt Seeliger" <seeliger.curt@epa.gov> wrote in message
> news:200304251649.10310.seeliger.curt@epa.gov...
> > Folks,
> >
> > It's late friday afternoon and most of you have returned home safely.
> >
> > I have two datasets A and B with by-variables id1 and id2 in common, and
> > I'd like to copy the variables b1-b3 from B to A, and have them be
> > non-missing only at specific records (where id3=5, in this example).
> > Code which *should* do this is listed below. I've been able to get it
to
> > run by splitting the merging data step -- those statements are commented
> > out, along with some other operations I'd be doing with the original
> > 'real' data.
> >
> > The problem is that b2 is set to missing regardless of the value of id3.
> > Separating the datastep by uncommenting out the line of statements
causes
> > the code to run as intended. The second odd thing about this is that the
> > ERROR message is never printed, indicating that the IF statement is
> > working correctly. It's as if b2 is being set to missing outside the IF
> > statement.
> >
> > Maybe the biggest problem is that I'm missing something really obvious.
> > Thanks for any clues,
> >
> > data a; * Two fake data sets;
> > do id1=1 to 20;
> > do id2=1 to 5;
> > do id3=1 to 7;
> > a1=id1; a2=id1*id2;
> > output;
> > end;
> > end;
> > end;
> > run;
> >
> > data b;
> > do id1=1 to 20;
> > do id2=1 to 5;
> > do id4=1 to 4;
> > b1=id1*id2; b2=id1*id2+id4; b3=id4;
> > output;
> > end;
> > end;
> > end;
> > run;
> >
> > data ab; * Merge A and B here;
> > merge a/*(in=in_a)/**/
> > b/* (where=(id4=1)) /**/
> > ;
> > by id1 id2;
> > *if in_a;
> > /*run; data ab; set ab; /* Split the data step, or don't */
> > *drop id4;
> > if id3^=5 then do;
> > b2=.; if id3=5 then put 'ERROR!';
> > end;
> > run;
> > proc print data=ab uniform; * wonder why? I do. ;
> > run;
> >
> > The only explaination I'm coming up with is that the DSV isn't been
loaded
> > with the data from B for records with id3>4 (which is the number of id4
> > values in B) until the dataset AB is created. (You might need to run the
> > code to see what I mean, checking for id3^=4 instead). This makes the
> > problem somewhat simpler: Why isn't the DSV completed until after the
> > output dataset is written?
> >
> > cur
> >
> > --
> > Curt Seeliger, Data Ranger
> > CSC, EPA/WED contractor
> > 541/754-4638
> > seeliger.curt@epa.gov
>
>
>
>
|