Date: Wed, 29 Oct 2003 15:14:30 -0500
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: how to delete a subject when some variable has missing values
? example inside
Content-Type: text/plain
Sigurd,
I would think
group by id having nmiss(wt) = 0
would be much clearer in SAS.
I then wondered about the conventionality. Most SQL's do not allow the
remerging that makes this code possible in SAS. I should think a form using
a subquery to get the list of wanted (or not wanted) ID's would be more
standard SQL way.
I was suprised at how much more efficient the remerging process was in SAS
SQL than the subquery form. However, I think this is another indication
that SAS SQL has been optimised for joins rather than subqueries.
In comparing the implied join with an explicit join I was again surprised at
how much better the performance of the implied join over an explicit one.
group by
real time 10.16 seconds
cpu time 6.75 seconds
explicit join
real time 15.61 seconds
cpu time 12.45 seconds
subquery
real time 22.11 seconds
cpu time 21.21 seconds
IanWhitlock@westat.com
-----Original Message-----
From: Sigurd Hermansen
Sent: Wednesday, October 29, 2003 1:21 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: how to delete a subject when some variable has missing values ?
example inside
A more conventional SQL solution looks agrammatic but a bit more intuitive
and robust than the solution using MONOTONIC():
......
select * from test
group by id having not sum(wt is missing)
;
Sig
-----Original Message-----
From: Paul Dorfman [mailto:paul_dorfman@HOTMAIL.COM]
Sent: Sunday, October 26, 2003 1:18 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: how to delete a subject when some variable has missing values?
example inside
>From: David Neal <afdbn@UAA.ALASKA.EDU>
>Paul,
>Your first program misses the fact that the missing could be in the
>middle of the set of weights.
David,
It does not miss the fact, it just contains the mistake of making the
statement
kill = missing (wt) ;
unconditional, which results in KILL being overwritten in a subsequent
record. Mea culpa! This:
if not kill then kill = missing (wt) ;
rectifies the situation.
Also, the code can be made more efficient in the case there are a lot of
groups with a missing weight in need to be weeded out. In this case, instead
of doing the full second sequential pass over the data, we can reread only
the non-missing groups randomly:
data new ( drop = kill q p ) ;
do q = p + 1 by 0 until ( last.id ) ;
set old ;
p ++ 1 ;
by id ;
if not kill then kill = missing (wt) ;
end ;
if not kill then do n = q to p ;
set old point = n ;
output ;
end ;
run ;
Kind regards,
=================
Paul M. Dorfman
Jacksonville, FL
=================
Try this one with your first program.
>
>data old;
>input id time wt @@;
>cards ;
>1 1 50 1 2 50.5 1 3 52
>2 1 49 2 2 . 2 3 .
>3 1 49 3 2 . 3 3 51
>4 1 51 4 2 55 4 3 60
>;run;
>
>It leaves a missing value in there for ID #3 because it was missing in
>the middle.
>
>Your SQL version does a much better job.
>
>
>My solution used proc transpose and isn't nearly as nice as the proc
>sql way. (Might be time to start learning sql.)
>
>
>data temp;
>input id time wt @@;
>cards ;
>1 1 50 1 2 50.5 1 3 52
>2 1 49 2 2 . 2 3 .
>3 1 49 3 2 . 3 3 51
>4 1 51 4 2 55 4 3 60
>;run;
>proc transpose data=temp out=transposed prefix=t;
> by id ;
> id time;
> var wt;
>run;
>data transposed2;
> set transposed;
> if t1=. or t2=. or t3=. then delete;
>run;
>
>proc transpose data=transposed2 out=final ;
> by id ;
> var t1 t2 t3;
>run;
>proc print data=final;run;
>
>data final;/*Just to make things look nice ;)*/
> set final;
> time=substr(_name_,2,1)*1;
> weight=wt;
> drop _name_ wt;
>run;
>proc print data=new;run;
>
>
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>Paul Dorfman
>Sent: Saturday, October 25, 2003 10:58 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: how to delete a subject when some variable has missing
>values? example inside
>
>Raymond,
>
>You can accomplish that by writing a program that reads your input file
>and writes another file the way you want it. For example, it might look
>like
>
>data new ( drop = kill ) ;
> do until ( last.id ) ;
> set old ;
> by id ;
> kill = missing (wt) ;
> end ;
>
> do until ( last.id ) ;
> set old ;
> by id ;
> if not kill then output ;
> end ;
>run ;
>
>True, it reads the input twice, but you do not have the full
>information about the data in a group before you have got to its end...
>Or, you could go the parsimonious SQL route,
>
>proc sql ;
> select * from old
> group by id
> having n (wt) = count ( monotonic() )
> ;
>quit ;
>
>which, of course, still reads the data twice, but does it
>surreptitiously behinds the scenes.
>
>=================
>Paul M. Dorfman
>Jacksonville, FL
>=================
>
>
> >From: Raymond <ray6060@HOTMAIL.COM>
> >Reply-To: Raymond <ray6060@HOTMAIL.COM>
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: how to delete a subject when some variable has missing values?
> > example inside
> >Date: Sat, 25 Oct 2003 21:35:15 -0700
> >
> >For example,each subject was measured at 3 time points, when some
> >subject has missing value on "wt" at any time point, I need to delete
> >this subject from the dataset, how to accomplish it? Thanks a lot.
> >
> >id time wt
> >1 1 50
> >1 2 50.5
> >1 3 52
> >2 1 49
> >2 2 .
> >2 3 .
>
>_________________________________________________________________
>Cheer a special someone with a fun Halloween eCard from American
>Greetings! Go to
>http://www.msn.americangreetings.com/index_msn.pd?source=msne134
_________________________________________________________________
Add MSN 8 Internet Software to your current Internet access and enjoy
patented spam control and more. Get two months FREE!
http://join.msn.com/?page=dept/byoa