Date: Thu, 24 Aug 2006 16:20:13 -0400
Reply-To: "Dorfman, Paul" <paul_dorfman@MERCK.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dorfman, Paul" <paul_dorfman@MERCK.COM>
Subject: Re: Fill in missing score
Content-Type: text/plain
update work.score(obs=0) work.score? Very sleek.
Now, if I may... on a purely aesthetic side... mestillthinks that
   do freq = 1 by 1 until(last.id);
      update work.score(obs=0) work.score;
      by id;
   end;
is preferable to
   freq = 0;
   do until(last.id);
      update work.score(obs=0) work.score;
      by id;
      freq = freq + 1;
   end;
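For illustration only, here is a minimal sketch (not run as part of this thread) of _null_'s quoted step with the counter written in the preferred form. It works because the UNTIL condition is checked at the bottom of the loop before the index is incremented, so FREQ ends up equal to the number of records in the BY group. The small input is a subset of _null_'s quoted data (first two IDs). One-level names are used, so the data sets land in WORK unless USER is redirected.

```sas
/* Subset of the quoted test data: first two IDs only. */
data score;
   infile cards missover;
   input id score score2;
   cards;
12225 0.365516711
12225 . 0.365516711
13073 . 0.365516711
13073 0.32885697
13073
;
run;

/* UPDATE against an empty master (OBS=0) applies every record of   */
/* the BY group as a transaction. Missing transaction values do not */
/* overwrite, so the PDV ends up holding the non-missing values.    */
data score0;
   do freq = 1 by 1 until(last.id);   /* FREQ ends = group size     */
      update score(obs=0) score;
      by id;
   end;
   do i = 1 to freq;                  /* write the filled record    */
      output;                         /* once per input record      */
   end;
   drop freq i;
run;

proc print data=score0;
run;
```

Each output record for 12225 should carry 0.365516711 in both SCORE and SCORE2, and each record for 13073 should carry 0.32885697 in SCORE and 0.365516711 in SCORE2.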
On a yet different note, I have noticed over time that you always use
two-level names for data files in the WORK library. I wish to question the
wisdom of doing so here. My arguments against it are twofold.

First, using one-level names enables one to switch between the libraries to
which the files are written at any point in the program, either with the
USER= system option or by defining the USER libref. To wit, to switch to a
library with an [already defined] libref DIFF, one would merely code

   option user = diff ;

Alternatively, one could define a libref USER before the program starts (e.g.
in the config file or autoexec) and have all intermediate data sets stored in
a permanent library instead of WORK without any changes to the program
whatsoever. Then they would be viewable after the program has finished or,
especially, aborted. In the latter case, the viewability of the temp files,
including one that may have been only partially written, is a great aid in
debugging. Once the files in the library are no longer needed, they can be
deleted from it, or the whole library can be killed altogether. (In fact, in
my own practice I have made it a rule never to write any program-generated
files to WORK, but only to a permanent library handled in the manner
described above. This leaves WORK entirely to SAS, and I do not have to share
its space with anyone else on systems where it is shared. And the ability to
examine intermediate files after a job has abended has saved me untold hours
of pulling out what's left of my hair.)

Secondly, one-level names are, well, shorter to code.
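A minimal sketch of the first argument, assuming a writable directory exists. The path 'c:\scratch' and the tiny input are hypothetical, chosen only for illustration. After the OPTIONS statement, the one-level name SCORE resolves to DIFF.SCORE rather than WORK.SCORE, and the program body itself does not change.

```sas
/* Hypothetical scratch directory; substitute one that exists. */
libname diff 'c:\scratch';

/* From here on, one-level names resolve to DIFF instead of WORK. */
/* Assigning a libref named USER (e.g. in the autoexec) has the   */
/* same effect without touching the program at all.               */
options user = diff;

data score;          /* one-level name: stored as DIFF.SCORE */
   input id score;
   cards;
1 11
1 .
2 22
;
run;
```

Once the intermediate files are no longer needed, the step "proc datasets library=diff kill nolist; quit;" deletes every member of DIFF in one go.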
What are the pros of using two-level names, in your opinion?
Kind regards
------------
Paul Dorfman
Jax, FL
------------
+-----Original Message-----
+From: data _null_; [mailto:datanull@gmail.com]
+Sent: Thursday, August 24, 2006 1:29 PM
+To: Dorfman, Paul
+Cc: SAS-L@listserv.uga.edu
+Subject: Re: Fill in missing score
+
+
+Your comments about the score not being on the first obs got me to
+thinking. I also thought there might be other "scores". I came up
+with this.
+
+data work.score;
+ infile cards missover;
+ input ID Score score2;
+ cards;
+12225 0.365516711
+12225 . 0.365516711
+13073 . 0.365516711
+13073 0.32885697
+13073
+15494 0.466036457
+15494 . 0.466036457
+33501 0.159729592 0.466036457
+33501
+;;;;
+ run;
+proc print;
+ run;
+data work.score0;
+ freq = 0;
+ do until(last.id);
+ update work.score(obs=0) work.score;
+ by id;
+ freq = freq + 1;
+ end;
+ do i = 1 to freq;
+ output;
+ end;
+ drop freq i;
+ run;
+proc print;
+ run;
+
+
+
+
+On 8/24/06, Dorfman, Paul <paul_dorfman@merck.com> wrote:
+> Thien,
+>
+> The fine solutions by Ken and Toby have the advantage of reading the file
+> once. A more generic solution would read it twice but would also be
+> impervious to the situation where a non-missing score would happen to be
+> located not necessarily in the first record of each ID by-group:
+>
+> data a ;
+> input id score ;
+> cards ;
+> 1 11
+> 1 .
+> 2 .
+> 2 22
+> 2 .
+> 3 .
+> 3 33
+> ;
+> run ;
+>
+> data b ;
+> merge a (drop = score) a (where = (score is not null)) ;
+> by id ;
+> run ;
+>
+> Alternatively (for a Nothin'-But-SQL approach):
+>
+> proc sql ;
+> create table c as
+> select x.id, y.score
+> from a x, a y
+> where x.id = y.id and y.score is not null
+> ;
+> quit ;
+>
+> Of course, if a non-missing score is in the middle of a by-group, you can
+> still use the DoW-loop, only the file will still have to be read twice:
+>
+> data d ;
+> do _n_ = 1 by 1 until (last.id) ;
+> set a ;
+> by id ;
+> if not missing (score) then _iorc_ = score ;
+> end ;
+> score = _iorc_ ;
+> do _n_ = 1 to _n_ ;
+> set a (drop = score) ;
+> output ;
+> end ;
+> run ;
+>
+> Kind regards
+> ------------
+> Paul Dorfman
+> Jax, FL
+> ------------
+>
+>
+> +-----Original Message-----
+> +From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
+> +Behalf Of Thien Thai
+> +Sent: Thursday, August 24, 2006 8:46 AM
+> +To: SAS-L@LISTSERV.UGA.EDU
+> +Subject: Fill in missing score
+> +
+> +
+> +Hello, I'm a new SAS user and stumbled across this problem where I need to
+> +fill in the missing score for IDs that are the same. The data set looks
+> +like this:
+> +
+> +ID Score
+> +12225 0.365516711
+> +12225
+> +13073 0.32885697
+> +13073
+> +13073
+> +15494 0.466036457
+> +15494
+> +33501 0.159729592
+> +33501
+> +
+> +and basically I would like the same score assigned to records with the
+> +same ID. Any help would be much appreciated.
+> +
+> +Ta
+> +
+> +Thien
+> +
+> +
+>
+>
+>
+> ------------------------------------------------------------------------------
+> Notice: This e-mail message, together with any attachments, contains
+> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
+> New Jersey, USA 08889), and/or its affiliates (which may be known
+> outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
+> and in Japan, as Banyu - direct contact information for affiliates is
+> available at http://www.merck.com/contact/contacts.html) that may be
+> confidential, proprietary copyrighted and/or legally privileged. It is
+> intended solely for the use of the individual or entity named on this
+> message. If you are not the intended recipient, and have received this
+> message in error, please notify us immediately by reply e-mail and then
+> delete it from your system.
+> ------------------------------------------------------------------------------
+>
+
+