Date: Wed, 9 Apr 2008 12:55:02 0700
ReplyTo: Peter <crawfordsoftware@GMAIL.COM>
Sender: "SAS(r) Discussion" <SASL@LISTSERV.UGA.EDU>
From: Peter <crawfordsoftware@GMAIL.COM>
Organization: http://groups.google.com
Subject: Re: data set update
ContentType: text/plain; charset=ISO88591
On Apr 9, 6:38 pm, datan...@GMAIL.COM ("data _null_,") wrote:
> You are not suppose to do that you know. But here's how.
>
> data class;
> set sashelp.class;
> if _n_ eq 1 then age = .;
> if _n_ eq 17 then age = .;
> if _n_ eq 4 then weight = .;
> if _n_ eq 6 then height = .;
> run;
> proc print;
> run;
> proc stdize reponly missing=mean;
> run;
> proc print;
> run;
>
>
>
> On Wed, Apr 9, 2008 at 12:28 PM, Alex Tang <Alex.T...@creditone.com> wrote:
> > Hi there,
>
> > Here I get a question of data manipulation.
>
> > I have two data sets, say, couple of hundred numerical variables. Both
> > of them have same variables.
>
> > First one is the data set I am working on. It's large, say million
> > observation.
>
> > Second on is simply the mean of each variable of the first one.
>
> > Because first one has missing value, I want to substitute any missing
> > value with the average value of that variable corresponding in the
> > second data set.
>
> > Is there any easy way to do this instead of typing in each variable and
> > average value in assignment statement in if clause?
>
> > I am thinking of PROC SQL UPDATE statement but a little messed up.
>
> > Thanks,
>
> > Alex
>
> > ******************* Email nondisclosure ******************
>
> > The information contained in this email message may be proprietary and/or confidential, and
> > protected from disclosure. If the reader of this message is not the intended recipient,
> > or an employee or agent responsible for delivering this message to the intended recipient,
> > you are hereby notified that any dissemination, distribution or copying of this communication
> > is strictly prohibited. If you have received this communication in error, please notify
> > Credit One Bank immediately by replying to this message and delete the original message. Thank you. Hide quoted text 
>
>  Show quoted text 
that proc stdize solution looks really great, but needs SAS/STAT among
the modules you license.
Another way to achieve the objective might be to try a data step like.
data original_data ;
if _n_ =1 then do;
set means_data(obs=1) ;
array meansd _numeric_ ;
array meanst( 1000 ) _temporary_; * larger number if >1000
varables;
* now transfer the means into the temp array ;
do over meansd;
meanst(_i_) = meansd ;
end;
end;
modify original_data ;
if nmiss( of meansd(*) ) then do;
do over meansd ;
meansd = coalesce( meansd, meanst(_i_) );
end;
replace ;
end;
run;
When the step suns, only rows with at least one missing value will be
replaced. The NOTEs in the log at the end of the step will indicate
how many rows needed to be replaced.
Kindly, data_NULL_ has provided a testdata builder. That, and
building a means_data set follow
data original_data ;
set sashelp.class;
if _n_ eq 1 then age = .;
if _n_ eq 17 then age = .;
if _n_ eq 4 then weight = .;
if _n_ eq 6 then height = .;
run;
proc means noprint;
output mean= out= means_data(drop= _: ) ;
run;
I like modify in this situation because you have only one copy of
large data and it runs quickly. However, beware that you may have a
problem if the modify step fails during execution, as part of your
data is updated and part is not.
I don't think it i a major porblem here, depending on your SAS
platform's stability and security (you have a backup of teh data
don't you? ;)
Good Luck
PeterC
