Date: Thu, 6 Aug 2009 23:00:51 +0000
Reply-To: snoopy369@GMAIL.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Using duplicate values
In-Reply-To: <39C5375D-2B49-40FE-84C4-23F1EE47E8C5@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
I see you have a few solutions, here's the HASH solution. Quite fast. The
only danger is if it's possible to have multiple values of sex for one id,
which it sounds like it oughn't be, but be careful nonetheless.
Note that I make a view of test with only nonmissing values of sex; that
way, if missing comes first for an ID, it still is assigned a value (note
how I switched the two 11's). The RETAIN or LAG solutions will not catch
that, if it's possible, and the hash solution will not deal with it
properly either if it's not cut out of the dataset (though perhaps Paul or
someone else will show us/me how to deal with it without creating a view if
it's possible?)
data test;
input id value sex $;
datalines;
10 104 f
11 98 .
11 100 m
12 97 f
13 89 f
12 99 .
;;;;
run;
data testv/view=testv;
set test;
where not missing(sex);
run;
data want;
if _n_ = 1 then do;
format id 2.;
format sex $1.;
declare hash h(dataset:"work.testv");
h.defineKey('id');
h.defineData('sex');
h.defineDone();
call missing(id, sex);
end;
set test;
if missing(sex) then do;
rc = h.find();
end;
run;
-Joe
On Aug 6, 2009 4:34pm, Joe <snoopy369@gmail.com> wrote:
> If you sort it by ID, you can use various methods (update, for one). If
> retaining the order is important, or speed is a concern, HASH is probably
> the best way, or perhaps a format. If no solution is posted by the time I
> get back to my desk I'll post a specific example (on my phone now).
> Thanks,
> -Joe
> On Aug 6, 2009, at 4:02 PM, lakegoddess@HOTMAIL.COM wrote:
> Hi,
> I'm sorry if this sounds like a stupid question but how do you use
> duplicate values of a specific ID and use that value for the same ID
> but at a later observation? Say the dataset is something like this:
> data test;
> input id value sex;
> datalines;
> 10 104 f
> 11 98 m
> 11 100 .
> 12 97 f
> 13 89 f
> 12 99 .
> Now I need it so that the missing sex value for ID=11 will be "m"
> and the missing one for ID=12 will be "f". How do we go about doing
> that?