|
On 8/22/08, Mary <mlhoward@avalon.net> wrote:
>
> Thanks much. I just tried it on some of my data, and it did work. A couple
> of suggestions (just to make it better than the already GREAT it is now):
>
> 1. Have it be able to work in the case where there the data is either all
> character or all numeric (I tried it with all character variables, and it
> balked because it couldn't find an ID on the numeric table with 0 rows).
It was not my intention to write an general solution, that would work
as you describe. One could call upon the macro language to package
the code, checking for existence of char/num in the list of variables
but as I said that was not my intention. I did think it interesting
to see if I could alter the existing code to function given a scenario
where ether the char or num list was null.
The code below accomplishes that task to some degree, and it is
moderately interesting to me.
> 2. Have some way to sort the variables back in the original order of the
> data set. I ran a proc sql; create table ... select var1, var2, etc., to
> get it back in original order but it seems there ought to be a better way to
> do this, as the best part of using your solution is not having to name the
> variable names!
This is already accomplshed using the non-executed set. There
variables are redefined in the same order as the original input data
because the original input data set is use to define the variables. I
suppose one could think of variable order as another kind of meta
data.
data wide;
infile cards missover;
informat adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
format adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
input id:$3. visit adm dose wt drug_name/*OP mentioned many more*/;
*drop drug_name; *switch these DROPs on and off to test;
drop adm dose wt;
cards;
001 1 52 32 78.2 Prozac
001 2 25 54 78.5
001 3 . . .
002 1 56 54 90.0 Zoloft
002 2 . . .
003 1 52 32 78.5 Wellbutrin
003 2 . 54 78.2 Prozac
003 3 25 . 79.4
;;;;
run;
%let varlist = drug_name--drug_name;
/*%let varlist = adm--wt;*/
/*%let varlist = adm--drug_name;*/
proc transpose data=wide out=nums;
by id visit;
var %sysfunc(tranwrd(&varlist,--,-numeric-));
run;
proc sort; by id _name_ visit; run;
proc transpose data=wide out=char(rename=(col1=char1));
by id visit;
var %sysfunc(tranwrd(&varlist,--,-character-));
run;
proc sort; by id _name_ visit; run;
data locf;
attrib _type_ length=$1;
attrib _name_ length=$32;
if 0 then set char(keep=char1 rename=(char1=locfC));
call missing(locfC);
do until(last._name_);
set nums(in=in1) char(in=in2);
by id _name_;
_type_ = substr('NC',sum(in1,in2*2),1);
if in1 and not missing(col1) then locf = col1;
if in2 and not missing(char1) then locfC = char1;
output;
end;
run;
proc print;
run;
/* back to wide if you like */
proc sort data=locf;
by id visit;
run;
proc transpose data=locf(where=(_type_='N')) out=locfN(drop=_:);
by id visit;
var locf;
id _name_;
run;
data locfNV / view=locfNV;
if _n_ eq 1 and eof then stop;
set locfN;
return;
set locf(where=(_type_='N') keep=_type_) end=eof;
drop _type_;
run;
proc transpose data=locf(where=(_type_='C')) out=locfC(drop=_:);
by id visit;
var locfC;
id _name_;
run;
data locfCV / view=locfCV;
if _n_ eq 1 and eof then stop;
set locfC;
return;
set locf(where=(_type_='C') keep=_type_) end=eof;
drop _type_;
run;
data work.locfWide;
if 0 then set wide; /* recover attributes */
merge locfNV locfCV;
by id visit;
run;
proc contents varnum;
proc print;
run;
>
> But otherwise, it worked great. Thanks much.
>
> -Mary
> ----- Original Message -----
> From: ./ ADD NAME=Data _null_,
> To: SAS-L@LISTSERV.UGA.EDU
> Sent: Friday, August 22, 2008 2:54 PM
> Subject: Re: how to copy a row
>
> On 8/22/08, Mary <mlhoward@avalon.net> wrote:
> >
> > Fascinating, code and does indeed work; it is nice because it is more
> > generic in terms of not having to name specific variables. I have some
> > questions on it:
> >
> > 1. How is the transpose working to pick up only the character or
> variables?
> > Is it that you just put character in the middle of the range of variable?
> >
>
> Yes I suppose you could call character/numeric Name Range qualifiers.
> See http://tinyurl.com/6zyodg for details of SAS Variable Lists. This
> is a very powerful SAS feature.
>
> >
> > proctranspose data=wide out=char(rename=(col1=char1));
> > by id visit;
> > var adm-character-drug_name;
> > run;
> >
> > 2. What is this statement for? I see that you have got a set nums char
> > with an in clause to get which type you are doing, so why would you need
> > this?
> >
> > if 0 then set char(keep=char1 rename=(char1=locfC));
>
> I need to define locfC character variable. I could have used ATTRIB
> or LENGTH but I knew that I wanted the same attributes as CHAR1 so I
> used the non-executed set statement to "copy" the variable attributes
> from data work.char. Note also the explicit "initialize to missing"
> this is used because I need LOCFC to be initialized to missing before
> each new ID. Variables defined with SET/MERGE/UPDATE are not
> "initialized to missing".
>
> >
> > 3. How does this statement work to retrieve the formats?
> >
> > if0 then set wide; /* recover attributes */
>
> Recall that variable attributes are defined when the data step is
> compiled. Here I refer to all variables in WIDE which I also know are
> the same variables that have come from the flip-flop using the
> transpose out and back.
>
> This action defines all meta data attributes TYPE LENGTH IN/FORMAT and
> LABEL.
>
>
> >
> > -Mary
> >
> >
> > ----- Original Message -----
> > From: ./ ADD NAME=Data _null_,
> > To: SAS-L@LISTSERV.UGA.EDU
> > Sent: Friday, August 22, 2008 12:12 PM
> > Subject: Re: how to copy a row
> >
> > I believe I mentioned that I did not address Character variables.
> > This is easy enough however, see example below, and retrieving the
> > attributes is easy with a non-executed SET. For a long time I to
> > worried about "losing attributes" until I realized they can be easily
> > re-attached.
> >
> > I could not figure out what you were referring to with regards to the
> > many IF statements you included. Also you don't want to use LAG
> > because the last non missing value of interest may not be returned by
> > LAG. You want to "retain" non-missing until it is replace by new
> > non-missing value. In general this is the kind of coding I am trying
> > to avoid by transposing the data, I like my code to refer directly (by
> > name) to as few variables as possible.
> >
> > data wide;
> > infile cards missover;
> > informat adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> > format adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> > input id:$3. visit adm dose wt drug_name/*OP metioned many more*/;
> > cards;
> > 001 1 52 32 78.2 Prozac
> > 001 2 25 54 78.5
> > 001 3 . . .
> > 002 1 56 54 90.0 Zoloft
> > 002 2 . . .
> > 003 1 52 32 78.5 Wellbutrin
> > 003 2 . 54 78.2 Prozac
> > 003 3 25 . 79.4
> > ;;;;
> > run;
> > proc transpose data=wide out=nums;
> > by id visit;
> > var adm-numeric-drug_name;
> > run;
> > proc sort; by id _name_ visit; run;
> > proc transpose data=wide out=char(rename=(col1=char1));
> > by id visit;
> > var adm-character-drug_name;
> > run;
> > proc sort; by id _name_ visit; run;
> > data locf;
> > attrib type length=$1;
> > attrib _name_ length=$32;
> > if 0 then set char(keep=char1 rename=(char1=locfC));
> > call missing(locfC);
> > do until(last._name_);
> > set nums(in=in1) char(in=in2);
> > by id _name_;
> > type = substr('NC',sum(in1,in2*2),1);
> > if in1 then locf = ifn(missing(col1) ,locf ,col1);
> > if in2 then locfC =
> ifc(missing(char1),locfC,char1);
> > output;
> > end;
> > run;
> > proc print;
> > run;
> >
> > /* back to wide if you like */
> > proc sort data=locf;
> > by id visit;
> > run;
> > proc transpose data=locf(where=(type='N')) out=locfN(drop=_:);
> > by id visit;
> > var locf;
> > id _name_;
> > run;
> > proc transpose data=locf(where=(type='C')) out=locfC(drop=_:);
> > by id visit;
> > var locfC;
> > id _name_;
> > run;
> > data work.locfWide;
> > if 0 then set wide; /* recover attributes */
> > merge locfN locfC;
> > by id visit;
> > run;
> > proc contents varnum;
> > proc print;
> > run;
> >
> >
> > On 8/22/08, Mary <mlhoward@avalon.net> wrote:
> > >
> > > The problem is that your approach loses all formats and informats on the
> > > original variables, and with mixed character/numeric data, it drops all
> > the
> > > character variables. When I try an original data set like this:
> > >
> > >
> > >
> > > datawide;
> > >
> > > informat adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> > >
> > > format adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> > >
> > > input id:$3. visit adm dose wt drug_name/*OP metioned many more*/;
> > >
> > > cards;
> > >
> > > 001 1 52 32 78.2 Prozac
> > >
> > > 001 2 25 54 78.5
> > >
> > > 001 3 . . .
> > >
> > > 002 1 56 54 90.0 Zoloft
> > >
> > > 002 2 . . .
> > >
> > > 003 1 52 32 78.5 Wellbutrin
> > >
> > > 003 2 . 54 78.2 Prozac
> > >
> > > 003 3 25 . 79.4
> > >
> > > ;
> > >
> > > run;
> > >
> > >
> > >
> > > It drops my drug name, and the resulting data set has no informats or
> > > formats. Also, you've got no control on individual variables, as you
> > might
> > > want to update some and not update others, or ask more complicate IF
> > > statements, like the length of the time of the lag. Here's a longer
> > > approach, but in my mind clearer, and also it allows for mixed variable
> > > types such as integer, decimal, character, and date, and retains all
> > > informats and formats of the original variables:
> > >
> > >
> > >
> > > datawide;
> > >
> > > infile cards missover;
> > >
> > > informat adm 4.0 dose 4.0 wt 4.1 drug_name $40. flu_vaccine_date
> > mmddyy10.;
> > >
> > > format adm 4.0 dose 4.0 wt 4.1 drug_name $40. flu_vaccine_date
> mmddyy10.;
> > >
> > > input id $3. visit adm dose wt drug_name flu_vaccine_date;
> > >
> > > cards;
> > >
> > > 001 1 52 32 78.2 Prozac 11/1/2006
> > >
> > > 001 2 25 54 78.5
> > >
> > > 001 3 . . .
> > >
> > > 002 1 56 54 90.0 Zoloft 10/7/2007
> > >
> > > 002 2 . . .
> > >
> > > 003 1 52 32 78.5 Wellbutrin 11/8/2006
> > >
> > > 003 2 . 54 78.2 Prozac 11/5/2007
> > >
> > > 003 3 25 . 79.4
> > >
> > > ;
> > >
> > > run;
> > >
> > > datawide2;
> > >
> > > set wide;
> > >
> > > by id;
> > >
> > > retain prev_adm prev_dose prev_wt prev_drug_name prev_flu_vaccine_date;
> > >
> > > lag_adm=lag(adm);
> > >
> > > lag_dose=lag(dose);
> > >
> > > lag_wt=lag(wt);
> > >
> > > lag_drug_name=lag(drug_name);
> > >
> > > lag_flu_vaccine_date=lag(flu_vaccine_date);
> > >
> > > if lag_adm ^= . then prev_adm=lag_adm;
> > >
> > > if lag_dose ^=. then prev_dose=lag_dose;
> > >
> > > if lag_wt ^=. then prev_wt=lag_wt;
> > >
> > > if lag_drug_name ^= ' ' then prev_drug_name=lag_drug_name;
> > >
> > > if lag_flu_vaccine_date ^= . then
> > > prev_flu_vaccine_date=lag_flu_vaccine_date;
> > >
> > > if adm=. and not first.id then adm=prev_adm;
> > >
> > > if dose=. and not first.id then dose=prev_dose;
> > >
> > > if wt=. and not first.id then wt=prev_wt;
> > >
> > > if drug_name=' ' and not first.id then drug_name=prev_drug_name;
> > >
> > > if flu_vaccine_date=. and not first.id then
> > > flu_vaccine_date=prev_flu_vaccine_date;
> > >
> > > keep id visit adm dose wt drug_name flu_vaccine_date;
> > >
> > > run;
> > >
> > >
> > >
> > > -Mary
> > > ----- Original Message -----
> > > From: ./ ADD NAME=Data _null_,
> > > To: SAS-L@LISTSERV.UGA.EDU
> > > Sent: Friday, August 22, 2008 9:50 AM
> > > Subject: Re: how to copy a row
> > >
> > > This does not address LOCF for character variables. I can't recall
> > > doing LOFC for a character variable but there is no reason not too.
> > > Here I assume all _NUMERIC_.
> > >
> > > I believe this code is pretty close to what I would consider the
> > > current standard LOCF that I have seen posted here on SAS-L or written
> > > about is SUG.
> > >
> > > Makeing DATA LOCF below a view and then using that as input to PROC
> > > SORT would be a bit more efficient espically with more data.
> > >
> > > data wide;
> > > input id:$3. visit adm dose wt /*OP metioned many more*/;
> > > cards;
> > > 001 1 52 32 78
> > > 001 2 25 54 78
> > > 001 3 . . .
> > > 002 1 56 54 90
> > > 002 2 . . .
> > > 003 1 52 32 78
> > > 003 2 . 54 78
> > > 003 3 25 . 79
> > > ;;;;
> > > run;
> > > /* normalize */
> > > proc transpose out=tall;
> > > by id visit;
> > > var adm--wt;
> > > run;
> > > /* LOCF */
> > > proc sort;
> > > by id _name_ visit;
> > > run;
> > > data locf;
> > > do until(last._name_);
> > > set tall;
> > > by id _name_;
> > > locf = ifn(missing(col1),locf,col1);
> > > output;
> > > end;
> > > run;
> > > proc print;
> > > run;
> > > /* back to wide if you like */
> > > proc sort;
> > > by id visit;
> > > proc transpose out=widelocf;
> > > by id visit;
> > > var locf;
> > > id _name_;
> > > run;
> > > proc print;
> > > run;
> > >
> > >
> > >
> > > On 8/22/08, Mary <mlhoward@avalon.net> wrote:
> > > >
> > > > And what would your view of "normalizing the data" look like? This
> > data
> > > > does look normalized to me- you wouldn't want data to be variables
> like
> > > > patid, visit, variable name, value, since variables have values of
> > > > substantially different types and ranges of values, like a visit would
> > be
> > > an
> > > > integer, a strength would be a MG value like 20MG, a number of tabs
> > would
> > > be
> > > > an integer, like 2 tabs per day, a weight would be a value perhaps
> with
> > a
> > > > decimal, and a height might have a decimal as well, whereas the name
> of
> > > the
> > > > drug might be something like "Prozac" and be character. Thus, having
> > > > variables across **does** seem to be appropriate normalization for
> data
> > > like
> > > > this, and does not seem to be difficult to deal with in "carrying
> > forward"
> > > > previous values.
> > > >
> > > > Yes, each variable should be carried forward individually; certainly
> > > missing
> > > > character variables and missing numeric variables could not be
> combined
> > in
> > > > the same array, like this:
> > > >
> > > >
> > > >
> > > > do i=1 to dim(vararray);
> > > >
> > > > a=lag(vararray(i));
> > > >
> > > > if vararray(i)=' ' and not first.visit then vararray(i)=a;
> > > >
> > > > end;
> > > >
> > > > drop i a;
> > > >
> > > > run;
> > > >
> > > >
> > > >
> > > > But this code could be done on groups of variables with the same type
> > that
> > > > are variables that make sense to carry forward values, by specifying
> the
> > > > variables desired in the array and either missing or space in the IF
> > > > condition depending on the variable type.
> > > >
> > > >
> > > >
> > > > -Mary
> > > >
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: ./ ADD NAME=Data _null_,
> > > > To: SAS-L@LISTSERV.UGA.EDU
> > > > Sent: Friday, August 22, 2008 7:00 AM
> > > > Subject: Re: how to copy a row
> > > >
> > > > It should also be pointed out that the example of carrying for an
> > > > entire observation as the OP asked for and as provided by my example,
> > > > (which suites the sample but not the real world, it wont work with two
> > > > or more missing visits) is not the "accepted" LOCF technique.
> > > >
> > > > The missing pattern is rarely "complete" across and entire
> > > > observation. A more general solution would involve carrying forward
> > > > each variable individually. This is easily accomplished by
> > > > normalizing the data. Makes the code much simpler, too.
> > > >
> > > > There are many examples on SAS-L and in SUG papers.
> > > >
> > > >
> > > > On 8/21/08, Lou <lpogoda@verizon.net> wrote:
> > > > > "Kevin Viel" <citam.sasl@GMAIL.COM> wrote in message
> > > > >
> > > >
> > >
> >
> news:200808212013.m7LGKBff032286@malibu.cc.uga.edu...
> > > > > > On Mon, 18 Aug 2008 08:50:11 -0700, pausha <pausha1@GMAIL.COM>
> > wrote:
> > > > > >
> > > > > > >i have this strangle condition and need some help
> > > > > > >I have a dataset with patients who visit every week ,say if they
> > miss
> > > > > > >a week i still have to have their data copied form previous week
> > for
> > > > > > >that missed one.
> > > > > > >eg
> > > > > > >
> > > > > > >pat-id visit adm dose wt
> > > > > > >
> > > > > > >001 1 52 32 78
> > > > > > >001 2 25 54 78
> > > > > > >001 3 . . .
> > > > > > >002 1 56 54 90
> > > > > > >002 2 . . .
> > > > > > >
> > > > > > >in this case i would have to have the previous row copied to the
> > > > > > >visits that is missing
> > > > > > >help with logic to populate the whole row
> > > > > > >
> > > > > > >i have 40 variables in a row.
> > > > > >
> > > > > > I have not read through any but one of the responses, so I hope
> that
> > I
> > > > am
> > > > > > not duplicating my warning. No, check that, I hope I am repeating
> > it.
> > > > > >
> > > > > > I would strongly advise against this. You could bias your
> results.
> > > > > > Perhaps wt (weight) does not change from visit to visit (or more),
> > but
> > > > > > other things may.
> > > > > >
> > > > >
> > > > > In a clinical trials setting, it's pretty common to do analyses on a
> > > "last
> > > > > observation carried forward" basis to fill in missing data in
> addition
> > > to
> > > > > having analyses based solely on the data collected. Any tables
> > > generated
> > > > on
> > > > > an LOCF basis are so labeled.
> > > > >
|