Date: Wed, 10 Mar 2010 22:04:55 -0800
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: Using character variales as continuous variables
In-Reply-To: <3b46bd35-5e94-448d-8e3d-32a8dca00c90@z11g2000yqz.googlegroups.com>
Content-Type: text/plain; charset=windows-1252
Lance,
There is probably a more direct way to do this, and my proposed code
makes two assumptions that must hold for it to work. It assumes:
1. that all three responses occur in the data for each variable
2. that, in ASCII order, the lowest value is always assigned 0, the
middle value is assigned 1, and the highest value is assigned 2
If those assumptions are met, then you could try:
/* The following data step is only intended to provide a sample data
set */
data have;
input (snp1-snp3) ($);
cards;
AA TT AA
AG CC CC
GG CT AC
;
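/* Run a proc freq but send the one-way tables to a data set instead
of the listing */
ods listing close;
proc freq data=have;
/* _character_ keeps numeric variables (e.g., the markers) out of the
tables; it assumes every character variable is a SNP */
tables _character_;
ods output onewayfreqs=onewayfreqs;
run;
ods listing;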
/* Parse the output and use it to create an include file */
filename sascode temp;
data _null_;
file sascode;
length variable $32 value $64 command $80;
set oneWayFreqs;
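/* table looks like "Table snp1"; keep only the variable name */
variable = scan(table,-1);
/* restart the score at 0 for each new variable, then count up */
if variable ne lag(variable) then score=0;
else score+1;
/* only the F_ column of the current variable is non-missing */
value = coalesceC(of F_:);
/* build a statement such as: if snp1 eq 'AA' then new_snp1=0; */
command=catx(" ","if",variable,"eq '");
command=catt(command,value,"' then new_",variable,"=",score,";");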
put command;
run;
/* Do the recode in a data step by including the file of recode
statements */
data want;
set have;
%include sascode;
run;
/* release the temporary file */
filename sascode clear;
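Since the recode depends on the two assumptions listed at the top, it
may be worth checking them against the onewayfreqs data set before
trusting the new_ variables. The following is only a rough sketch: the
data set names level_counts and check are made up for illustration, and
the heterozygote test assumes each genotype is a two-letter code such as
AG, as in the sample data.

```sas
/* Assumption 1: every variable should have exactly three levels.
   Count the rows PROC FREQ produced for each table. */
proc freq data=onewayfreqs noprint;
  tables table / out=level_counts;  /* level_counts: hypothetical name */
run;

title "Variables without exactly three levels";
proc print data=level_counts;
  where count ne 3;
run;

/* Assumption 2: the heterozygote (two different letters) should be
   the middle level in sort order, i.e. the one that gets score 1. */
data check;                         /* check: hypothetical name */
  length variable $32 value $64;
  set onewayfreqs;
  variable = scan(table, -1);
  value = left(coalescec(of F_:));
  if variable ne lag(variable) then score = 0;
  else score + 1;
  /* assumes two-letter genotype codes */
  heterozygote = (char(value, 1) ne char(value, 2));
  /* keep only the rows where the flag and the score disagree */
  if heterozygote ne (score = 1);
  keep variable value score;
run;

title "Levels whose score does not match the heterozygote rule";
proc print data=check;
run;
title;
```

Any row printed by either step points to a variable whose recode
should be reviewed by hand.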
HTH,
Art
------------
On Mar 10, 6:53 pm, Lance Smith <medicaltr...@gmail.com> wrote:
> Dear all,
>
> I have a database of 50 SNP variables. Each SNP variable has 3 levels
> let’s say AA, AG, GG. The levels vary with different SNPs, so another
> one may be CC CT and TT and still another may be AA AC and CC.
>
> I also have levels of four markers that are on a continuous scale.
> I need to do univariate linear regression to predict the level of
> biomarkers using each SNP separately.
> Thus I need to do 50*4 = 200 univariate linear regressions.
> The SNPs need to be recoded to 0,1,2 for the regression as we want to
> treat them as a continuous variable with the heterozygotes (AG or CT
> or AC) coded as 1.
>
> Is there a way to efficiently do the recoding to 0,1,2 in SAS without
> having to recode all the 50 SNPs separately? Or is there a way to tell
> SAS to treat them as continuous variables even though they are coded
> as character variables?
>
> Thank you