LISTSERV at the University of Georgia
Date:   Thu, 11 Mar 2010 07:55:28 -0800
Reply-To:   mlhoward@avalon.net
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Mary <mlhoward@AVALON.NET>
Subject:   Re: Using character variables as continuous variables
Comments:   To: Lance Smith <medicaltrial@GMAIL.COM>
Content-Type:   text/plain; charset="UTF-8"

Hi, Lance,

I'm not really sure there is a quick way of doing this; in my last job I had 3000 SNPs. You might be able to write code that generates the IF statements, but with only 50 SNPs that probably isn't worth the effort compared with writing them out like this:

if rs300005='AA' then rs300005_num=0;
else if rs300005='AB' then rs300005_num=1;
else if rs300005='BB' then rs300005_num=2;
else rs300005_num=.;
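If you would rather not write 50 IF chains, one possible shortcut is to let SAS arrays do the work. This is only a sketch under several assumptions: every SNP variable is character and its name starts with rs (so the name-prefix list rs: picks them all up), each genotype is two letters with blanks for missing, and there are fewer than 1000 SNPs. The data set names snpdata and snp_coded, the new variables g1, g2, ..., and the toy values are hypothetical. Because the allele letters differ between SNPs, it makes two passes: the first finds the alphabetically first allele of each SNP (A for AA/AG/GG, C for CC/CT/TT), and the second counts how many of the two alleles differ from it, so that homozygotes for that allele get 0, heterozygotes 1, and the other homozygotes 2, matching the AA/AB/BB coding above.

```sas
/* Hypothetical example data: three SNPs stored as two-letter character genotypes */
data snpdata;
  input id rs1 $ rs2 $ rs3 $;
  datalines;
1 AA CC AC
2 AG CT CC
3 GG TT AA
4 AG . AC
;
run;

/* Pass 1: for each SNP, find the alphabetically first allele seen anywhere */
data _null_;
  set snpdata end=last;
  array snp{*} rs:;                  /* every variable whose name starts with rs */
  array ref{1000} $ 1 _temporary_;   /* assumed upper bound on the number of SNPs */
  length a $ 1 refs $ 1000;
  do i = 1 to dim(snp);
    do j = 1 to 2;                   /* check both allele letters */
      a = substr(snp{i}, j, 1);
      if a ne ' ' and (ref{i} = ' ' or a < ref{i}) then ref{i} = a;
    end;
  end;
  if last then do;
    do i = 1 to dim(snp);
      substr(refs, i, 1) = coalescec(ref{i}, '-');  /* '-' marks an all-missing SNP */
    end;
    call symputx('refs', refs);
    call symputx('nsnp', dim(snp));
  end;
run;

/* Pass 2: code each genotype as the number of non-reference alleles (0, 1, 2) */
data snp_coded;
  set snpdata;
  array snp{*} rs:;
  array g{&nsnp} g1-g&nsnp;          /* hypothetical new numeric codes, one per SNP */
  length a1 a2 r $ 1;
  do i = 1 to dim(snp);
    r  = substr("&refs", i, 1);
    a1 = substr(snp{i}, 1, 1);
    a2 = substr(snp{i}, 2, 1);
    if a1 = ' ' or a2 = ' ' then g{i} = .;   /* missing genotype */
    else g{i} = (a1 ne r) + (a2 ne r);       /* each comparison is 1 or 0 */
  end;
  drop i a1 a2 r;
run;
```

Which homozygote gets 0 is an arbitrary choice here; if you need a specific allele per SNP (for example the minor allele) to be the one counted, that has to come from your own annotation rather than from alphabetical order.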

A bigger problem is whether it is appropriate to treat a SNP as a continuous rather than a categorical variable. I had a great deal of difficulty interpreting the output when I tried this: is being heterozygous for a SNP halfway between being homozygous for allele 1 and homozygous for allele 2? And what does it mean if the coefficient is significant? What I concluded is that SNPs are not really continuous. There are three categories, more like colors (red, blue, green) than like a Likert scale (disagree, neutral, agree). So you'd need to question the wisdom of treating them as ordinal rather than categorical variables.
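One way to put the "is the heterozygote halfway?" question to the data is to fit the 0/1/2 code first and then add the genotype as a CLASS effect. This is a sketch with made-up toy data and hypothetical names (one_snp, geno, code, biomarker, anova). In the second PROC GLM the Type I sum of squares for geno has 1 degree of freedom and tests whether the three genotype means depart from a straight line in the allele count, which is exactly the assumption the continuous coding makes.

```sas
/* Hypothetical toy data for one SNP: raw genotype, its 0/1/2 code, one biomarker */
data one_snp;
  input id geno $ code biomarker;
  datalines;
1 AA 0 2.0
2 AA 0 2.4
3 AG 1 3.5
4 AG 1 3.1
5 GG 2 3.6
6 GG 2 4.0
7 AG 1 3.3
8 AA 0 2.2
;
run;

/* Additive ("continuous") model: one slope per extra copy of the allele */
proc glm data=one_snp;
  model biomarker = code / solution;
run;
quit;

/* Code entered first, then genotype as a CLASS effect: the 1-df Type I test
   for geno asks whether the heterozygote mean departs from the halfway point */
ods output ModelANOVA=anova;
proc glm data=one_snp;
  class geno;
  model biomarker = code geno / ss1;
run;
quit;
```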

Although I did do this initially, when we went to press we decided NOT to use it; it was just too difficult to defend in publication. You might consider a logistic regression predicting disease versus no disease, with disease as the dependent variable and the SNP as a categorical independent variable (or as one of several independent variables), since a categorical predictor can have three levels.
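A minimal PROC LOGISTIC sketch of that suggestion, with the SNP as a three-level CLASS variable. The data set snp_pheno, the variables geno and disease, the output data set pe, and the toy values are hypothetical; disease is assumed to be coded 1 for disease and 0 for no disease, and AA is chosen as the reference genotype.

```sas
/* Hypothetical toy data: genotype of one SNP and a 0/1 disease indicator */
data snp_pheno;
  input id geno $ disease;
  datalines;
1 AA 0
2 AA 0
3 AA 1
4 AA 0
5 AG 1
6 AG 0
7 AG 1
8 GG 1
9 GG 0
10 GG 1
11 GG 1
12 AG 0
;
run;

/* SNP as a three-level categorical predictor; AA is the reference genotype */
ods output ParameterEstimates=pe;
proc logistic data=snp_pheno;
  class geno (ref='AA') / param=ref;
  model disease(event='1') = geno;
run;
```

With PARAM=REF each non-reference genotype gets its own log-odds ratio against AA, so no assumption is made about where the heterozygote falls.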

-Mary

medicaltrial@GMAIL.COM wrote:

From: Lance Smith <medicaltrial@GMAIL.COM>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Using character variables as continuous variables
Date: Wed, 10 Mar 2010 15:53:02 -0800

Dear all,

I have a database of 50 SNP variables. Each SNP variable has 3 levels, let’s say AA, AG, and GG. The levels vary across SNPs, so another one may be CC, CT, and TT, and still another AA, AC, and CC.

I also have levels of four biomarkers measured on a continuous scale. I need to run univariate linear regressions predicting the level of each biomarker from each SNP separately, so I need 50 × 4 = 200 univariate linear regressions. The SNPs need to be recoded to 0, 1, 2 for the regression, as we want to treat them as continuous variables with the heterozygotes (AG, CT, or AC) coded as 1.
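Rather than writing 200 separate PROC REG steps, one approach is to reshape the data to one row per subject, SNP, and biomarker, sort it, and let BY-group processing run every regression in a single step. This sketch assumes the SNPs have already been recoded to numeric variables; the names snp_coded, g1-g2, bm1-bm2, long, and est, and the toy values, are hypothetical (the real data would have 50 SNP codes and four biomarkers).

```sas
/* Hypothetical toy data: numeric SNP codes g1-g2 and biomarkers bm1-bm2
   (real data: g1-g50 and four biomarkers) */
data snp_coded;
  input id g1 g2 bm1 bm2;
  datalines;
1 0 1 2.1 10.4
2 1 1 2.9 11.0
3 2 0 4.2 9.1
4 0 2 1.8 12.5
5 1 0 3.1 9.8
6 2 2 3.9 12.2
;
run;

/* Reshape to one row per subject x SNP x biomarker */
data long;
  set snp_coded;
  array g{*} g1-g2;
  array bm{*} bm1-bm2;
  length snp_name marker $ 32;
  do i = 1 to dim(g);
    do k = 1 to dim(bm);
      snp_name = vname(g{i});   /* name of the SNP variable */
      marker   = vname(bm{k});  /* name of the biomarker variable */
      code     = g{i};
      level    = bm{k};
      output;
    end;
  end;
  keep id snp_name marker code level;
run;

proc sort data=long;
  by marker snp_name;
run;

/* One simple linear regression per biomarker x SNP pair */
proc reg data=long outest=est tableout noprint;
  by marker snp_name;
  model level = code;
run;
quit;
```

The OUTEST= data set holds one row of estimates per biomarker and SNP; the TABLEOUT option adds rows with _TYPE_ values such as STDERR and PVALUE, so the slope for code and its p-value can be pulled out for all of the fits at once.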

Is there a way to do the recoding to 0, 1, 2 efficiently in SAS without having to recode all 50 SNPs separately? Or is there a way to tell SAS to treat them as continuous variables even though they are coded as character variables?

Thank you

