LISTSERV at the University of Georgia
Date:   Fri, 21 Dec 2007 11:16:48 -0500
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: Performing thousands of tests automatically
Comments:   To: Mary <mlhoward@avalon.net>, Ron Do <rondo@HOTMAIL.COM>
In-Reply-To:   <004501c841b6$9466d680$c12fa8c0@HP82083701405>
Content-Type:   text/plain; charset="us-ascii"

Even an analyst with my limited understanding of genetics has to wonder about the value of what one would find after multiples of multiple comparisons. It seems to me to border on bad science. I do more than a little data mining, but I have to wonder about anecdotal evidence of successes of blind searches for statistical significance.

Ordinary least squares regressions fit a highly restricted model and, when dealing with single predictors such as an SNP, may not fit data well enough by chance to reach a usual standard of statistical significance. More flexible methods of fitting models often overfit noise. For example, a simple model estimated in PROC MIXED using a series of 100 values of y and 1000 sets of 100 values of x (groups j=1 ... 1000),

proc printto Print=_null_ log="H:\MY DOCUMENTS\SASPrograms\MixedStat.log";
run;
ods listing body="H:\MY DOCUMENTS\SASPrograms\MixedStat.txt";
proc mixed data=test noclprint noinfo noitprint noprofile;
   model y=x;
   random intercept;
   by j;
   ods output Tests3=SigTest;
run;
ods listing close;
proc sql;
   create table SigTest as
   select * from SigTest
   where probF<0.05
   ;
quit;

generated 54 estimates with F-tests significant at the 0.05 level, about the 5% one would expect by chance alone. At that rate, 100,000 groups would likely produce about 5,400 statistically significant predictions of y given x.
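As a rough check on that count, the sketch below compares the 54 significant tests with what chance alone would produce. It assumes the 1000 tests are independent, which is only approximately true here because every group shares the same y; the variable names (m, alpha, observed) are illustrative and not part of the original program.

```sas
/* Sketch: expected number of false positives among m tests at level alpha. */
/* Assumes independent tests; in the program above all groups share one y,  */
/* so treat the standard deviation as approximate.                          */
data _null_;
   m        = 1000;                        /* number of BY groups tested      */
   alpha    = 0.05;                        /* nominal significance level      */
   observed = 54;                          /* count reported in the message   */
   expected = m*alpha;                     /* mean of binomial(m, alpha)      */
   sd       = sqrt(m*alpha*(1-alpha));     /* binomial standard deviation     */
   z        = (observed - expected)/sd;    /* distance from chance, in SDs    */
   put expected= sd= z=;
run;
```

Under these assumptions the expected count is 50 with a standard deviation of roughly 7, so 54 is well within what random data should produce.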

The source of data used in PROC MIXED ....?

data depvar (keep=i y);
   do i=1 to 100;
      y=round(ranuni(11131),1.);
      output;
   end;
run;

data test;
   /* Declare hash object and read data set as ordered */
   if _N_ = 1 then do;
      length y 3.;
      declare hash h(hashexp: 4, dataset: 'depvar', ordered: 'yes');
      declare hiter iter('h');
      /* Define key and data variables */
      h.defineKey('i');
      h.defineData('y');
      h.defineDone();
      /* avoid uninitialized variable notes */
      call missing(i,y);
   end;
   /* Iterate through the hash object and output data values */
   do j=1 to 1e3;
      rc = iter.first();
      do while (rc = 0);
         x=round(ranuni(23171),1.);
         z=round(ranuni(12317),1.);
         output;
         rc = iter.next();
      end;
   end;
run;

.... selected at random.

I've provided the program so you can verify correctness and test it, change it, etc. It seems to me that a test of any model on random observations should set a minimum standard for model uncertainty.

PS David, while we don't pretend to speak for you or claim that you will agree with everything that we say, we are making an effort to continue a cause that you represented so well. Hope that you are doing well now and enjoying the holidays.

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Mary
Sent: Tuesday, December 18, 2007 3:43 PM
To: Ron Do; SAS-L@LISTSERV.UGA.EDU
Subject: Re: Re: Performing thousands of tests automatically

Ron,

Yes, Bonferroni was used in the article that I cited that identified SNPs in the CFH gene as being highly related to macular degeneration. While that article may have been a fishing trip, it was later shown to be correct by scientific theory and replication. My manager (Dr. Greg Hageman, PhD), who discovered the CFH gene link to macular degeneration, just formed a company (Optherion; though I work for the University, not the company). The startup investment came in a few months ago at **** 35 million dollars ****; not too bad for a fishing trip :-).

But doing a run-through of all SNPs can only be thought of as a first pass on the data; theory and verification must follow, and the results that come out of such runs just give hints as to where the true associations might actually be.

-Mary

----- Original Message -----
From: Ron Do
To: SAS-L@LISTSERV.UGA.EDU
Sent: Tuesday, December 18, 2007 2:28 PM
Subject: Re: Performing thousands of tests automatically

Bonferroni is used a lot in these instances to account for multiple testing. An important thing for these studies is replication of the association results in another independent sample.
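A minimal sketch of such a Bonferroni adjustment in a DATA step follows. It assumes a hypothetical data set AllTests holding the unfiltered Tests3 output from PROC MIXED (one row per BY group, with the p-value in ProbF) and a hard-coded count of 1000 tests; neither the data set name nor the new variables bonf_p and sig_bonf appear in the thread.

```sas
/* Sketch: Bonferroni adjustment of per-group p-values.                   */
/* AllTests is a hypothetical data set: the Tests3 output saved before    */
/* any filtering on ProbF. The number of tests (1000) is assumed.         */
data Adjusted;
   set AllTests;
   bonf_p   = min(1, ProbF*1000);   /* adjusted p-value, capped at 1      */
   sig_bonf = (bonf_p < 0.05);      /* 1 if still significant, else 0     */
run;

/* Count how many groups remain significant after adjustment */
proc freq data=Adjusted;
   tables sig_bonf;
run;
```

With purely random x and y, few if any groups should remain flagged after the adjustment, which is the point of applying it before looking for replication.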

