LISTSERV at the University of Georgia
Date:   Wed, 1 Mar 2000 14:15:07 -0500
Reply-To:   "Powhatan J. Wooldridge, Ph.D." <pjw@ACSU.BUFFALO.EDU>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   "Powhatan J. Wooldridge, Ph.D." <pjw@ACSU.BUFFALO.EDU>
Subject:   Re: multiple comparisons - take II
Comments:   To: Jack Wierzchowski <geomar@SUNSHINECABLE.COM>
Comments:   cc: Jui-Ying Feng <jfeng@acsu.buffalo.edu>
In-Reply-To:   <NDBBJJHHMLAPGLKICCPFMEECCBAA.geomar@sunshinecable.com>
Content-Type:   TEXT/PLAIN; charset=US-ASCII

On Sat, 26 Feb 2000, Jack Wierzchowski wrote:

> Hello,
> Some time ago I submitted a question regarding adjusting the significance
> level to account for multiple comparisons. Dr. Hector Maletta was kind to
> respond to the question (thank you again) and suggested re-submitting it to
> solicit more answers to it. Here it is again (any help will be most
> appreciated):
>
> "I have a generic question regarding what constitutes a multiple comparison
> (that is, when does one need to adjust the significance level to account for
> multiple comparisons):
> I have a set of bear radiolocations split by age into three groups -
> juvenile, mature, old. Each radiolocation (case) has a number of attributes
> attached to it (distance to roads, elevation, habitat quality). These
> attributes are our set of independent ratio variables. For the 3-class
> grouping (age) I clearly see a need to adjust significance level for
> multiple comparisons being made on a given variable. What is not clear to me
> is whether the fact that I ran three separate ANOVAs (one on each variable)
> may influence the significance level for detecting the differences among the
> means. To simplify the issue and make it more generic - for a binary model
> (say, males versus females) - will testing for the differences between the
> means on three INDEPENDENT variables constitute a case of multiple
> comparison in which adjustments to significance levels are required? "
>
> I believe that the answer should be a "no" because an increase in the
> probability of finding a statistically significant difference occurs when
> THE SAME means are tested several times (again, if one has, say, three age
> groups and runs a test on the distance to roads, the fact that the same
> means get tested several times introduces an increased risk of detecting
> significant difference when in fact such difference is not present); in
> this case the tests are performed on a number of INDEPENDENT variables
> (unrelated means). However, some editors of the wildlife management
> journals insist that such adjustments to alphas are necessary, because THE
> SAME DATASET (radiotelemetry locations) is tested on many variables, which,
> in their mind constitute a multiple testing situation. Who is right?
> Jack"

Dear Jack--

The answer is that there is no clearcut answer. Who is "right" depends on which kind(s) of "instability" wildlife journals need to protect against in publishing research reports and/or interpreting the findings of such reports. This is the sort of issue about which different editors and/or journals might legitimately reach different decisions. (Caution: Unless you are truly interested in this, stop here.)

My own point of view, for what that is worth, is that it would be more dysfunctional than functional for wildlife journals to require that a multivariate test of statistical significance be performed AND FOUND TO BE SIGNIFICANT before proceeding to ANOVAs for each "attribute" in research of the kind that you describe. This conclusion is based on the fact that such a policy would tend to select articles which examine very few variables (perhaps through a priori selection of those variables for which a relationship is likely to exist and perhaps by ex post facto selection) and to reject articles which include results on all the variables for which data are readily available. This might even encourage researchers to do separate studies and/or present separate articles on each of the variables in question, in order to increase their power enough to publish. In addition, there is a considerable selection bias in what gets published, since journals prefer articles with statistically significant findings. Requiring the multivariate test to be significant as well, before any of several relationships with the same variable can be treated as significant, would exacerbate this problem.

Speaking more generally, I believe that statistical inference should be based much more on placing confidence intervals around parameter estimates and less on tests of statistical significance per se. Statistical significance tests the stability of the direction of the relationship, but leaves its magnitude in question. Failure to reject the null hypothesis is almost always due to lack of sample size, not to an absence of any relationship in the population. For all these reasons and more, I consider the emphasis currently placed on statistical significance to be dysfunctional in many respects. I see scientific research as the accumulation of generalizable evidence about the degree to which variables are related, not as a series of dichotomous choices as to whether variables are or are not related. I also see it as theory driven, not as simply descriptive and/or operational. (I think of what Cook and Campbell refer to as the "external validity" problem of "conceptual validity" as a kind of internal validity problem, and organize my tests of significance accordingly when I have several measures of the same theoretical variable, much as in structural equation analysis (as in LISREL, for example).) This influences my views on the extent to which any given statistical inference practice is likely to be functional (right) or dysfunctional (wrong).
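To make the point about confidence intervals versus significance tests concrete, here is a minimal sketch in Python. All numbers are hypothetical (a 50-metre difference in mean distance to roads, a common standard deviation of 200), the function name `diff_ci` is invented for illustration, and the interval uses a large-sample normal approximation rather than the t distribution. The same estimated difference gives an interval that spans zero with 10 cases per group and one that excludes zero with 400 per group.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mean_a, mean_b, sd, n_per_group, level=0.95):
    """Approximate CI for mean_a - mean_b.

    Assumes a common standard deviation, equal group sizes, and a
    normal (large-sample) approximation. Illustrative only.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = sd * sqrt(2 / n_per_group)            # standard error of the difference
    diff = mean_a - mean_b
    return diff - z * se, diff + z * se


# Hypothetical summary statistics: same difference, different sample sizes.
small_sample = diff_ci(550, 500, sd=200, n_per_group=10)
large_sample = diff_ci(550, 500, sd=200, n_per_group=400)

print("n = 10 per group: ", small_sample)   # interval includes zero
print("n = 400 per group:", large_sample)   # interval excludes zero
```

Both intervals are centred on the same estimate of 50. Only the width changes, which is information that a bare "significant / not significant" verdict does not convey.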

As a consequence of all the foregoing, I tend to agree with your position (at least I think I do), but not necessarily with the logic by which you arrived at it. On the one hand, I would have no strong objection to having a policy that multivariate tests must always be run and published when a single variable is related to several different variables, so long as lack of significance on the multivariate test were not considered to preclude further analyses for each variable taken separately. On the other hand, I don't think that such tests are very helpful, unless the relationships involve a common issue about which a single conclusion must be reached. (This is similar to what I understand your point of view to be.) This does not mean that the editors who think otherwise are "wrong" in any way that could be proven mathematically, however.

You may be looking for more than the above, since Hector's answer failed to satisfy you and you have presented a rather detailed argument for your own point of view. I will go on, therefore, to a more technical consideration of the points that you raise. (Warning: LISTSERV readers may find the rest of this memo to be even more confusing and/or nitpicking than the above. If you're not really, really interested in this topic, but somehow read this far anyway, STOP.)

Some preliminary matters need to be clarified before proceeding to your main questions. The variables/"attributes" that you refer to (distance to roads, habitat quality, and elevation) are presumably the results of differences in bear age, rather than the causes of those differences, so one could claim that you are "wrong" to call them "independent" variables. The parenthetical context of your statement ("unrelated means") suggests, however, that you mean "independent" to refer to their relations to one another, rather than to age. Even here, it seems unlikely that they are truly independent in the statistical sense. Can it really be true that these three "attributes" have no correlation whatsoever to one another in the bear population? I suppose it may be remotely possible that their intercorrelations are rather small, but I would certainly have expected habitat quality to be rather substantially correlated with distance to roads and elevation. By "independent", do you mean simply that these are distinct variables, rather than variables which are closely linked to one another by theory or operational overlap? To digress somewhat, one could also argue that these variables are not truly bear "attributes" (as age and gender would be, for example) so that you were "wrong" to refer to them that way.
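Whether the three "attributes" are statistically independent of one another is an empirical question that can be checked directly. The following is a minimal sketch in Python, using made-up values for six radiolocations and a hand-written Pearson correlation (the function name `pearson_r` and all data are hypothetical, not taken from the study).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values, one entry per radiolocation.
dist_to_road_km = [0.2, 0.5, 1.1, 1.8, 2.4, 3.0]
habitat_quality = [2, 3, 3, 4, 5, 5]
elevation_m = [900, 1100, 1000, 1400, 1300, 1600]

r_road_habitat = pearson_r(dist_to_road_km, habitat_quality)
r_road_elev = pearson_r(dist_to_road_km, elevation_m)
r_habitat_elev = pearson_r(habitat_quality, elevation_m)

print(round(r_road_habitat, 3), round(r_road_elev, 3), round(r_habitat_elev, 3))
```

In these invented data the attributes are strongly intercorrelated, which is the situation in which tests on the three variables would not be independent of one another.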

Your choice of terminology may be standard practice among wildlife researchers, and thus unlikely to mislead in that context. To the extent that your question is statistical, however, the way you are using these terms is potentially confusing (at least to me). This is particularly true with respect to the term "independent". I don't mean to be unduly argumentative; I just want to point out that nearly everything we do or say in research, including the terms that we use, varies in "correctness" according to functional context. You might want to keep that in mind, because it applies also to the issue of which omnibus tests of significance are "right" for the type of research that you do.

Your memo doesn't make it clear whether or not you are starting with some a priori hypotheses about the relationships you are likely to find between your three categories of bear age and the various "independent" variables you cite. Most statisticians believe that there are major differences in the right way(s) to use tests of statistical significance according to whether one is using them to test a priori hypotheses or using them in the absence of any a priori hypotheses in "fishing expeditions" to test which of a number of variables of interest are related, and the form of those relationships. [Perhaps with bears the term "hunting expeditions" would be preferable. :-) ]

It is generally considered appropriate to use a priori theory to narrow the variable pairings and patterns of relationships to be investigated, as well as the "null" hypotheses to be considered as potential alternatives. Your discussion makes it clear, for example, that you wish to focus on any pattern of differences in attribute means between the three categories into which you collapsed the continuous variable "age". This was presumably done because a priori knowledge suggested that the relationships between age and "attribute" variables of the kind under consideration are likely to be discontinuous and nonmonotonic, and was not influenced by sample characteristics. (If this focus was determined AFTER examining the actual data patterns in your sample, then the meaning of your tests of statistical significance would change dramatically. Indeed, many statisticians would say that none of the tests you discuss would be "right" in that circumstance.)

Your choice of statistical procedures suggests that you are more concerned with the effects of age on "averages" than with its effects on "variability". Once again, I assume that either effects on averages are the ones that have the most clearcut implications for the applications that concern wildlife researchers, or that the relationship of age to the variability of these attributes is known to be small. Whether or not that is so, however, you should be aware that you are at least implicitly using a priori considerations to focus your inquiry.

If you had gone a step or two further, and proposed a priori hypotheses about how each attribute was expected to be related to juvenile/mature/old age, then you could have used planned contrasts specific to the a priori patterns of differences in means that you had predicted. This would have increased power if you got the predicted pattern, but decreased it to near zero if a different pattern emerged. Since you did not do so, however, I think that the common practice of requiring a statistically significant omnibus test (the F test from a one way ANOVA) and/or using the Scheffe test or its equivalent to compare between group means is usually functional. I don't see even that as necessarily a given, however.
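The trade-off between an omnibus F test and a planned contrast can be seen in a small worked example. This is a minimal sketch in Python with invented data (three cases per age group); the function names `one_way_anova` and `contrast_t` are hypothetical, and only test statistics are computed, since p-values would need F and t reference distributions that the Python standard library does not provide. The contrast weights (-1, 0, 1) stand for an assumed a priori prediction of a steady increase from juvenile to old.

```python
from math import sqrt


def one_way_anova(groups):
    """Return (F statistic, mean square within) for a list of samples."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(values) - len(groups))
    return ms_between / ms_within, ms_within


def contrast_t(groups, weights, ms_within):
    """t statistic for a planned contrast (weights must sum to zero)."""
    means = [sum(g) / len(g) for g in groups]
    estimate = sum(w * m for w, m in zip(weights, means))
    se = sqrt(ms_within * sum(w ** 2 / len(g) for w, g in zip(weights, groups)))
    return estimate / se


linear_trend = (-1, 0, 1)  # assumed prediction: juvenile < mature < old

# Hypothetical data in the order juvenile, mature, old.
as_predicted = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]   # means 2, 3, 7
other_pattern = [[1, 2, 3], [6, 7, 8], [1, 2, 3]]  # means 2, 7, 2

f_pred, msw_pred = one_way_anova(as_predicted)
t_pred = contrast_t(as_predicted, linear_trend, msw_pred)

f_other, msw_other = one_way_anova(other_pattern)
t_other = contrast_t(other_pattern, linear_trend, msw_other)

print("predicted pattern: F =", f_pred, " contrast t^2 =", round(t_pred ** 2, 2))
print("other pattern:     F =", f_other, " contrast t^2 =", round(t_other ** 2, 2))
```

When the data follow the predicted pattern, the squared contrast statistic (on one degree of freedom) is larger than the omnibus F (on two). When the group means differ in a pattern the contrast was not built to detect, the omnibus F is large while the contrast statistic is zero.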

Suppose you found that you unexpectedly had only one or two bears in one of the three categories, and ample sample sizes in the other two, for example. (I don't suppose that this is likely in your research, but it happens all the time in mine.) I would not think it "wrong" just to test for a difference in means between the two categories for which the sample sizes were large enough to give you enough power to stand a good chance of getting significance. I would NOT insist on your using all three groups, with a consequent loss of overall power, just because that was what you had originally planned to do when you thought that the N's would be approximately equal.

In the above circumstance, I would (in my journal editor incarnation) suggest that you report the means for the third group, but comment that small sample sizes had precluded their inclusion in significance testing, and warn that their means were too unstable due to small sample size to warrant meaningful comparisons. (Note: This should be done BEFORE determining that excluding the third group would actually increase the level of statistical significance. In other words, this strategy should be declared, and hypotheses adjusted accordingly, just as soon as it becomes clear that there aren't going to be enough cases in one of the groups. It would be inappropriate to change back if a one way ANOVA with all three categories would have been statistically significant, whereas testing just the two large N categories against one another was not.) Would a wildlife editor agree with the above? My guess is that some would, and some would not. Who would be "right"? Well I could justify the above strategy as more functional than not, but I am reasonably sure that some statisticians would say that I am flat out wrong.

I know that I have wandered away from the question you asked, but I have deliberately broadened my answer to include other similar situations in which the form and/or number of the test(s) to be used can in some manner raise the issue of lumping multiple tests and/or comparisons under a single test, in order to control the overall error rate, rather than just considering each kind of error separately. The point that I am trying to make is that the kind of question you are raising comes up in multiple contexts. Even in contexts where the standard practice is to start with an omnibus test, or otherwise take the overall error rate into account, treating each issue/comparison separately would not be demonstrably WRONG.
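The "overall error rate" that editors worry about can be put in numbers. Here is a minimal sketch in Python, assuming three tests that are statistically independent and each run at alpha = .05 (the independence assumption is a simplification, and the function names are invented for illustration). It also shows the Bonferroni per-test level, one common adjustment, which bounds the overall rate whether or not the tests are independent.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false rejection among k independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test significance level under a Bonferroni adjustment."""
    return alpha / k


alpha, k = 0.05, 3  # three separate ANOVAs, one per variable

unadjusted = familywise_error_rate(alpha, k)
per_test = bonferroni_alpha(alpha, k)
adjusted = familywise_error_rate(per_test, k)

print("overall rate, unadjusted:", round(unadjusted, 4))
print("Bonferroni per-test alpha:", round(per_test, 4))
print("overall rate, adjusted:  ", round(adjusted, 4))
```

Each individual test still has a 5% false-positive rate when unadjusted; what grows with the number of tests is the chance that at least one of them rejects by chance. Which of those two rates matters is exactly the question of purpose discussed here.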

To get back to your specific question, if your research had a priori hypotheses, and a theoretical basis, the issue of whether or not the same theoretical/general proposition underlies all three hypotheses might then be raised. If all three analyses related to a single underlying more general hypothesis, and if you had no a priori reason to believe any of the analyses in question to be more accurate in testing that hypothesis than any other, then it would (in my opinion) be correct to use a multivariate test for the purpose of testing the theoretical/general proposition in question. In testing several operational hypotheses which all relate to a single underlying general (theoretical) hypothesis, multivariate analyses seem a logical way of testing the underlying general hypothesis, about which a single conclusion needs to be reached.

Your message makes it clear, however, that you do not see any underlying general proposition to which all three variables relate. In my opinion, it would therefore be correct to run three separate tests for the three separate hypotheses without bothering to run a multivariate test. In testing operational hypotheses which are NOT interrelated, multivariate tests do not serve any useful theoretical function, since the truth or falsity of each hypothesis involves a separate and distinct issue. I would not, therefore, insist on a multivariate test if I were a wildlife journal editor. That is not because it would be demonstrably WRONG to do so in any mathematical sense, but because I consider the "multivariate tests first, regardless" approach to be dysfunctional to the purpose of fostering cumulative scientific inquiry. There are those who would disagree with that, however. Some might even think that much of what I have said above is "wrong", because it underestimates their concept of THE error rate against which researchers MUST protect themselves. In my opinion, the best approach is to CHOOSE which kind(s) of chance errors you need to protect against for a given PURPOSE, then run the test that does exactly what you want, no more and no less.

To end where I began, there is no clearcut right or wrong answer to the question you raise, or to issues of functionality in general. They all depend on context and criteria.

***************************************************************************
Powhatan J. Wooldridge, Assoc. Professor, Nursing, State Univ. NY at Buffalo

