Date: Thu, 10 Sep 1998 19:26:55 -0400
Reply-To: "Powhatan J. Wooldridge, Ph.D." <pjw@ACSU.BUFFALO.EDU>
Sender: "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From: "Powhatan J. Wooldridge, Ph.D." <pjw@ACSU.BUFFALO.EDU>
Subject: Re: procedure for experimental design (fwd)
Content-Type: TEXT/PLAIN; charset=US-ASCII
The following off list exchange concerns my previous reply on the list to
a posting by Scott Dixon. (These are repeated at the very bottom, in case
you've forgotten them.) I'm forwarding the off list exchange to you with
the concurrence of Mr. Verschell and Professor Ware (whose separate
response to Mr. Dixon's original message is cited).
This exchange (especially my somewhat edited and expanded reply) may be
more lengthy and detailed than you would care to read. On the other hand,
it occurred to me that there might be some general interest in an extended
discussion of correlated sources of error, changing the unit of analysis,
the assumption of independence, experimental design, and choice of data
analysis technique. While aspects of these interrelated topics have
previously been discussed on the list separately, I haven't seen a really
comprehensive discussion anywhere. My advanced quantitative methods class
always seems to have a hard time with this, so maybe you can use all or
part of the following in your teaching even if you already know it all
yourself.
Delete or read on as you prefer.
***************************************************************************
Powhatan J. Wooldridge, Assoc. Professor, Nursing, State Univ. NY at
Buffalo
---------- Forwarded message ----------
Date: Tue, 8 Sep 1998 16:49:55 -0400 (EDT)
From: "Powhatan J. Wooldridge, Ph.D." <pjw@acsu.buffalo.edu>
To: Mark Verschell <MVERSCHELL@worldnet.att.net>
Subject: Re: Re: procedure for experimental design
On Mon, 7 Sep 1998, Mark Verschell wrote:
>
> Dr. Wooldridge:
>
> For those of us who are still learning the intricacies of correlated sources
> of error, could you explain your reasoning further with regard to Scott
> Dixon's research design. With regard to error sources not varying in
> random fashion between each student, I'm not sure I understand your
> reasoning. Certainly, the performance of students within each group would
> be expected to be correlated, but does this qualify as a source of error
> variance ? Unless the students were copying each others' test papers,
> wouldn't the error variance associated with each student's test score be
> random in nature ? Even if the teacher had given one whole class a mistaken
> piece of information, wouldn't this correlated component of test scores
> reflect the true extent of each individual student's knowledge within that
> class (i.e. it is a component of true score variance, not variance
> attributable to measurement error) ? With regard to scores for one student
> influencing the scores for other students, isn't this only a concern when
> the "influence" is between groups, rather than within groups (i.e. the
> research question already assumes that student's scores are correlated
> within groups because they are subjected to the same teaching method by the
> same teacher) ? Aloha, and thanks for lending your expertise.
>
> Mark Scott Verschell
> American School of Professional Psychology
> 1427 A Dominis Street
> Honolulu, Hawaii 96822
> mverschell@worldnet.att.net
Thank you for your thoughtful comments, Mark. You are entirely right to
question anything that doesn't make complete sense to you. Please note
that Professor Ware's response also cited lack of independence as a major
problem in this design. That doesn't mean that I was right, of course. It
just means that the point of view that I presented was not an unusual one.
Let me see if I can explain, clearly enough to convince you, why lack of
independence is a problem. I'll start by conceptualizing what I mean by error
variance in an experimental trial. In that context, true variance is
variance due to what one is trying to test (in the study Dixon described,
that would be variance due to differences between the two different ways
of teaching), and error variance is variation in the posttest scores that
is due to anything else.
By this definition, if the instructor accidentally gave one class (but not
the other) an erroneous piece of information FOR REASONS THAT HAD NOTHING
TO DO WITH THE TEACHING METHOD, that would indeed be a source of error
variation FOR PURPOSES OF TESTING THE RESEARCH HYPOTHESIS. To give another
example, suppose one class met at a time of day when the students and/or
the professor were too tired to learn (or teach) well. This also would
violate the assumption that error sources are random. So would having to
cancel one or more classes for one group (but not the other) due to a
snowstorm. (Not a big problem where you live, but sometimes a factor here
in Buffalo.) Note that even if the university stayed open, there would be
a "correlated error source" that would violate the assumption of
independence if multiple members of one class stayed away because of a
snowstorm that did not affect the other class.
The problem with your line of thought may be that you are thinking of
error variation vs true variation solely in terms of error of measurement.
Remember that in this context true variation is variation in posttest
scores due to differences in the teaching technique employed, and error
variation is everything else. Error in measuring the true extent of the
student's knowledge is one source of error variation in Dixon's
experimental trial, but it is far from the only one, or even the most
important one.
Pre-existing differences in the knowledge of the students are, for example,
potentially a major source of error variance in their posttest scores.
(This is NOT taken into account simply by calculating change scores, since
low pretest scores will then tend to lead to higher than average change
scores, and high to lower, due to "regression to the mean".) If the
experimental group scored higher on the posttest (but lower on change
scores) because they started out knowing more, that would not be due to
measurement error, but it would not be due to a difference in the relative
effectiveness of the teaching techniques, either. In other words, this
would lead to confounded error variation in a posttest-only comparison OR
in a change-score comparison (although the direction of the bias would
differ between the two). That is why both Professor Ware and I thought it
important to use
the pretest score as a covariate. (Note: While this helps reduce the
threat that differences between the outcomes for the two groups might be
due to pre-existing differences between the subjects, it is not really an
adequate substitute for random assignment of students to groups, as was
noted in some of the responses to Dixon's message.)
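A minimal simulation sketch of the point above, in Python. The assumptions are loud ones: there is no treatment effect at all, the experimental class starts out about 10 points higher on the pretest, and posttest scores regress toward the mean (a slope of 0.5 on the pretest). Every number, name, and the data-generating model itself is hypothetical and chosen only for illustration.

```python
import random
import statistics as st

def simulate_class(n, pre_mean, rng, slope=0.5):
    """Hypothetical class with NO treatment effect; slope < 1 gives regression to the mean."""
    students = []
    for _ in range(n):
        pre = rng.gauss(pre_mean, 10)
        post = 40 + slope * pre + rng.gauss(0, 5)
        students.append((pre, post))
    return students

def pooled_within_slope(groups):
    """Pooled within-group slope of posttest on pretest (the covariate slope)."""
    sxy = sxx = 0.0
    for g in groups:
        mean_pre = st.mean(pre for pre, _ in g)
        mean_post = st.mean(post for _, post in g)
        sxy += sum((pre - mean_pre) * (post - mean_post) for pre, post in g)
        sxx += sum((pre - mean_pre) ** 2 for pre, _ in g)
    return sxy / sxx

rng = random.Random(1998)
experimental = simulate_class(2000, 60, rng)  # starts out knowing more
control = simulate_class(2000, 50, rng)

pre_diff = st.mean(p for p, _ in experimental) - st.mean(p for p, _ in control)
post_diff = st.mean(q for _, q in experimental) - st.mean(q for _, q in control)
change_diff = post_diff - pre_diff  # difference between mean change scores
ancova_diff = post_diff - pooled_within_slope([experimental, control]) * pre_diff

print(f"posttest-only difference: {post_diff:6.2f}")    # pushed upward
print(f"change-score difference:  {change_diff:6.2f}")  # pushed downward
print(f"covariate-adjusted diff:  {ancova_diff:6.2f}")  # near zero
```

Under this model the two unadjusted comparisons are biased in opposite directions, while the covariate adjustment removes the spurious difference. The adjustment works this cleanly only because the simulated posttest depends on the observed pretest and on nothing else that differs between the groups.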
Let's apply this same line of thought to your assertion that "the research
question already assumes that student's scores are correlated within
groups because they are subjected to the same teaching method by the same
teacher". Everyone in BOTH groups is taught by the same teacher, so that
isn't a source of correlation between groups OR within groups. ("Teacher"
is a source of within subject variation only in the sense that the pretest
scores are before exposure to the teacher, and the posttest scores are
after exposure to the teacher; but "Teacher" itself is a constant over ALL
subjects, and thus not a source of variation or correlation.)
Teaching method is not constant over all subjects. Still, it is constant
within each group, so I think that your "correlation within groups"
comment is technically a misstatement. I think I know what you mean,
however. The expectation that those exposed to one teaching method will
tend to have different scores than those exposed to the other method is
part of the research (or alternative) hypothesis; the null hypothesis is
that this will NOT affect scores. Under the null hypothesis, the posttest
scores of students will be randomly distributed, without regard to
treatment group. The problem of "correlated error" is that this
statistical expectation may be untrue for reasons NOT associated with
differential effectiveness of the two teaching techniques.
I have already illustrated how confounded error due to extraneous
variables influencing multiple subjects in a similar manner could lead to
statistically significant differences between the groups. To illustrate
the additional problems that may occur if what one student learns affects
what other students learn, suppose that the students in both classes
organize study groups. In one class, there is a brilliant student who not
only learns the material exceptionally well, but is capable of teaching it
to the other students well. This would violate the assumption that the
knowledge learned by one student does not affect what is learned by the
other students. Equally clearly, it could lead to a statistically
significant difference between treatment groups for reasons that have
nothing to do with differences in the relative efficacy of the two
teaching methods.
The problem is that the model on which the statistical test is based
assumes that the error sources that affect one subject (more properly,
"unit" or "score") do not simultaneously affect, or spread to, other
scores in the same group. If they do, then the sample size is less of a
protection than it would be if each error occurrence affected only a single
score. As you seem to recognize, the assumption of independence is almost
always violated when "group" means what a sociologist would refer to as a
group. In order to use these statistical tests validly (with subject as
the unit of analysis), the experimental and control groups should be
groups only in the sense that they are grouped together by the
statistician for purposes of analysis. (What a sociologist would refer to
as an "aggregation", and not a group at all!)
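A rough Python sketch of why sample size protects less when an error source spreads across a whole class. It assumes a hypothetical setup: two classes of 30 students, no treatment effect, and (in the second run) a class-wide shock with a standard deviation half that of the individual student error. The critical value of 2.0 is roughly the two-tailed 5% point for 58 degrees of freedom.

```python
import math
import random
import statistics as st

def student_level_t(class_a, class_b):
    """Pooled-variance two-sample t statistic with student as the unit."""
    na, nb = len(class_a), len(class_b)
    pooled_var = ((na - 1) * st.variance(class_a) +
                  (nb - 1) * st.variance(class_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (st.mean(class_a) - st.mean(class_b)) / se

def false_positive_rate(shared_sd, rng, n=30, trials=2000, critical=2.0):
    """Share of null experiments (no treatment effect) declared significant."""
    hits = 0
    for _ in range(trials):
        classes = []
        for _ in range(2):
            shock = rng.gauss(0, shared_sd)  # error shared by a whole class
            classes.append([shock + rng.gauss(0, 1) for _ in range(n)])
        if abs(student_level_t(*classes)) > critical:
            hits += 1
    return hits / trials

rng = random.Random(7)
independent_rate = false_positive_rate(0.0, rng)
correlated_rate = false_positive_rate(0.5, rng)
print(f"independent errors: {independent_rate:.3f}")
print(f"class-wide shocks:  {correlated_rate:.3f}")
```

With independent errors the rejection rate should sit near the nominal 5%. With a shared class-level shock it should be many times higher, even though nothing about the teaching methods differs.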
If your response is that the teacher should therefore try to discourage
study groups if using this design (with student as the unit of analysis),
then I might agree, assuming that this violates no ethical or theoretical
precept. On the other hand, suppose our brilliant student teaches the
others through the class discussions? In order to have a minimal violation
of independence, then, perhaps the classes should be taught by videotape,
under circumstances that preclude (or at least minimize) interaction among
the students. That would help, if it were feasible. On the other hand, I
have a hunch that at least one of the teaching methods would preclude this
approach.
How can you get away from this problem, then, you ask? The answer is
simple, but also time consuming. Use class as the unit of analysis for
testing the hypothesis. (Of course, this would require more than just two
classes.) As your own remarks about the lack of independence among
students' results imply, class is
the appropriate unit of analysis for testing this hypothesis. What happens
in one experimental class is unlikely to affect what happens in other
experimental classes, and it is much more feasible to use random
assignment at the class unit of analysis than at the student unit of
analysis. Indeed, none of the above examples of correlated error
would violate the assumptions of the test in that design!
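A short Python sketch of what using class as the unit of analysis looks like in practice. The scores and class labels below are invented for illustration; the only point is that each class contributes a single observation (its mean), so three classes per method leave 4 degrees of freedom no matter how many students are enrolled.

```python
import math
import statistics as st

# Hypothetical posttest scores, keyed by class; three classes per method.
experimental_classes = {"E1": [78, 80, 82], "E2": [82, 84, 86], "E3": [80, 82, 84]}
control_classes = {"C1": [72, 74, 76], "C2": [76, 78, 80], "C3": [74, 76, 78]}

def class_level_t(treated, untreated):
    """Two-sample t test in which each CLASS mean is one observation."""
    a = [st.mean(scores) for scores in treated.values()]
    b = [st.mean(scores) for scores in untreated.values()]
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * st.variance(a) +
                  (len(b) - 1) * st.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (st.mean(a) - st.mean(b)) / se, df

t_stat, df = class_level_t(experimental_classes, control_classes)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The resulting statistic would be compared against the t distribution with 4 degrees of freedom (two-tailed 5% critical value 2.776), which is why a large treatment effect is needed with so few classes.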
But what about the fact that each pair of classes is taught at a different
calendar time (Semester)? Doesn't that violate the assumption of
independence? Mightn't the frequency of snowstorms vary with Semester, for
example? The answer is "Yes, maybe; but even if it does that won't lead to
a spuriously significant result."
If "Semester" turns out to affect gain scores, this variable will have
been counterbalanced (its effects will have been equated by design) if you
run the same number of experimental classes as control classes each
semester. This violation of the assumption of independence would NOT,
therefore, lead to spuriously significant results, even if you ignored it
in the data analysis. If you believe it to be an important factor,
however, then you could treat each semester's results for two classes as a
matched pair by subtracting the control class score from the experimental
class score, and then doing a one sample test on the difference scores.
Once again, you will have preserved independence by changing the unit of
analysis (at the expense of having only one half as many degrees of
freedom, however).
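The matched-pair version can be sketched in Python as follows. The semester labels and class means are hypothetical; the function simply subtracts the control class score from the experimental class score within each semester and runs a one-sample t test on the differences.

```python
import math
import statistics as st

# Hypothetical class means, one experimental and one control class per semester.
semesters = {
    "Fall 1998":   {"experimental": 80.0, "control": 75.0},
    "Spring 1999": {"experimental": 84.0, "control": 78.0},
    "Fall 1999":   {"experimental": 82.0, "control": 75.0},
}

def paired_semester_t(by_semester):
    """One-sample t test on experimental-minus-control differences."""
    diffs = [s["experimental"] - s["control"] for s in by_semester.values()]
    se = st.stdev(diffs) / math.sqrt(len(diffs))
    return st.mean(diffs) / se, len(diffs) - 1

t_stat, paired_df = paired_semester_t(semesters)
unpaired_df = 2 * len(semesters) - 2  # df if the pairing were ignored
print(f"paired t = {t_stat:.3f} on {paired_df} df "
      f"(ignoring the pairing would give {unpaired_df} df)")
```

With three semesters the paired test has 2 degrees of freedom against 4 for the unpaired class-level test, which is the halving of degrees of freedom mentioned above.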
Note that my advice was NOT to do this, if the number of classes was very
small. The test would be at least as valid if done as a matched pairs
design, but it would be less powerful if semester had little or no effect
on the scores. As a compromise, semester might be used as a separate
factor or block in the design, which would introduce some control, while
preserving degrees of freedom. (You'd lose just one degree of freedom if
the semester-by-treatment interaction were left in the error term, which it
almost certainly should be.) Indeed, the most elegant way of sorting out
error sources from true effects would be to combine both units of analysis
in a hierarchical linear model. The effects of variation in pretest scores
could then be controlled at the subject level of analysis, and the effects
of semester, if any, could be controlled at the class unit of analysis.
This could all be done automatically by any good HLM program. Averaging
regression adjusted gain scores for each class, then doing the analysis at
the class level while ignoring the effects of semester (the procedure I
recommended) would be almost as good, however; and it was a lot easier to
explain. Most HLM software is difficult to interpret and use correctly, so
I don't recommend it to anyone who isn't an experienced data analyst.
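For the simpler alternative (averaging regression-adjusted scores for each class), a minimal Python sketch might look like this. The data are invented and deliberately noise-free so the arithmetic is easy to follow, and only two classes are shown to keep it short; a real analysis would need several classes per method. The adjustment uses the pooled within-class slope of posttest on pretest, which is one reasonable choice and not necessarily the only one.

```python
import statistics as st

# Hypothetical (pretest, posttest) pairs for the students in each class.
classes = {
    "E1": [(40, 76), (50, 81), (60, 86)],
    "C1": [(50, 75), (60, 80), (70, 85)],
}

def adjusted_class_means(data):
    """Average regression-adjusted posttest scores within each class."""
    sxy = sxx = 0.0
    for students in data.values():
        mean_pre = st.mean(pre for pre, _ in students)
        mean_post = st.mean(post for _, post in students)
        sxy += sum((pre - mean_pre) * (post - mean_post) for pre, post in students)
        sxx += sum((pre - mean_pre) ** 2 for pre, _ in students)
    slope = sxy / sxx  # pooled within-class slope of posttest on pretest
    grand_pre = st.mean(pre for students in data.values() for pre, _ in students)
    return {
        name: st.mean(post - slope * (pre - grand_pre) for pre, post in students)
        for name, students in data.items()
    }

raw_means = {name: st.mean(post for _, post in s) for name, s in classes.items()}
adjusted = adjusted_class_means(classes)
print("raw class means:     ", raw_means)
print("adjusted class means:", adjusted)
```

In this made-up example the raw posttest means differ by only 1 point because the experimental class started lower, while the adjusted class means differ by 6. The adjusted means, one per class, are what would then go into the class-level test.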
If all this seems unbearably complicated to you, break it down paragraph
by paragraph; and consult your friendly neighborhood statistician about
the parts that still seem puzzling. Also take a look at my original answer
to Dixon's query, and that of Professor Ware. If you start to get the
feeling that you now understand the "why's" behind our recommendations,
then you will know that you're on the right track.
Best of luck,
Pow
PS It occurs to me that others who look at the discussion list might be
interested in our exchange. Do I have your permission to forward it to the
SPSS listserv? If so, would you prefer that I delete all personal
references that would identify you, or not?
***************************************************************************
Powhatan J. Wooldridge, Assoc. Professor, Nursing, State Univ. NY at
Buffalo
> -----Original Message-----
> From: Powhatan J. Wooldridge, Ph.D. <pjw@ACSU.BUFFALO.EDU>
> Newsgroups: bit.listserv.spssx-l
> To: SPSSX-L@UGA.CC.UGA.EDU <SPSSX-L@UGA.CC.UGA.EDU>
> Date: Monday, September 07, 1998 10:10 AM
> Subject: Re: procedure for experimental design
>
>
> >If the same test was given pre and post, then most would calculate change
> >scores for the test, and subject these change scores to an independent
> >samples t test. This is unlikely to be optimum, however, even if the same
> >questions were asked pre and post. Pretest scores in the situation you
> >describe sometimes have low reliability/validity (since most students are
> >"guessing" about the right answers, due to low initial knowledge). The
> >posttest scores tend to have much higher reliability/validity (since
> >students are presumably much more informed about the issues tested after
> >taking the course). In such situations, it can be shown that it is better
> >in most respects to ignore the pretest and to compare posttest scores than
> >to compare change scores. This fails to "take into account" possible
> >differences between the groups in their average pretest scores, however;
> >and clearly this is not what the dissertation committee had in mind. (They
> >would hardly have had him do a pretest if it were.)
> >
> >The best procedure in this circumstance is probably to control for the
> >before score as a covariate, using posttest scores as the dependent
> >variable (or change scores, if you prefer; the treatment effects after
> >controlling for the covariate will be identical). The ANOVA subroutine in
> >SPSS is probably the easiest to use for this purpose, although you could
> >get the same result with the MANOVA program if you set things up just
> >right. (If you use the MANOVA program, do NOT treat this as a repeated
> >measures situation. Repeated measures ANOVA is NOT the same as treating
> >the pretest score as a covariate. Better yet, just use the ANOVA program
> >instead, as recommended.)
> >
> >There are some technical problems in trying to adjust for pre-existing
> >differences between the students who make up the two treatment groups by
> >using baseline scores as a covariate, but in my opinion this is the best
> >procedure for the situation you describe. There is a rather serious
> >problem that your colleague should at least acknowledge, however. The use
> >of groups that meet as intact classes tends to violate the assumption that
> >error sources vary randomly from student to student, and that the results
> >for one student do not influence the results for other students. There's
> >not much that your colleague can do about that problem (other than to
> >acknowledge it in his dissertation writeup), unless he is willing to keep
> >running the experiment until there are several classes available for
> >analysis.
> >
> >You might want to advise your colleague that the valid way to test his
> >hypothesis would be to replicate for a few semesters, with a random choice
> >of which CLASS to give which instructional method. He could then use class
> >as the unit of analysis (average posttest scores, or average regression
> >adjusted posttest scores, for each class) to test the hypothesis. If the
> >treatment effect is large, I would expect it to be significant with a very
> >small N of classes -- perhaps as few as 3 or 4 classes of each kind. If
> >that would delay his graduation too much, then perhaps he could do the
> >additional replications after graduation, but before publication. (I would
> >suggest ignoring the semester by semester pairing, by the way; since I
> >wouldn't expect "semester" to have a strong influence on the scores. One
> >needs to retain as many degrees of freedom as possible with tiny sample
> >sizes.)
> >
> >***************************************************************************
> >Powhatan J. Wooldridge, Assoc. Professor, Nursing, State Univ. NY at Buffalo
> >
> >
> >On Mon, 7 Sep 1998, Scott Dixon wrote:
> >
> >> A colleague asked for my help with his thesis project. He performed a
> >> simple experiment. He pretested two classes using the same test. Then he
> >> gave one class a special set of lectures, and gave the other class the
> >> lectures he always uses. Finally, he gave both classes the same posttest.
> >> He wants to see if the special lectures had an effect on the posttest of
> >> the experimental group compared with the posttest of the control group. I
> >> am having trouble coming up with the proper SPSS procedure.
> >>
> >>
> >> The design is:
> >>
> >> Group                Observation-1   Stimulus       Observation-2
> >>
> >> experimental class   pretest         new-lectures   posttest
> >>
> >> control class        pretest         old-lectures   posttest
> >>
> >>
> >>
> >> The two groups were not randomly assigned, nor matched along any variable.
> >> However, the groups were not allowed to pre-select themselves into either
> >> class. The class contents were similar in every way except the set of new
> >> lectures. The experimental group did not know that they were being exposed
> >> to new lectures.
> >>
> >> I am embarrassed to say that I cannot figure out what SPSS procedure to
> >> perform. The closest I can get is "Independent-Samples T Test". Here, I
> >> used the continuous variables "Pretest" and "Posttest" by "Classes [groups
> >> 1,2]". But it doesn't seem to say if the improvement between pretest and
> >> posttest is significantly greater between the two groups. Is that
> >> something I have to do? Or is there a procedure that calculates that for me?
> >>
> >> I would be grateful if someone could point me in the right direction. I
> >> will gladly pore over the books if someone would give me a push in the
> >> general direction.
> >>
> >> Thank you very much.
> >>
> >> Scott Dixon
> >> dixon_s@mail.coc.cc.ca.us
> >>
>
>