Quantitative Methods Professor Wallace Hendricks
Analysis of variance (ANOVA) is a hypothesis-testing procedure used to
determine whether mean differences exist among two or more treatments
(or populations). For an independent-measures experiment, a separate
sample is taken for each treatment condition (e.g., people are randomly
assigned to jobs of different difficulties). The purpose of ANOVA is to
decide whether the differences between the samples are simply due to
random error (sampling error) or whether there are systematic treatment
effects that have caused scores in one group to differ from scores in
another. The alternatives can be stated as follows:
1. The population means for all treatments are really the same (all
μ's are equal); differences in sample means occur by chance.
2. At least one population mean for the treatments is different from
the others (at least one μ is different from the others). This implies
that the mean difference between the samples is due to the treatment.
The hypothesis test for ANOVA will attempt to differentiate between
these two alternatives by computing a test statistic that is very
similar to the t statistic used before. For the t statistic, we
computed a value for t as follows:

    t = (obtained difference between sample means) / (difference expected by chance, i.e. the standard error)

For ANOVA, the test statistic is called an F-ratio and has the
following structure:

    F = (variance between sample means) / (variance expected by chance, i.e. the error variance)

Notice that the F-ratio is based on variance instead of sample mean
differences!
Statistical Hypotheses for ANOVA
Suppose that you offer 4 different versions of the same exam to
minimize cheating. Four samples of subjects are selected, one sample
for each treatment (test) condition. The purpose of the study is to
determine whether the exams really have the same average scores. In
statistical terms, we want to decide between two hypotheses: the null
hypothesis (H0), which says that there is no difference between the
population means, and the alternative hypothesis (Ha), which states
that at least one of the exam averages is different from the others.
In symbols:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least one population mean is different from the others.
Notice that the hypotheses are always stated in terms of population
parameters, even though we use sample data to test them.
If we were to try to test for differences between the means using
t-tests, we would be required to perform six different t-tests (μ1 vs
μ3, μ2 vs μ4, etc.). Somehow, we would then have to decide whether to
reject the null hypothesis based on six different test results. How
could we do this?
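The count of six comes from pairing up the four exams in every
possible way. As a small illustration (the labels "test 1" through
"test 4" are placeholder names, not taken from the handout), the pairs
can be listed as follows:

```python
from itertools import combinations

# Placeholder labels for the four exam versions (assumed names).
exams = ["test 1", "test 2", "test 3", "test 4"]

# Every unordered pair of exams would need its own t-test.
pairs = list(combinations(exams, 2))

for first, second in pairs:
    print(f"{first} vs {second}")

print(len(pairs), "pairwise t-tests")
```

With k treatments the number of pairs is k(k-1)/2, so the number of
separate t-tests grows quickly as treatments are added.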
The Logic of Analysis of Variance
Suppose that we gathered information on 17 subjects randomly assigned
to the four tests, as in the included handout. The sample means on the
four tests are 22, 23, 31 and 26. Are these really different, or is
the difference due to random chance because of the small sample sizes
(5, 4, 3 and 5 subjects)?
To answer this question, we need to look at the variability
for the data. There are several ways that we can approach an ANALYSIS
of this variability.
1. Compute the variance of all the data combined. That is, we can
ignore the fact that the test scores came from different tests and
simply compute the total variance of the data.
2. Compute the variance for each test separately. The subjects who
took the first test averaged 22, with a high of 25 and a low of 18. We
can compute the variance of these scores (and do the same thing for
all four exams). We can then combine these four estimated variances
to form a single estimate of the variance WITHIN the treatment
conditions. (How could we do this?) This estimate is sometimes called
the within-treatments variance, the unexplained variance, the residual
variance or the error variance.
3. Compute the variance of the sample means. The sample means range
from 22 to 31. This difference is a measure of the overall difference
between the tests. The three people who took test three all scored
higher than all the people who took test one. How could we use the
variance of the sample means to estimate the common variance of the
subject test scores? (E.g., what is the relationship between the
variance of sample means and the population variance?) This estimate
of the variance is sometimes called the between-treatments variance
or the explained variance.
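One possible way to carry out the three computations is sketched
below. The scores are HYPOTHETICAL: they were invented to match the
summary figures quoted above (sample sizes 5, 4, 3 and 5; means 22,
23, 31 and 26; test one ranging from 18 to 25) and are not the data
from the class handout. The pooling rule in step 2 (weighting each
sample variance by its degrees of freedom) and the size-weighted
formula in step 3 are the usual textbook choices, assumed here rather
than taken from the handout.

```python
from statistics import mean, variance

# HYPOTHETICAL scores, invented to match the summary figures in the
# text. They are NOT the data from the class handout.
scores = {
    "test 1": [18, 21, 22, 24, 25],
    "test 2": [20, 22, 24, 26],
    "test 3": [28, 31, 34],
    "test 4": [23, 25, 26, 27, 29],
}

all_scores = [x for group in scores.values() for x in group]
n_total = len(all_scores)      # 17 subjects
k = len(scores)                # 4 treatments
grand_mean = mean(all_scores)

# 1. Total variance: ignore which test each score came from.
total_variance = variance(all_scores)

# 2. Within-treatments variance: pool the four sample variances,
#    weighting each one by its degrees of freedom (n_j - 1).
ss_within = sum((len(g) - 1) * variance(g) for g in scores.values())
within_variance = ss_within / (n_total - k)

# 3. Between-treatments variance: spread of the sample means around
#    the grand mean, with each mean weighted by its sample size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in scores.values())
between_variance = ss_between / (k - 1)

print("total variance:  ", total_variance)
print("within variance: ", within_variance)
print("between variance:", between_variance)
```

For these made-up scores the between-treatments estimate is much
larger than the within-treatments estimate, which is the pattern the
rest of the section sets out to interpret.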
We will compute each of these variances in class. Suppose that we
focus on the second and third estimates given above. Why should they
be different? There are three basic reasons why the test scores should
be different for different individuals:
1. The tests that they took really had different levels of difficulty (a treatment effect).
2. The individuals have different skills (individual differences).
3. The exams are not perfect measures of skills, so there will be
experimental error.
The total variability can be attributed either to between-treatments
variability or to within-treatments variability. Both are influenced
by individual differences and experimental error. The
between-treatments variability has one more component, however. If the
means of the tests are really different, then the treatment effect
will add to this variability! Thus, we can see whether there is a
difference in means by comparing the between-treatments and the
within-treatments estimates!
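The split of the total variability into two parts can be checked
numerically using sums of squared deviations (SS). The sketch below
uses HYPOTHETICAL scores, invented to match the sample sizes and means
quoted earlier; they are not the handout data, and the variable names
are illustrative only.

```python
from statistics import mean

# HYPOTHETICAL scores (n = 5, 4, 3, 5; means 22, 23, 31, 26).
# They are NOT the data from the class handout.
groups = [
    [18, 21, 22, 24, 25],
    [20, 22, 24, 26],
    [28, 31, 34],
    [23, 25, 26, 27, 29],
]

all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)

# Total: every score compared with the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Between treatments: each sample mean compared with the grand mean,
# counted once for every subject in that sample.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within treatments: every score compared with its own sample mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(ss_total, ss_between, ss_within)   # 262 174 88
```

Note that it is the sums of squares that add up exactly (262 = 174 +
88 here), not the variances themselves, because each variance divides
its sum of squares by a different number of degrees of freedom.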
The F-Statistic: Comparing Between and Within Estimates of Variances
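It turns out that the best way to compare these two estimates is to
take their ratio. For independent-measures ANOVA, the F-ratio has the
following structure:

    F = (variance between treatments) / (variance within treatments)

Expressed in terms of sources of variability, this becomes

    F = (treatment effect + individual differences + experimental error) / (individual differences + experimental error)

which reduces to

    F = (individual differences + experimental error) / (individual differences + experimental error)

when H0 is true. What should the expected value of the F-statistic be
in this case? (The expected value of the t-statistic was zero when the
null hypothesis was true.) If the treatment effect is important, what
values should we get for the F-statistic? Is the F-test going to be
one- or two-tailed?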
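One way to explore these questions is by simulation. The sketch below
computes the F-ratio for a set of HYPOTHETICAL scores (invented to
match the sample sizes and means quoted earlier, not the handout data)
and then repeatedly draws four samples of the same sizes from a single
population, so that H0 is true by construction. The normal population
with mean 25 and standard deviation 3, the seed, and the helper name
f_ratio are all arbitrary assumptions made for illustration.

```python
import random
from statistics import mean

def f_ratio(groups):
    """Between-treatments variance divided by within-treatments variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    k = len(groups)
    n_total = len(all_scores)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# HYPOTHETICAL scores; NOT the data from the class handout.
scores = [
    [18, 21, 22, 24, 25],
    [20, 22, 24, 26],
    [28, 31, 34],
    [23, 25, 26, 27, 29],
]
f_observed = f_ratio(scores)

# Simulate H0: all four samples come from the SAME population
# (assumed normal, mean 25, standard deviation 3).
rng = random.Random(0)
sizes = [5, 4, 3, 5]
null_fs = []
for _ in range(5000):
    fake = [[rng.gauss(25, 3) for _ in range(n)] for n in sizes]
    null_fs.append(f_ratio(fake))

share_as_large = sum(f >= f_observed for f in null_fs) / len(null_fs)

print("observed F:", round(f_observed, 2))
print("average F when H0 is true:", round(mean(null_fs), 2))
print("smallest simulated F:", round(min(null_fs), 4))
print("share of simulated F at least as large as observed:", share_as_large)
```

Because the F-ratio is a ratio of two variances it can never be
negative, so the simulated values pile up on one side; only unusually
large values of F point toward a treatment effect.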
Definitions: In ANOVA, independent variables (treatments) are referred
to as factors. Because we only had one independent variable in this
example (test type), it is called a single-factor experiment. The
denominator of the F-statistic is often called the error term.