LIR 493:
Quantitative Methods
Professor Wallace Hendricks

ANOVA - Analysis of Variance (Chapter 15)

Analysis of variance (ANOVA) is a hypothesis-testing procedure used to determine whether mean differences exist between two or more treatments (or populations). For an independent-measures experiment, a separate sample is taken for each treatment condition (e.g., people are randomly assigned to jobs of different difficulties). The purpose of ANOVA is to decide whether the differences between the samples are simply due to random error (sampling error) or whether there are systematic treatment effects that have caused scores in one group to differ from scores in another. The alternatives can be stated as follows:

1. The populations for all treatments are really the same (the μ's are equal); differences in sample means occur by chance.

2. At least one population for the treatments is different from the others (at least one μ is different from the others). This implies that the mean difference between the samples is due to the treatment.

The hypothesis test for ANOVA will attempt to differentiate between these two alternatives by computing a test statistic that is very similar to the t statistic used before. For the t statistic, we computed a value for t as follows:

$$t = \frac{\text{obtained difference between sample means}}{\text{difference expected by chance (error)}}$$

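For ANOVA, the test statistic is called an F-ratio and has the following structure:

$$F = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$$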

Notice that the F-ratio is based on variance instead of sample mean differences!

Statistical Hypotheses for ANOVA

Suppose that you offer four different versions of the same exam to minimize cheating. Four samples of subjects are selected, one sample for each treatment (test) condition. The purpose of the study is to determine whether the exams really have the same average scores. In statistical terms, we want to decide between two hypotheses: the null hypothesis (H0), which says that there is no difference between the population means, and the alternative hypothesis (Ha), which states that at least one of the exam averages is different from the others. In symbols:

H0: μ1 = μ2 = μ3 = μ4

Ha: At least one population mean is different than the others.

Notice that the hypotheses are always stated in terms of population parameters, even though we use sample data to test them.

If we tried to test for differences between the means using t-tests, we would need six separate tests, one for each pair of exams (1 vs. 2, 1 vs. 3, 2 vs. 4, etc.). We would then somehow have to decide whether to reject the null hypothesis on the basis of six different test results. How could we do this?
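To see where the count of six comes from, it may help to list every pair of exams. The short sketch below uses Python's standard itertools module; the variable names are illustrative only.

```python
from itertools import combinations

tests = [1, 2, 3, 4]  # the four exam versions

# Every unordered pair of exams needs its own t-test.
pairs = list(combinations(tests, 2))

for a, b in pairs:
    print(f"test {a} vs. test {b}")

print(len(pairs), "pairwise comparisons")
```

With k treatments the number of pairs is k(k-1)/2, so the list grows quickly as more exam versions are added.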

The Logic of Analysis of Variance

Suppose that we gathered information on 17 subjects randomly assigned to the four tests, as in the included handout. The sample means on the four tests are 22, 23, 31 and 26. Are these really different, or is the difference due to random chance because of small sample sizes (5, 4, 3 and 5 subjects)?

To answer this question, we need to look at the variability for the data. There are several ways that we can approach an ANALYSIS of this variability.

1. Compute the variance of all the data combined. That is, we can ignore the fact that the test scores came from different tests and simply compute the total variance of the data.

2. Compute the variance for each test separately. The subjects who took the first test averaged 22 on the test with a high of 25 and a low of 18. We can compute the variance of these scores (and do the same thing for all four exams). We can then combine these four estimated variances together to form a single estimate of the variance WITHIN the treatment conditions. (How could we do this?) This estimate is sometimes called the within-treatments variance, the unexplained variance, the residual variance or the error variance.

3. Compute the variance of the sample means. The sample means range from 22 to 31. This spread is a measure of the overall difference between the tests. The three people who took test three all scored higher than all the people who took test one. How could we use the variance of the sample means to estimate the common variance of the subject test scores? (E.g., what is the relationship between the variance of sample means and the population variance?) This estimate of the variance is sometimes called the between-treatments variance or the explained variance.
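The three approaches above can be sketched in a few lines of Python. The scores below are HYPOTHETICAL: they are not the handout data, and were chosen only so that the group sizes (5, 4, 3, 5) and sample means (22, 23, 31, 26) match the numbers quoted in these notes. The pooling and weighting choices shown are one common way to answer the two "how could we do this?" questions, not necessarily the exact steps we will follow in class.

```python
from statistics import mean, variance

# HYPOTHETICAL scores (not the handout's data), chosen only to match the
# group sizes and sample means quoted in the notes.
scores = {
    "test 1": [18, 21, 22, 24, 25],
    "test 2": [20, 22, 24, 26],
    "test 3": [29, 31, 33],
    "test 4": [23, 25, 26, 27, 29],
}

all_scores = [x for group in scores.values() for x in group]
n_total = len(all_scores)   # 17 subjects
k = len(scores)             # 4 treatments
grand_mean = mean(all_scores)

# 1. Total variance: ignore which test each score came from.
total_variance = variance(all_scores)

# 2. Within-treatments variance: pool the four sample variances,
#    weighting each one by its degrees of freedom (n - 1).
ss_within = sum((len(g) - 1) * variance(g) for g in scores.values())
ms_within = ss_within / (n_total - k)

# 3. Between-treatments variance: spread of the sample means around the
#    grand mean. Because a sample mean has variance sigma^2 / n, each
#    squared deviation is multiplied by its sample size n.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in scores.values())
ms_between = ss_between / (k - 1)

print(f"total variance:   {total_variance:.2f}")
print(f"within variance:  {ms_within:.2f}")
print(f"between variance: {ms_between:.2f}")
```

For these made-up scores the between-treatments estimate comes out much larger than the within-treatments estimate, which is the pattern the rest of this section teaches us to interpret.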

We will compute each of these variances in class. Suppose that we focus on the second and third estimates given above. Why should they be different? There are three basic reasons why the test scores should be different for different individuals:

1. The tests that they took really had different levels of difficulty (a treatment effect).

2. The individuals have different skills (individual differences).

3. The exams are not perfect measures of skill, so there will be experimental error.

The total variability can be attributed either to between-treatments variability or to within-treatments variability. Both are influenced by individual differences and experimental error. The between-treatments variability has one more component, however: if the means of the tests are really different, then the treatment effect will add to this variability! Thus, we can see whether there is a difference in means by comparing the between-treatments and the within-treatments estimates!
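The split of total variability into a between-treatments part and a within-treatments part can be checked numerically using sums of squared deviations. The sketch below again uses HYPOTHETICAL scores (not the handout data) that merely reproduce the group sizes and means quoted earlier; the helper name `mean` is defined here for illustration.

```python
# HYPOTHETICAL scores (not the handout's data); sizes 5, 4, 3, 5 and
# means 22, 23, 31, 26 match the figures quoted in the notes.
groups = [
    [18, 21, 22, 24, 25],
    [20, 22, 24, 26],
    [29, 31, 33],
    [23, 25, 26, 27, 29],
]

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

grand_mean = mean([x for g in groups for x in g])

# Total: every score measured against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

# Within: every score measured against its own group's mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Between: every group mean measured against the grand mean,
# counted once for each subject in that group.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

print(ss_total, ss_between + ss_within)
```

The two printed numbers agree, which is the sense in which all of the variability is accounted for by the two sources.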

The F-Statistic: Comparing Between and Within Estimates of Variance

It turns out that the best way to compare these two estimates is to take their ratio. For independent-measures ANOVA, the F-ratio has the following structure:

$$F = \frac{\text{variance between treatments}}{\text{variance within treatments}}$$

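Expressed in terms of sources of variability, this becomes

$$F = \frac{\text{treatment effect} + \text{individual differences} + \text{experimental error}}{\text{individual differences} + \text{experimental error}}$$

where the treatment effect is zero when H0 is true. What should the expected value of the F-statistic be in this case? (The expected value of the t-statistic was zero when the null hypothesis was true.) If the treatment effect is important, what values should we get for the F-statistic? Is the F-test going to be one- or two-tailed?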
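As a minimal sketch of the calculation, the function below computes the F-ratio as the between-treatments variance divided by the within-treatments variance. The function name `f_ratio` and the scores are HYPOTHETICAL (the scores are not the handout data; they only match the group sizes and means quoted earlier).

```python
def f_ratio(groups):
    """Return (F, df_between, df_within) for independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Numerator: variance between treatments.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    df_between = k - 1

    # Denominator (the error term): variance within treatments.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_within = n_total - k

    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# HYPOTHETICAL scores, not the handout's data.
groups = [
    [18, 21, 22, 24, 25],
    [20, 22, 24, 26],
    [29, 31, 33],
    [23, 25, 26, 27, 29],
]

f, df_between, df_within = f_ratio(groups)
print(f"F({df_between}, {df_within}) = {f:.2f}")
```

Because both the numerator and the denominator are variances, F can never be negative. A value near 1 is what we would expect if only individual differences and experimental error were at work, and only large values point toward a treatment effect.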

Definitions: In ANOVA, independent variables (treatments) are referred to as factors. Because we had only one independent variable in this example (test type), it is called a single-factor experiment. The denominator of the F-statistic is often called the error term.