An Introduction to Data Analysis & Presentation
Prof. Timothy Shortell, Sociology, Brooklyn College
Comparing Means: Analysis of Variance (ANOVA)
The two-sample t-test allows us to compare the means of two groups. But what if we want to compare the means of three groups?
We could just do a t-test for each pair. This would be three t-tests. The problem with this strategy has to do with the risk of a type I error. When we set alpha to 0.05, we want the risk of this kind of error to be no more than 5%. But, if we do three t-tests, the total probability of a type I error somewhere in the set of tests can be as high as about 15% (3 × 0.05).

As you can anticipate, this becomes more of a problem with more groups. If we wanted to compare four groups, we would need 6 t-tests -- up to a 30% chance of at least one type I error. If we had five groups, we would need 10 tests -- now, the likelihood of a type I error can approach 50%! (These figures are upper bounds obtained by adding 0.05 for each test; if the tests were independent of one another, the rates would be about 14%, 26%, and 40%.)

We need a way to compare more than two groups that does not inflate the likelihood of a type I error. This is called analysis of variance, or ANOVA. Instead of calculating t-scores, we will calculate an F-score. (We need a new kind of score because we are using a different sampling distribution. The t-score was based on the t-distributions, and the F-score is based on the F-distributions.)
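The arithmetic behind this inflation can be checked with a short sketch. The function names below are illustrative, not part of the original notes, and the "independent" rate assumes the tests do not overlap, which pairwise t-tests on shared groups do not strictly satisfy, so treat it as a rough guide rather than an exact figure.

```python
from math import comb


def pairwise_tests(groups: int) -> int:
    """Number of distinct pairs of groups, i.e. how many t-tests are needed."""
    return comb(groups, 2)


def additive_bound(tests: int, alpha: float = 0.05) -> float:
    """Upper bound on the chance of at least one type I error: tests * alpha."""
    return min(1.0, tests * alpha)


def independent_rate(tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one type I error if the tests were independent."""
    return 1.0 - (1.0 - alpha) ** tests


for k in (3, 4, 5):
    m = pairwise_tests(k)
    print(f"{k} groups: {m} tests, "
          f"bound {additive_bound(m):.2f}, "
          f"independent {independent_rate(m):.3f}")
```

Either way of counting leads to the same conclusion: the risk grows quickly as groups are added.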
The t-score is a ratio of variation between groups to variation within groups. The F-score is the same kind of ratio. With the F-score, the number of groups can be more than two, so the formula is a more generalized way of comparing between-groups variation to within-groups variation.
We can measure variation with the concept of the sum of squares: the sum of the squared differences between scores and a mean. The sum of squares within groups measures the amount that scores in a group tend to differ from their own group mean. If we are examining the ideology scores of working, middle and upper classes, this would represent the amount that working class respondents differ from the working class mean, plus the amount middle class respondents differ from the middle class mean, plus the amount upper class respondents differ from the upper class mean.
We can also calculate the sum of squares between the groups. This is the amount that the working class mean differs from the grand mean (the overall mean, taking all respondents together), plus the amount that the middle class mean differs from the grand mean, plus the amount the upper class mean differs from the grand mean. Each of these differences is squared and weighted by the number of respondents in the group.
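In symbols, the two sums of squares can be written as follows. The notation is an assumption introduced here, since the original formulas are not reproduced: k is the number of groups, n_j is the number of respondents in group j, x_ij is the score of respondent i in group j, x̄_j is the mean of group j, and x̄ is the grand mean.

```latex
\[
SS_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2
\qquad
SS_{\text{between}} = \sum_{j=1}^{k} n_j \left( \bar{x}_j - \bar{x} \right)^2
\]
```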
The F-score, then, is the ratio of between-groups variation to within-groups variation, where each sum of squares is first divided by its degrees of freedom (the number of groups minus one for between groups; the number of respondents minus the number of groups for within groups).

Imagine, for a moment, what these two totals would be like if there were a lot of variation in ideology scores within class groups, but very little between classes. This would be the case if class had no relationship to ideology; in other words, members of the lower or working classes are no more liberal than members of the middle or upper classes, and so forth. In this case, the F-score would be small -- close to zero.

Now, imagine that there is a lot of between-groups variation, and little within-groups variation. This would be the case if the group means for the classes were very different but members within each class were very similar. In other words, this would be the case if class accounted for almost all of the variation in ideology. In this case, the F-score would be large.

Let's see how this works out in an example.
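As a warm-up before the survey data, here is a minimal sketch of the calculation using only the Python standard library. The scores are toy numbers invented for illustration (they are not GSS data), and the function name `one_way_f` is hypothetical. The F-score is computed as the between-groups sum of squares divided by its degrees of freedom, over the within-groups sum of squares divided by its degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])

    # Within groups: squared distance of each score from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # Between groups: squared distance of each group mean from the grand
    # mean, weighted by the number of scores in the group.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))

    df_between = k - 1
    df_within = n_total - k
    f_score = (ss_between / df_between) / (ss_within / df_within)
    return f_score, df_between, df_within


# Toy data: group means far apart, scores within each group close together.
distinct = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
# Toy data: group means close together, scores within each group spread out.
overlapping = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

print(one_way_f(distinct))
print(one_way_f(overlapping))
```

The first toy data set gives a large F-score and the second a small one, matching the two scenarios described above.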
First, we set our hypotheses. The null hypothesis states that the mean ideology scores of all the classes are equal; the research hypothesis states that they are not all equal.

From the 1996 GSS, we generate the following results: the F-score is 2.73, with a probability of 0.043. Just as with the t-test, we compare the probability with the standard criterion, 0.05. Since the probability of our F-score is less than the criterion, we reject the null hypothesis. We conclude that the mean ideology scores of the classes are not all equal to one another.

The table of group means suggests that there are some differences. At this point, however, we know only that there is a significant difference somewhere among the groups; we do not yet know which particular means differ.
In order to identify which comparisons are statistically significant -- remember, the research hypothesis states only that the means are not all equal -- we need a post hoc test. There are many different kinds of post hoc tests. They all do the same thing: compare sample means in such a way as to not increase the likelihood of a type I error (as doing t-tests on all the pairs of means would). We will look at a post hoc test called Tukey's HSD. We won't worry about how the test is calculated. Instead, we will work on interpreting the results.
Here is SPSS output displaying the HSD test: We want to know, at a 95% confidence level, which group means are different. The post hoc test results are read just like t-test results. We compare the significance figure for each comparison with the 0.05 criterion. Any comparison that shows a probability less than or equal to 0.05 is considered a statistically significant difference.
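The reading rule can be sketched in a few lines of code. The rows below are hypothetical values invented purely to illustrate the rule; they are not the SPSS output from this example, and the group labels and function name are placeholders.

```python
ALPHA = 0.05  # the standard criterion used throughout these notes

# Hypothetical (group_a, group_b, significance) rows, for illustration only.
comparisons = [
    ("group A", "group B", 0.412),
    ("group A", "group C", 0.031),
    ("group B", "group C", 0.268),
]


def significant_pairs(rows, alpha=ALPHA):
    """Keep the comparisons whose significance is at or below alpha."""
    return [(a, b) for a, b, p in rows if p <= alpha]


print(significant_pairs(comparisons))
```

With these made-up values, only the comparison between group A and group C would count as statistically significant.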
We interpret the results one comparison at a time. In our data, the difference between the lower class and the upper class is large enough to be reliable, but none of the other differences is. The means may appear to be different, but we cannot be sufficiently confident that the apparent differences reflect the true state of the social world.
The post hoc test indicates only that a particular comparison is statistically significant -- that it is reliable. We still need to assess whether it is sociologically significant.
1. Formulate a hypothesis.
2. Interpret the following results:
3. Assess the sociological significance, if any, of the results. How would you explain the results?

All materials on this site are copyright © 2001, by Professor Timothy Shortell, except those retained by their original owner. No infringement is intended or implied. All rights reserved. Please let me know if you link to this site or use these materials.