An Introduction to Data Analysis & Presentation

Prof. Timothy Shortell, Sociology, Brooklyn College

The T-Test

With the confidence interval, we are able, for the first time, to say something about the population based on information in our sample. We use the confidence interval to estimate the population mean.

Often, we have ideas about the population mean before we collect data on our sample. We may have read other studies on the same topic, and they suggest that the population mean will be a particular value. Or, we may have an idea about the population mean based on the internal logic of the measurements we are using. Sometimes, we base our expectations on our experience of the particular social phenomena we are studying.

We need to formalize this testing of our expectations against our sample data. This process is called hypothesis testing. Each new statistic that we learn will be a tool for testing a different kind of hypothesis.

We begin with some hypotheses about the mean.

The Logic of Hypothesis Testing
At first, our approach to hypothesis testing will seem backwards. We will set up a certain hypothesis -- the null hypothesis -- and try to demonstrate that it is probably wrong, based on our sample data.

Why not just try to prove the hypothesis based on our expectations?

As it turns out, this is a difficult task. It is easier to use probability to show that the null hypothesis is probably wrong.

The null hypothesis always states that there is no effect. In contrast, the research hypothesis states that there is an effect -- that is, our expectations about the population mean, for example.

The null and research hypotheses are always defined as logical opposites. They are mutually exclusive of one another -- only one or the other can be true, not both.

Let's consider a specific problem. The CEO of VeryBig.Com Corporation testifies before the Congressional Subcommittee on Labor that the average hourly wage of his employees is an astounding $12.75. The Subcommittee has hired us, BrooklynSoc Consulting, to determine whether or not the CEO is telling a VeryBig lie.

Since VeryBig.Com has hundreds of thousands of employees, we can't check on the accuracy of the claim by calculating the population mean. Instead, we decide to collect wage data from a random sample of employees. We can then test the hypothesis about the mean wage. As Marxist sociologists, we don't believe the CEO, and our expectation is that the actual population mean is smaller.

We would set up a null and research hypothesis. Our research hypothesis is that the mean is less than $12.75. The null hypothesis, then, must be that the mean is greater than or equal to $12.75.

We can express this symbolically.
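In one common notation (an assumption here; other texts label the research hypothesis differently), with $H_0$ for the null hypothesis, $H_1$ for the research hypothesis, and $\mu$ for the population mean hourly wage, the two hypotheses would read:

```latex
\begin{aligned}
H_0 &: \mu \ge 12.75 \\
H_1 &: \mu < 12.75
\end{aligned}
```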

We can calculate a t-score and test these hypotheses. The data from our sample will lend support to one or the other. By convention, we interpret our data from the point of view of the null hypothesis -- that is, we either reject the null hypothesis or fail to reject it.

Let's think, for a moment, about the logic of hypothesis testing. When we make a decision about the null hypothesis, our decision is either correct or incorrect. If we were able to know reality directly, we could determine if the hypothesis is actually true or false.

Figure 1. The Logic of Inference

From Levin and Fox, Elementary Statistics in Social Research, 7th edition, 1997.

When we do hypothesis testing, we try to balance the risk of a type I error (rejecting a null hypothesis that is actually true) with the desire to correctly discover a real effect. Science, as a social practice, is conservative in this regard. We tend to favor a rather strict criterion -- typically 95% certainty. Thus, we are relatively more likely to miss real effects (a type II error) than to mistakenly claim that there is an effect when, in fact, there isn't.
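The 5% risk can be made concrete with a small simulation. The sketch below is only illustrative: it assumes normally distributed wages, a sample of 125, a standard deviation of 6.83, and a one-tailed critical t-score of 1.658 (the usual table value for about 120 degrees of freedom), and the function names are hypothetical. It repeatedly samples from a population in which the null hypothesis is true and counts how often the test rejects it anyway.

```python
import math
import random


def one_sample_t(sample, mu0):
    """t-score of the sample mean against the hypothesized mean mu0."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance with the usual n - 1 denominator.
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    standard_error = math.sqrt(variance) / math.sqrt(n)
    return (mean - mu0) / standard_error


def type_one_error_rate(trials=2000, n=125, mu0=12.75, sigma=6.83,
                        critical_t=1.658, seed=1):
    """Fraction of trials that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        # One-tailed test: reject only when t falls below -critical_t.
        if one_sample_t(sample, mu0) < -critical_t:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of trials")
```

Over many trials the rejection rate should settle near 5%, which is the risk of a type I error that the 95% criterion accepts.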

The steps of hypothesis testing can be summarized:

  1. state the research and null hypotheses;
  2. set an alpha level (i.e., the risk of a type I error we are willing to accept; 0.05 corresponds to 95% certainty);
  3. calculate a significance test;
  4. interpret the results.

Testing a Hypothesis about One Sample Mean
Let's return to our example about the average wage at VeryBig.Com. Remember, the CEO claims that the population mean is $12.75 per hour. Let's test a hypothesis about this.

Our research hypothesis, again, is that the mean is less than $12.75. The null hypothesis is that the mean is greater than or equal to $12.75.

We collect wage data for 125 employees and discover that the average hourly wage is $9.98, with a standard deviation of $6.83. How likely is it that we could get a sample mean of $9.98 if the true population mean is $12.75?

We decide that 95% certainty is a strict enough criterion for this hypothesis -- this is the conventional level.

The statistical test we need is called the t-test. We calculate a t-score for the sample mean, and consult the t-table to determine whether the sample data supports the research or null hypothesis.




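The t-score is the difference between the sample mean and the hypothesized population mean, divided by the standard error of the mean. When we plug in the numbers and do a little math, the result is a t-score of about -4.5.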
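Written out, and assuming the standard error is estimated as $s / \sqrt{N}$ (some introductory texts divide by $\sqrt{N - 1}$ instead, which gives about $-4.52$ here), the calculation would look like this, where $\bar{X}$ is the sample mean, $\mu_0$ the hypothesized population mean, $s$ the sample standard deviation and $N$ the sample size:

```latex
t = \frac{\bar{X} - \mu_0}{s / \sqrt{N}}
  = \frac{9.98 - 12.75}{6.83 / \sqrt{125}}
  \approx \frac{-2.77}{0.611}
  \approx -4.53
```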

This tells us that our sample mean is more than four standard errors below the hypothesized population mean. How unlikely is this?

To determine this, we need to consult the t table.

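With 124 degrees of freedom (N - 1), a one-tailed test and alpha set at 0.05, we find that the critical t-score is 1.658 (the value tables typically list for 120 degrees of freedom, the nearest row).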

Since our sample t-score is more extreme than the critical t-score -- that is, larger in terms of magnitude, or absolute value -- we reject the null hypothesis. The likelihood that we could get a sample mean of $9.98 per hour if the population mean is $12.75 per hour is less than 5% -- indeed, much less. This is sufficiently unlikely an event that we can conclude that the sample data supports the research hypothesis.
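The whole test can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive procedure: the standard error is taken as s / sqrt(N), the critical value is copied from the t table rather than computed, and the function name `one_sample_t_from_summary` is hypothetical.

```python
import math


def one_sample_t_from_summary(sample_mean, sample_sd, n, mu0):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)  # assumption: s / sqrt(N)
    t = (sample_mean - mu0) / standard_error
    return t, n - 1


# Sample of 125 employees: mean $9.98, standard deviation $6.83.
# The CEO's claimed population mean is $12.75.
t_score, df = one_sample_t_from_summary(9.98, 6.83, 125, 12.75)

critical_t = 1.658  # one-tailed, alpha = 0.05, read from a t table

# The research hypothesis says the mean is lower than claimed,
# so we reject the null only when t falls below -critical_t.
reject_null = t_score < -critical_t

print(f"t = {t_score:.2f}, df = {df}, reject null: {reject_null}")
# prints: t = -4.53, df = 124, reject null: True
```

The same function could be reused for the exercises under More Examples by swapping in the new summary statistics and the appropriate critical value.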

We can't go so far as to say that the data prove the research hypothesis. Therefore, we can't tell the Labor Subcommittee that the CEO is definitely a VeryBig liar, but we can say that the data suggest that the true population mean hourly wage is less than $12.75. (We might also suggest that the CEO be sent to prison for lying to Congress.)

More Examples
1. There has been a lot of talk about the number of home runs hit in Major League Baseball recently. According to our sports almanac, the average number of home runs hit by a Major League player in a typical year is 19.8.

We take a random sample of 100 players and find that the mean is 24.9, with a standard deviation of 16.2.

Perform a hypothesis test to determine the likelihood of a sample mean of 24.9 if the true population mean is 19.8. State the null and research hypotheses; set alpha; find the critical t-score; calculate the sample t-score; make a decision about the hypotheses and interpret the results.

2. Common wisdom has it that the world is becoming more civilized. This would suggest, perhaps, that nation-states are killing fewer of their citizens than was once the case.

We consult our history textbook to learn that, in a typical year in the nineteenth century, the mean number of citizens killed by the typical nation-state -- including capital punishment, deaths in civil and international wars and all state-sponsored violence -- was 2,104.25.

We take a random sample of 75 countries and consult the archives of Amnesty International, The International Court of Justice, the United Nations and the World Bank, and determine that the sample mean is 3,292.8 with a standard deviation of 4,814.5.

Perform a hypothesis test to determine the likelihood of a sample mean of 3,292.8 if the true population mean is 2,104.25. State the null and research hypotheses; set alpha; find the critical t-score; calculate the sample t-score; make a decision about the hypotheses and interpret the results.

Hypotheses about a Difference between Sample Means
The t-test for one sample mean allows us to estimate the population mean on some variable, and to test a hypothesis about it. Often, we want to compare means for two subgroups. When we look at ideology, for example, we often want to know if there is a gender effect -- that is, is the mean ideology score for men equal to that of women?

Let's say we have data for 100 men and 100 women from New York State. We measure their political ideology on a scale where 0=far right and 100=far left. The mean score for men is 47.5 and for women 58.2. The standard deviations are 14.9 for men and 9.6 for women.

The null hypothesis, you will recall, is that there is no effect. In this case, it means that there is no difference between the mean ideology score of men and women. The research hypothesis, then, is that the mean ideology score for men is not equal to the mean for women.

We can state this symbolically:
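In one common notation (assumed here; the subscripts are labels chosen for this example), with $\mu_{\text{men}}$ and $\mu_{\text{women}}$ standing for the two population mean ideology scores, the hypotheses would read:

```latex
\begin{aligned}
H_0 &: \mu_{\text{men}} = \mu_{\text{women}} \\
H_1 &: \mu_{\text{men}} \ne \mu_{\text{women}}
\end{aligned}
```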

We can set our alpha to the usual level, 0.05.

We calculate the sample t-score with the following formula:
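One common form, offered here as an assumption because textbooks differ in how they estimate the standard error of the difference, is the pooled-variance t-score, where $\bar{X}_1, \bar{X}_2$ are the sample means, $s_1, s_2$ the sample standard deviations and $N_1, N_2$ the sample sizes:

```latex
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{s_p^2 \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}},
\qquad
s_p^2 = \frac{(N_1 - 1)\, s_1^2 + (N_2 - 1)\, s_2^2}{N_1 + N_2 - 2}
```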

Degrees of freedom are defined as:
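For a two-sample test that pools the variances, the usual definition (assumed here, with $N_1$ and $N_2$ the two sample sizes) subtracts one degree of freedom for each estimated mean:

```latex
df = N_1 + N_2 - 2
```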

For our example, this would be:
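A minimal Python sketch of that calculation from the summary statistics above, assuming the pooled-variance form of the test (textbooks that weight the variances differently get a slightly different value). The function name `two_sample_t_from_summary` is hypothetical, introduced only for illustration.

```python
import math


def two_sample_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a pooled-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Group 1 = men, group 2 = women, so a negative t means men score lower.
t_score, df = two_sample_t_from_summary(47.5, 14.9, 100, 58.2, 9.6, 100)

print(f"t = {t_score:.2f}, df = {df}")
# prints: t = -6.04, df = 198
```

Under these assumptions the sample difference of -10.7 sits about six standard errors from zero, with 198 degrees of freedom.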

This t-score tells us that the sample difference is more than 6 standard errors below the hypothesized population difference of zero. How likely is it that we could get a sample difference of -10.7, if the true difference in the population is zero?

According to the t-score table (two-tailed, since we are not predicting the direction of the difference), the critical t-score is 1.98.

Since the sample t-score is more extreme than the critical t-score, we reject the null hypothesis. We conclude that we are 95% certain that the difference in mean ideology scores for men and women in the population is not zero. Our sample data suggests that in New York State, women are significantly more liberal than men.

All materials on this site are copyright © 2001, by Professor Timothy Shortell, except those retained by their original owner. No infringement is intended or implied. All rights reserved. Please let me know if you link to this site or use these materials.