The social sciences, such as sociology, political science, or economics, investigate many different kinds of social phenomena. In order to do so, researchers need many different kinds of methods. Some methods are better suited to particular kinds of investigations; others are better for different kinds of studies. To be regarded as scientific, though, all research methods must share certain characteristics.
The social sciences are empirical in orientation; this means that we regard empirical data as the test of our explanations about the phenomena in which we are interested. Empirical data are collected through the scientific method.
When we seek to evaluate our theories with empirical research, we generally follow the steps of hypothesis testing. The first step is to operationalize the concepts in our theory. Operationalization involves translating a concept into a measurable variable.
For example, let’s say our theory suggests that education is positively related to political participation. In order to test this proposition, we have to operationalize the concepts ‘education’ and ‘political participation.’ Education might seem straightforward; we could measure years of education. Or, we could measure highest degree attained. Both are measures of educational achievement, but neither is exactly equivalent to the concept ‘education.’ The concept of ‘political participation’ is more challenging. We could define it as whether or not someone voted in the last presidential election. What other ways could we operationalize political participation?
When we operationalize, we are trying to devise a measure that captures the fullness and complexity of the concept. Often, this requires that we use several items to measure a single concept. We would combine the answers to these items into an index variable.
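As a rough illustration of combining several items into an index variable, here is a minimal Python sketch. The four item names and the three respondents are hypothetical, invented only for this example; a real index would use items drawn from an actual survey and might weight them differently.

```python
# Hypothetical survey items, each coded 1 (yes) or 0 (no).
# These names are assumptions for illustration, not items from a real survey.
ITEMS = ["voted", "attended_rally", "contacted_official", "donated"]


def participation_index(response):
    """Add up the 'yes' answers across all items to form a simple additive index."""
    return sum(response[item] for item in ITEMS)


# Three made-up respondents.
respondents = [
    {"voted": 1, "attended_rally": 0, "contacted_official": 0, "donated": 0},
    {"voted": 1, "attended_rally": 1, "contacted_official": 1, "donated": 1},
    {"voted": 0, "attended_rally": 0, "contacted_official": 0, "donated": 0},
]

scores = [participation_index(r) for r in respondents]
print(scores)  # [1, 4, 0]
```

A higher score indicates participation in more kinds of political activity, which comes closer to the full concept than the voting item alone.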
Once we have operationalized our concepts, we can rewrite our proposition as a hypothesis. In our example, this might be: Highest degree attained will be positively related to whether or not a person voted in the last presidential election. The higher the degree, the more likely someone will be to have voted.
We would collect empirical data on a relevant sample and then select the appropriate statistical tool to test our hypothesis. In this case, we might generate a crosstabulation showing the percentage of those with various degrees who voted in the last election.
For example, we might find:

Reading the column percents in the table, we can see that there is a relationship between highest degree attained and whether or not someone voted in the last presidential election. The higher the degree attained, the more likely someone is to have voted in the last election. This relationship is a probabilistic one—it is evidence of a tendency for these two variables to be related for adult Americans. It describes a social pattern. It does not, it must be pointed out, describe the relationship between education and political participation for any individual.
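To make the idea of column percents concrete, the following Python sketch computes them from a small set of records. The records are hypothetical and far too few for real analysis; they only show how each cell is divided by its column total, so that each degree category adds up to 100%.

```python
from collections import Counter

# Hypothetical (degree, voted) records, invented for illustration only.
records = (
    [("high school", "yes")] * 2 + [("high school", "no")] * 2
    + [("college", "yes")] * 3 + [("college", "no")] * 1
    + [("graduate", "yes")] * 4 + [("graduate", "no")] * 1
)


def column_percents(records):
    """Return the percentage of each column (degree) falling in each row (vote)."""
    cell_counts = Counter(records)
    column_totals = Counter(degree for degree, _ in records)
    return {
        (degree, vote): 100.0 * count / column_totals[degree]
        for (degree, vote), count in cell_counts.items()
    }


percents = column_percents(records)
for degree in ("high school", "college", "graduate"):
    print(degree, percents[(degree, "yes")])
# high school 50.0
# college 75.0
# graduate 80.0
```

Because the percentages are calculated within each column, we compare them across columns: here the made-up share of voters rises with each degree level.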
The chi-square statistic is the test of statistical significance for the table. It tells us that, if there were no real relationship between education and political participation, the probability of getting a table like this one by chance alone would be less than 5%. We’ve collected data from only 1,108 people, but we want to generalize to the population of adult Americans. The significance test gives us an indication of the confidence we can have in our data. If that probability is less than 5%, by convention, we feel we are able to conclude that the evidence supports our hypothesis.
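For readers curious about what the statistics program is doing, here is a minimal Python sketch of the chi-square calculation. The counts are hypothetical, not survey results. The p-value function is a shortcut that is valid only for tables with 2 degrees of freedom (for example, 2 rows by 3 columns); a statistics package handles the general case.

```python
import math


def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df2(statistic):
    """Upper-tail probability for chi-square with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Hypothetical counts: rows are voted yes / no, columns are three degree levels.
observed = [
    [60, 75, 90],
    [40, 25, 10],
]

statistic, df = chi_square(observed)
p = p_value_df2(statistic)
print(round(statistic, 2), df, p < 0.05)  # 24.0 2 True
```

The statistic grows as the observed counts move away from the counts expected under no relationship, and the p-value shrinks accordingly.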
Good hypotheses have to be falsifiable. That is, it has to be clear just what empirical data would support the hypothesis, and what data would not. Because we usually want to support our hypotheses, we must be careful not to state them in such a way that any empirical result could be interpreted as support. In this example, if we had found that the percentages of voters were more or less the same for the three education levels, we would be forced to conclude that the data did not support the hypothesis. This hypothesis, then, passes the test of falsifiability.
To evaluate a hypothesis using a crosstabulation, you need two basic quantitative skills: an understanding of percentages and a facility for decimal numbers. The exercises in this lesson will lead you, step-by-step, through the hypothesis test with percentage tables.
Let's consider the following research question: Are men more politically active than women? Our first task is to operationalize these concepts. Gender is straightforward; we can ask people if they are male or female. Political participation is more complicated. If we want to adequately measure it, we would need more than one question. (This is the case whenever our concepts are multidimensional.) For now, though, we'll use one common question, "Did you vote in the last presidential election?"
With our variables determined, we can formulate a hypothesis. A researcher usually develops his or her hypotheses out of the academic literature on a subject or from a particular theoretical point of view. In this case, though, we'll use the common stereotype that men are more active in public life. Our hypothesis, then, is: Men will be more likely to say they voted in the last presidential election.
Now, we are ready to test this hypothesis with some empirical data. Since we have easy access to the General Social Survey, from the National Opinion Research Center, at the University of Chicago, we'll use their most recent data. Our statistics program, SPSS, does the calculations. We get these results:

Note: data come from the 1996 GSS.
Our first task is to determine if the results are reliable. The chi-square test is used for this purpose. The probability figure associated with chi-square, indicated by "p" in the note below the table, tells us how likely it is that we could get this result simply due to chance. If this figure is less than or equal to 0.05, then we are confident that the result is not due to random error, but in fact reflects the state of the social world we are studying.
The logic of hypothesis testing always evaluates an empirical result against the null result, which is defined as the case in which there is no relationship between the variables. We consider the probability, indicated by "p", that we could get our result if the null result were actually true. The table we get from the data might be erroneous, reflecting meaningless variation due to sampling, since we have not asked all adult Americans about their gender and voting, but only a random sample of about 2,500.
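The idea of "meaningless variation due to sampling" can be illustrated with a small simulation. The Python sketch below assumes a population in which the null result is true: men and women vote at exactly the same rate. The 70% turnout, the group size of 500, and the number of repetitions are all assumptions chosen for illustration. The function counts how often random samples alone produce a gap between men and women at least as large as a given gap.

```python
import random


def share_of_chance_gaps(gap_to_beat, n_per_group=500, turnout=0.70,
                         repetitions=500, seed=1):
    """Share of simulated samples whose men/women gap is at least gap_to_beat.

    Both groups are drawn from the same population turnout, so any gap
    that appears is due to sampling alone.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        men_voted = sum(rng.random() < turnout for _ in range(n_per_group))
        women_voted = sum(rng.random() < turnout for _ in range(n_per_group))
        gap = abs(men_voted - women_voted) / n_per_group
        if gap >= gap_to_beat:
            hits += 1
    return hits / repetitions


# How often does chance alone produce a gap of 3 percentage points or more?
print(share_of_chance_gaps(0.03))
```

Small gaps turn up often even when no relationship exists, while very large gaps almost never do. The value of "p" reported with a table expresses the same idea as a calculated probability rather than a simulated one.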
We use the figure 0.05 by convention. Social scientists are unusually cautious when it comes to hypothesis testing. We want to be relatively certain that if we claim to have found a relationship, it actually exists. This is often called the 95% confidence level (0.95 + 0.05 = 1.00, or 100%). If our result, "p", is less than or equal to 0.05, then the chance of getting such a result through random error alone is no more than five percent, and we are, therefore, willing to accept that it reflects the real state of affairs.
In this case, p is 0.239. Is this figure less than or equal to 0.05? Here's where you need to know how to evaluate decimal numbers. Think of it this way: 0.05 = 0.050, or 50 out of 1,000. Our result, 0.239, is 239 out of 1,000. Since 239 is larger than 50, our result is greater than 0.05.
Our decision with regard to the hypothesis, then, is that this result is not reliable. Therefore, the result fails to support our hypothesis. This does not mean that we have proven our hypothesis is incorrect. The logic of hypothesis testing is always conditional. Our results support or fail to support our hypothesis. Our results never prove anything.
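The comparison between "p" and the conventional criterion can be written as a simple rule. This Python sketch uses the 0.05 criterion and the result of 0.239 from the example above; the function name and the wording of the return values are just for illustration.

```python
ALPHA = 0.05  # conventional criterion used in this lesson


def reliability(p, alpha=ALPHA):
    """Apply the rule from the text: reliable only if p is at most alpha."""
    if p <= alpha:
        return "reliable"
    return "not reliable"


# 0.239 is 239 out of 1,000, which is larger than 50 out of 1,000.
print(reliability(0.239))  # not reliable
```

A result of "not reliable" means only that the data fail to support the hypothesis, not that the hypothesis has been proven wrong.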
Because the results are unreliable, we must not interpret the pattern in the table. There is no point, after all, in trying to explain a result that we've admitted may be due to random error. We can't say definitively that men and women are equally likely to say they voted in '92. All we know is that the result does not support the hypothesis that men are more likely to say they voted.
That's it. We've completed the hypothesis test in this case. Since our data failed to support our hypothesis, we must turn our attention back to the hypothesis itself. When our results fail to support our hypotheses, there are two alternative courses of action. 1) Our hypothesis might, in fact, be incorrect. We would need to return to the literature or theory that led us to this hypothesis and try to see if we've made an error in interpretation or reasoning. We would ask ourselves, "Is there another hypothesis that would make more sense?" 2) The data we've used to test our hypothesis might be inappropriate. The variables we've selected to measure our concepts might not be effective. We may need to select other variables and retest the hypothesis.
Since we generated our hypothesis on the basis of a stereotype, it should not be too surprising that the result failed to support it. This is, after all, the nature of stereotypes; they are usually founded on misleading assumptions about the world.
Let's consider another example.
Assume that we are interested in social class and political participation. Once again, we'll start with the hypothesis. From the literature, we learn that several studies have found a relationship between class and participation, suggesting that the higher a person's class position, the more likely their participation.
We need to operationalize these concepts. For social class, we have several options. Social scientists usually create an index variable to measure social class, combining occupation, education and income. For the sake of simplicity, though, we'll just use occupation. In the GSS, we have a variable that measures occupational prestige. (The higher the prestige, the higher the social class.) We'll use the same participation variable, vote in the '92 elections. Our hypothesis, then, is: Those with higher occupational prestige will be more likely to say they voted in the last presidential election.
SPSS calculates the following table:

Note: data come from the 1996 GSS.
First, we need to determine if the result is reliable. The probability value in this case is 0.000. (SPSS rounds this figure to three decimal places, so 0.000 means that p is smaller than 0.0005, not that it is literally zero.) Comparing this to 0.05, we conclude that the table is reliable. (The probability value of the table, 0.000, is less than the conventional criterion, 0.05. The table value is 0 out of 1,000, and the criterion is 50 out of 1,000. Since 0 is less than 50, we can conclude that the table is reliable.)
At this point, we can try to interpret the pattern in the table, because we are confident that it is not due to chance. The values in the table are column percents. This means that they represent the percentage of people in each occupational prestige category who answered yes or no to the vote question. For example, in the first column, top row, we have 61.2%. This indicates that more than sixty percent of those with low occupational prestige said they voted in the '92 election. We also see that 70.2% of those with middle occupational prestige said they voted, and 82% of those with high occupational prestige said they voted.
Our hypothesis predicted that those with higher occupational prestige will be more likely to say they voted, and indeed, this is what the result indicates. The data, therefore, support our hypothesis.
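The two checks we have just made, reliability first and then the direction of the pattern, can be summarized in a short Python sketch. The percentages and the probability value are the ones reported in the text above; the function itself is only an illustration of the reasoning, not part of SPSS.

```python
# Column percents reported above for those who said they voted,
# ordered from low to high occupational prestige.
voted_percents = [61.2, 70.2, 82.0]
p = 0.000  # probability value reported for the table


def supports_hypothesis(percents_low_to_high, p, alpha=0.05):
    """True only if the table is reliable AND the percentages rise with prestige."""
    reliable = p <= alpha
    rising = all(
        lower < higher
        for lower, higher in zip(percents_low_to_high, percents_low_to_high[1:])
    )
    return reliable and rising


print(supports_hypothesis(voted_percents, p))  # True
```

Both conditions matter: a reliable table whose percentages fell as prestige rose would fail to support this hypothesis, which is what makes the hypothesis falsifiable.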
So far, so good. Now, we need to fashion a social scientific explanation of the relationship suggested by the crosstabulation. In other words, we need to try to answer the why question: why do those with higher occupational prestige participate more? We try to come up with reasonable connections between social class and participation that would allow us to generate new hypotheses. For example, we might speculate that those from higher class positions are more likely to feel that candidates represent their interests. Our political culture tends to play toward the middle, and politicians are constantly talking about the middle class. This seems a reasonable conjecture. We could generate a new hypothesis in order to investigate it. (What would that hypothesis be?)
When we try to explain the results, we are speculating. This does not mean that we are simply guessing, or making up stories to fit the data. On the contrary, our explanations draw upon the same sources as our hypotheses: the literature and theory.
Hypothesis testing is an ongoing process. In order to fully study a phenomenon, we need to test a series of hypotheses about it. To be confident in a particular explanation, we want to demonstrate that it fits the data better than other explanations. We call this process specification.
All content on this site is copyright © 2003-2005 by Prof. Timothy Shortell, except where copyright is retained by the original owners. No infringement of rights is meant or implied.