An Introduction to Data Analysis & Presentation

Prof. Timothy Shortell, Sociology, Brooklyn College


Up to now, we have been concerned with describing data. We are going to begin studying the process of statistical decision making, or hypothesis testing.

Decision making is based on probability. When we engage in hypothesis testing, we are balancing the possibility of making the correct decision with the possibility of making an incorrect one. We need to understand some of the principles of probability in order to assess when we have the right balance.

We can define probability as the relative likelihood of occurrence of a given outcome. This can be expressed as a ratio of the frequency of an event, E, to the total frequency of all events: P(E) = f(E) / (f(E) + f(!E)). (The exclamation point in probability means "not," so E + !E is the total of all possible events.)
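The relative-frequency definition can be written as a one-line function in Python. This is just a sketch; the deck-of-cards figures are a hypothetical example, not from the text above.

```python
# Probability as a relative frequency: P(E) = f(E) / (f(E) + f(!E)).
def probability(freq_event, freq_not_event):
    """Relative likelihood of an event, given its frequency and
    the frequency of all other possible outcomes."""
    return freq_event / (freq_event + freq_not_event)

# Example: drawing a heart from a standard deck of cards.
# 13 hearts (E) and 39 non-hearts (!E) give P = 13/52 = 0.25.
p_heart = probability(13, 39)
```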

Suppose that there are 6 candidates in the Republican primary for President. Two are women. One is an African American man.
What is the probability that the nomination will go to a woman? To an African American? To someone other than a white man?
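The primary example can be checked with a quick Python sketch, assuming each of the six candidates is equally likely to win and that neither of the two women is African American:

```python
candidates = 6
women = 2
african_american_men = 1

p_woman = women / candidates                    # 2/6
p_african_american = african_american_men / candidates  # 1/6

# "Someone other than a white man" counts the two women plus the one
# African American man (assuming the women are not African American).
p_not_white_man = (women + african_american_men) / candidates  # 3/6
```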

Consider this: Suppose that there are 20 soloists in the Brooklyn College Symphony Orchestra. 12 play a string instrument, 5 play a horn, 2 a wind instrument, and one is a percussionist. What is the probability that the next soloist to walk onstage will perform a Beethoven violin concerto?
Will perform a Bach composition for trumpet?
Will perform a Schubert composition for clarinet?
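A sketch of the orchestra example in Python, reading each question as asking only about the instrument family (violin as a string, trumpet as a horn, clarinet as a wind) and assuming every soloist is equally likely to walk on next:

```python
soloists = {"string": 12, "horn": 5, "wind": 2, "percussion": 1}
total = sum(soloists.values())  # 20 soloists in all

p_violin = soloists["string"] / total    # 12/20
p_trumpet = soloists["horn"] / total     # 5/20
p_clarinet = soloists["wind"] / total    # 2/20
```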

The Addition (Or) Rule.
Often, we want to know the probability of a class of events. In other words, what is the chance of this or that happening, where this and that are mutually exclusive outcomes (they cannot both occur)?

Suppose that there are 15 faculty in Sociology at Brooklyn College. There are 21 faculty in Political Science and 28 in History, but only 6 in Anthropology. There are 300 faculty in other departments. Assuming that these four departments make up the Social Science Division at the college, what is the probability that the next Provost will come from the Social Sciences? What we are asking here is: what is the probability that the next Provost will come from Sociology or Political Science or History or Anthropology? (This is why the addition rule is called the "Or" rule.) To find the answer, we calculate each individual probability and add them together. What would be the probability that the new Provost would come from Sociology or Political Science?
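The Provost example can be sketched in Python, assuming every faculty member at the college is equally likely to be chosen. The addition rule says we add the four departmental probabilities:

```python
departments = {"Sociology": 15, "Political Science": 21,
               "History": 28, "Anthropology": 6}
other_faculty = 300
total = sum(departments.values()) + other_faculty  # 370 faculty

# Addition (Or) rule: the departments are mutually exclusive,
# so the probabilities simply add.
p_social_science = sum(n / total for n in departments.values())
p_soc_or_poli = (departments["Sociology"] / total
                 + departments["Political Science"] / total)
```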

Another example: Suppose that 200 researchers have applied for a grant from the National Science Foundation. 134 applicants come from public universities. 46 come from public liberal arts colleges. 15 come from private universities. The remaining 5 come from private research institutes. What is the probability that the grant will go to someone at a public school? At a university? At any institution of higher education?
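The same rule handles the grant example, again assuming each of the 200 applicants has an equal chance:

```python
applicants = {"public_university": 134, "public_liberal_arts": 46,
              "private_university": 15, "private_institute": 5}
total = sum(applicants.values())  # 200 applicants

# "Public school" = public universities or public liberal arts colleges.
p_public = (applicants["public_university"]
            + applicants["public_liberal_arts"]) / total
# "University" = public or private universities.
p_university = (applicants["public_university"]
                + applicants["private_university"]) / total
# "Higher education" = everyone except the private research institutes.
p_higher_ed = (total - applicants["private_institute"]) / total
```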

The Multiplication (And) Rule.
Sometimes we need to know the probability of occurrence of a series of events. The question that we are asking here is: what is the probability that this event and that event will occur? In this calculation, we find the individual probabilities and multiply the results. (This simple form of the rule assumes the events are independent: the outcome of one does not change the probability of the other.)

Suppose that your statistics instructor has 15 coats. 7 are black, 4 red, 3 blue and one green. What is the probability that the instructor will wear a black coat on Tuesday and a red one on Wednesday? A black coat both days? A blue coat and then a red one?
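Here is the coat example in Python, assuming the instructor picks a coat at random each day, independently, with all 15 coats available both days:

```python
coats = {"black": 7, "red": 4, "blue": 3, "green": 1}
total = sum(coats.values())  # 15 coats

# Multiplication (And) rule for independent events:
# multiply the daily probabilities.
p_black_then_red = (coats["black"] / total) * (coats["red"] / total)
p_black_both_days = (coats["black"] / total) ** 2
p_blue_then_red = (coats["blue"] / total) * (coats["red"] / total)
```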

Here's another one: Suppose that downtown Urbanville has 16 avenues, 13 streets, 7 boulevards and 4 parkways. What is the probability that a traffic jam will occur on an avenue and a street? An avenue and a parkway? On all the parkways?
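The traffic-jam questions leave the setup open; one simple reading, sketched below, is that each traffic jam lands on one of the 40 roads at random, independently of any other jam, and "all the parkways" means four jams each landing on a parkway:

```python
roads = {"avenue": 16, "street": 13, "boulevard": 7, "parkway": 4}
total = sum(roads.values())  # 40 roads downtown

# Each jam is an independent event; multiply the probabilities.
p_avenue_and_street = (roads["avenue"] / total) * (roads["street"] / total)
p_avenue_and_parkway = (roads["avenue"] / total) * (roads["parkway"] / total)
p_all_parkways = (roads["parkway"] / total) ** 4
```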

Errors of Reasoning. Probability can sometimes be confusing. We use many cognitive shortcuts, or heuristics, to function in our day-to-day lives, and this gives us an intuitive feel for probabilities — we understand the meaning of a weather report that says a 30% chance of rain. But our common sense about events and their causes is sometimes faulty. One common problem is the availability heuristic: we tend to overestimate the probabilities of events that are familiar and underestimate those that are unfamiliar.

Another common problem is the representativeness heuristic, which leads us to overestimate probabilities of combinations that confirm our previously held beliefs. If you know a few people who like computers and are nerdy, you will likely overestimate the likelihood that the two attributes will occur together. The next nerdy person you meet will tend to make you think that he or she knows a lot about computers, because, you say to yourself, computer knowledge and nerdiness always go together. In reality, of course, they don't always go together, even if they sometimes do.

One interesting instance of faulty logic about probabilities involves the conjunction of any two events that seem to go together. If you look closely at the rules for probability we've just discussed, you can see that the probability of A and B occurring is always smaller than the probability of A or B alone (as long as the events A and B have probabilities less than one, that is, are not certainties). Thus if the probability of A is 0.5 and the probability of B is 0.25, then, applying the multiplication rule, we know that the probability of A and B occurring is 0.125, which is less than either A or B.

Consider this example: Which of the two outcomes is more likely?

  1. Bob has had a heart attack;
  2. Bob is older than 55 years and has had a heart attack.

If you think about it as an application of the multiplication rule (A and B) then you realize that the conjunction has to be less likely. (Taken from Everyday Statistical Reasoning by Timothy J. Lawson.)

Let's put some numbers to the problem. (It doesn't matter that the probabilities are not empirically accurate.) Let's say that the probability of having had a heart attack is 0.10 and the probability of being over 55 years of age is 0.33. Then, treating the two events as independent, the probability of both having had a heart attack and being older than 55 is 0.10 × 0.33 = 0.033, which is less than either single probability alone.
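The conjunction arithmetic is a one-liner in Python, using the illustrative (not empirical) figures above and assuming independence:

```python
p_heart_attack = 0.10
p_over_55 = 0.33

# Multiplication rule: the conjunction is the product of the two
# probabilities, so it must be smaller than either one alone.
p_both = p_heart_attack * p_over_55
```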

Law of Large Numbers. One of the most important aspects of probability in the research context involves sampling. The Law of Large Numbers (LLN) tells us that larger random samples will be more representative of the population than smaller random samples. It has to do with the probability of drawing typical and atypical cases. In a large random sample, for the whole sample to be atypical, you would have to draw a significant number of atypical cases. Recall the multiplication rule: what is the probability of drawing this atypical case and that atypical case and that other atypical case, and so forth? Drawing a few atypical cases into a random sample may not be all that unlikely, but drawing a lot of atypical cases is, especially if the sample size is large. A few atypical cases might skew a small random sample, but not a large one. A random sample with many atypical cases is less likely than one with only a few.
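The LLN is easy to see in a simulation. The sketch below (a hypothetical die-rolling example, not from the text) compares how far small and large random samples of a fair six-sided die (population mean 3.5) typically land from the true mean:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of n rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Average absolute error of the sample mean, over 1000 replications,
# for samples of size 10 versus size 1000.
err_small = sum(abs(sample_mean(10) - 3.5) for _ in range(1000)) / 1000
err_large = sum(abs(sample_mean(1000) - 3.5) for _ in range(1000)) / 1000
```

On any typical run, `err_large` is far smaller than `err_small`: the big samples hug the population mean, just as the LLN predicts.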

All materials on this site are copyright © 2001, by Professor Timothy Shortell, except those retained by their original owner. No infringement is intended or implied. All rights reserved. Please let me know if you link to this site or use these materials.