An Introduction to Data Analysis & Presentation
Prof. Timothy Shortell, Sociology, Brooklyn College
The Confidence Interval
We have just learned that we can estimate the likelihood of drawing a random sample with a particular mean from a population, when we know the population parameters. This is the z-test. With this, we take the first step in inferential statistics, decision-making about hypotheses based on probability.
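To recall the z-test that the t-ratio will replace, here is a minimal Python sketch. It assumes the population mean mu and standard deviation sigma are both known. It uses the standard library's `statistics.NormalDist` for the normal curve and reports a two-tailed probability. The function name `z_test` and the sample numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu, sigma, n):
    """Return the z-score of a sample mean and the two-tailed probability of
    drawing a sample mean at least this far from mu (mu and sigma known)."""
    standard_error = sigma / sqrt(n)            # sigma_xbar = sigma / sqrt(N)
    z = (sample_mean - mu) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical population: mu = 100, sigma = 15; a sample of N = 36 has mean 105.
z, p = z_test(105, 100, 15, 36)
print(round(z, 2), round(p, 4))   # 2.0 0.0455
```

The whole calculation depends on sigma. That dependence is exactly what breaks down when the population parameters are unknown.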
The problem with the z-test is that we don't usually know the population parameters. After all, this is usually why we are undertaking the research in the first place! What we need, then, is a way to estimate the parameters.
We usually don't know the population standard deviation, sigma. As a result, we can't calculate the standard error using the formula with which we are familiar:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$
When we collect data, we have information about the sample. Because we know nothing more, this is the best information we have about the population. We can use the sample standard deviation to estimate the population standard deviation and, therefore, the standard error.
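Here is a minimal sketch of the substitution just described, written in Python. It assumes the estimate is formed by putting the sample standard deviation s in place of sigma, which gives s / sqrt(N). Some texts compute s with N rather than N - 1 in the denominator and then estimate the standard error as s / sqrt(N - 1). The hypothetical `n_minus_one` flag covers that variant. The function names are illustrative only.

```python
from math import sqrt


def standard_error_known_sigma(sigma, n):
    """sigma_xbar = sigma / sqrt(N): only usable when sigma is known."""
    return sigma / sqrt(n)


def estimated_standard_error(s, n, n_minus_one=False):
    """Estimate the standard error from the sample standard deviation s.

    Assumption: by default, s replaces sigma, giving s / sqrt(N).
    Pass n_minus_one=True for the s / sqrt(N - 1) variant used by texts
    whose s is computed with N in the denominator.
    """
    denominator = sqrt(n - 1) if n_minus_one else sqrt(n)
    return s / denominator


# Job status example below: s = 13.8, N = 200.
print(round(estimated_standard_error(13.8, 200), 3))                     # 0.976
print(round(estimated_standard_error(13.8, 200, n_minus_one=True), 3))   # 0.978
```

For samples in the hundreds, the two variants differ only in the third decimal place.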
We will use the following formula to estimate the standard error:
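We can rewrite the z-test using the estimated standard error, $s_{\bar{X}}$, in place of $\sigma_{\bar{X}}$. The result is called the t-ratio:

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$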
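The t-ratio has the same shape as the z-test; only the denominator changes. The sketch below assumes the estimated standard error is s / sqrt(N). The function name `t_ratio` and the example numbers are hypothetical.

```python
from math import sqrt


def t_ratio(sample_mean, mu, s, n):
    """t = (xbar - mu) / s_xbar, with s_xbar estimated as s / sqrt(N) (assumed)."""
    s_xbar = s / sqrt(n)
    return (sample_mean - mu) / s_xbar


# Hypothetical: sample mean 52, hypothesized mu 50, s = 10, N = 100.
print(t_ratio(52, 50, 10, 100))   # 2.0
```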
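With the t-ratio, we can generate an estimate of the population mean (remember, we usually don't know this when we conduct a study). We refer to this estimate as the confidence interval, and define it as:

$$\mathrm{CI} = \bar{X} \pm t \, s_{\bar{X}}$$

where $s_{\bar{X}}$ is the estimated standard error and $t$ is the critical value of t for the chosen level of confidence.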
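Here is a short Python sketch of the interval formula. It assumes the critical t has already been read from a t-table and that the standard error is estimated as s / sqrt(N). The function name `confidence_interval` is hypothetical.

```python
from math import sqrt


def confidence_interval(sample_mean, s, n, t_critical):
    """Return (lower, upper) = xbar -/+ t * s_xbar, with s_xbar = s / sqrt(N) (assumed)."""
    margin = t_critical * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical: mean 50, s = 10, N = 100, t = 1.96 -> margin of 1.96.
lower, upper = confidence_interval(50, 10, 100, 1.96)
print(round(lower, 2), round(upper, 2))   # 48.04 51.96
```

The interval is always centered on the sample mean. Its width grows with the critical t, which is larger for higher confidence, and shrinks as N grows.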
(A tangent on degrees of freedom. When we use the t-ratio and the other inferential statistics, we encounter the notion of degrees of freedom, or df. Degrees of freedom tell us which member of a family of sampling distributions to use for our test; for the t-ratio, df = N - 1. In practice, though, df matters only when we read the test tables. We will use it, for example, to look up a critical value of t in the t-table.)
At last! Our first example of how useful this inferential stuff (probability, sampling distributions, standard scores) is.
We have a sample of 200 graduates from elite liberal arts colleges. We measure their job status, and find a mean of 81.4 and a standard deviation of 13.8. Estimate the 95% confidence interval for the mean job status score in the population.
First, we need to determine the critical value of t. For a sample size of 200, df = 199. In the t-table, we look in the fourth column (for alpha = 0.05, or 95% confidence). The table has no row for df = 199, so we use the next lower row, df = 120. This yields a value of 1.98.
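The table lookup can be sketched in Python. The sketch assumes a small excerpt of a standard two-tailed t-table (alpha = 0.05 and 0.01) and applies the rule used here: if there is no row for the exact df, use the largest tabled df that does not exceed it. This choice is conservative because it slightly widens the interval. `T_TABLE` and `critical_t` are hypothetical names.

```python
# Excerpt of two-tailed critical values of t; a full table has many more rows.
T_TABLE = {
    30: {0.05: 2.042, 0.01: 2.750},
    40: {0.05: 2.021, 0.01: 2.704},
    60: {0.05: 2.000, 0.01: 2.660},
    120: {0.05: 1.980, 0.01: 2.617},
}


def critical_t(n, alpha):
    """Return (df, critical t) for a sample of size n.

    df = N - 1. When the table has no row for that df, fall back to the
    largest tabled df that does not exceed it (the conservative choice).
    """
    df = n - 1
    rows = [row for row in sorted(T_TABLE) if row <= df]
    if not rows:
        raise ValueError(f"df = {df} is below the smallest row in this excerpt")
    return df, T_TABLE[rows[-1]][alpha]


print(critical_t(200, 0.05))   # (199, 1.98)
print(critical_t(250, 0.01))   # (249, 2.617)
```

When df is very large, such as 999, the table's bottom row for infinite df (1.96 at alpha = 0.05) is a common alternative, because the t distribution is then nearly normal.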
Next, we estimate the standard error:
We express this as:
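The arithmetic for the job status example can be sketched in Python. It assumes the standard error is estimated as s / sqrt(N). With the s / sqrt(N - 1) variant, the standard error becomes about 0.978 and the bounds become 79.46 to 83.34. Only the inputs (N, mean, s, and the critical t of 1.98) come from the example; the code is an illustration, not the original worked solution.

```python
from math import sqrt

n, mean, s = 200, 81.4, 13.8   # job status sample from the example
t_crit = 1.98                  # df = 199, read from the df = 120 row

s_xbar = s / sqrt(n)           # assumed estimator: s / sqrt(N)
margin = t_crit * s_xbar
lower, upper = mean - margin, mean + margin
print(f"s_xbar = {s_xbar:.3f}; 95% CI: {lower:.2f} to {upper:.2f}")
# s_xbar = 0.976; 95% CI: 79.47 to 83.33
```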
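In the next example, we estimate the mean congregation size in the U.S. from a sample of N = 1000. First, determine the critical value of t. With N = 1000, df = 999. For a 95% CI, alpha = 0.05. With df this large, the t distribution is nearly normal, so we use the table's bottom row (infinite df), which yields a t of 1.96.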
Plug these values into the formula and crunch the numbers. We estimate, at 95% confidence, that the mean congregation size in the U.S. is between 363.46 and 387.34 persons.
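The sample mean and standard deviation for this example are not listed above. Because a confidence interval is symmetric around the sample mean, however, the reported bounds alone are enough to recover the sample mean, the margin of error, and the standard error they imply. This Python sketch back-calculates those three quantities from the stated interval and the critical t of 1.96. The results are derived values, not figures taken from the original data.

```python
# Reported 95% interval for mean congregation size (persons).
lower, upper = 363.46, 387.34
t_crit = 1.96                      # df = 999, infinite-df row

sample_mean = (lower + upper) / 2  # the interval is centered on the sample mean
margin = (upper - lower) / 2       # t times the estimated standard error
implied_se = margin / t_crit       # standard error implied by the interval

print(f"mean = {sample_mean:.2f}, margin = {margin:.2f}, s_xbar = {implied_se:.2f}")
# mean = 375.40, margin = 11.94, s_xbar = 6.09
```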
We have data for 250 county courthouses in the U.S. The mean sentence handed down in assault cases is 2.8 years, with a standard deviation of 6.2 years. Calculate the 99% confidence interval.
First, determine the critical value of t. With N = 250, df = 249. For a 99% CI, alpha = 0.01. The table has no row for df = 249, so we again use the df = 120 row, which yields a t of 2.617.
Plug these values into the formula and crunch the numbers. We estimate, at 99% confidence, that the mean sentence length for assault cases in the U.S. is between 1.77 and 3.83 years.
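The same steps, sketched in Python for the sentencing example, assume the standard error is estimated as s / sqrt(N). The inputs are the ones given in the example, with df = N - 1 = 249. The computed bounds agree with the interval stated above.

```python
from math import sqrt

n, mean, s = 250, 2.8, 6.2     # assault sentences, in years
df = n - 1                     # 249
t_crit = 2.617                 # 99% CI, read from the df = 120 row

margin = t_crit * s / sqrt(n)  # t times the estimated standard error
print(f"df = {df}; 99% CI: {mean - margin:.2f} to {mean + margin:.2f} years")
# df = 249; 99% CI: 1.77 to 3.83 years
```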
All materials on this site are copyright © 2001, by Professor Timothy Shortell, except those retained by their original owner. No infringement is intended or implied. All rights reserved. Please let me know if you link to this site or use these materials.