An Introduction to Data Analysis & Presentation
For example, we might want to know if there is a relationship between gender and vote in the 1980 presidential election, in a sample of U.S. elites. The crosstab illustrates this. The crosstab is a simple but very useful tool for examining causal relations among
Let's return to an example we've discussed before. If our research question is "Does approval of the President depend on whether or not one lives in an urban place?" then we can formulate the hypotheses:

Using public opinion data (in this case, the ABC2010 dataset), we can calculate the following crosstabulation:

                 | USR2
             Q1r |         0 |         1 | Row Total |
    -------------|-----------|-----------|-----------|
               0 |       399 |       105 |       504 |
                 |     0.519 |     0.447 |           |
    -------------|-----------|-----------|-----------|
               1 |       370 |       130 |       500 |
                 |     0.481 |     0.553 |           |
    -------------|-----------|-----------|-----------|
    Column Total |       769 |       235 |      1004 |
                 |     0.766 |     0.234 |           |
    -------------|-----------|-----------|-----------|

In this table, 0 indicates "not urban" for USR2 and "disapprove" for Q1r. The proportion below each count is the column proportion. The significance test is:

    Pearson's Chi-squared test
    ------------------------------------------------------------
    Chi^2 =  3.737326     d.f. =  1     p =  0.05320955

    Pearson's Chi-squared test with Yates' continuity correction
    ------------------------------------------------------------
    Chi^2 =  3.454688     d.f. =  1     p =  0.06307263

    Fisher's Exact Test for Count Data
    ------------------------------------------------------------
    Sample estimate odds ratio:  1.334739

    Alternative hypothesis: true odds ratio is not equal to 1
    p =  0.06226612
    95% confidence interval:  0.9851545 1.811241

    Alternative hypothesis: true odds ratio is less than 1
    p =  0.977726
    95% confidence interval:  0 1.72726

    Alternative hypothesis: true odds ratio is greater than 1
    p =  0.03147629
    95% confidence interval:  1.032496 Inf

We'll use Pearson's chi-squared test. In the case of a 2x2 table, we use the version with the Yates continuity correction. If the result is statistically significant, we can use the odds and the odds ratio to discuss the strength of the relationship between urban residence and approval of the President.
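To make the test statistics and the odds concrete, here is a minimal sketch in base R that re-enters the four cell counts from the ABC2010 table by hand and recomputes the chi-squared values and the odds. The object names (`tab`, `pearson`, `yates`, `odds_urban`, and so on) are illustrative choices, not names from the original analysis, and the sketch works from the published counts rather than the dataset itself.

```r
# Counts copied from the ABC2010 crosstab above (typed in by hand).
# Rows are Q1r (0 = disapprove, 1 = approve);
# columns are USR2 (0 = not urban, 1 = urban).
# matrix() fills column by column, so the first two values are the USR2 = 0 column.
tab <- matrix(c(399, 370, 105, 130), nrow = 2,
              dimnames = list(Q1r = c("0", "1"), USR2 = c("0", "1")))

# Pearson's chi-squared test, without and with the Yates continuity correction.
pearson <- chisq.test(tab, correct = FALSE)
yates   <- chisq.test(tab, correct = TRUE)

# Odds of approval (Q1r = 1 versus Q1r = 0) within each column.
odds_not_urban <- tab["1", "0"] / tab["0", "0"]   # 370 / 399
odds_urban     <- tab["1", "1"] / tab["0", "1"]   # 130 / 105

# Simple cross-product odds ratio. Fisher's exact test reports a conditional
# maximum-likelihood estimate instead, so its value differs slightly.
odds_ratio <- odds_urban / odds_not_urban

print(pearson)
print(yates)
print(round(c(not_urban = odds_not_urban, urban = odds_urban,
              ratio = odds_ratio), 3))
```

The odds of approval are higher in the urban column than in the non-urban column, which is what an odds ratio above 1 expresses. Whether that difference is worth interpreting still depends on the significance test.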
The R command to produce this table (assuming you loaded the ABC2010 dataset and attached it) is:

Let's look at another example, this time from the CBS2011 data:

                              | URBN[URBN == 1 | URBN == 3]
    Q1[URBN == 1 | URBN == 3] |         1 |         3 | Row Total |
    --------------------------|-----------|-----------|-----------|
                            1 |        43 |       140 |       183 |
                              |     0.717 |     0.496 |           |
    --------------------------|-----------|-----------|-----------|
                            2 |        17 |       142 |       159 |
                              |     0.283 |     0.504 |           |
    --------------------------|-----------|-----------|-----------|
                 Column Total |        60 |       282 |       342 |
                              |     0.175 |     0.825 |           |
    --------------------------|-----------|-----------|-----------|

    Statistics for All Table Factors

    Pearson's Chi-squared test
    ------------------------------------------------------------
    Chi^2 =  9.644134     d.f. =  1     p =  0.001899572

    Pearson's Chi-squared test with Yates' continuity correction
    ------------------------------------------------------------
    Chi^2 =  8.779236     d.f. =  1     p =  0.003046787

    Fisher's Exact Test for Count Data
    ------------------------------------------------------------
    Sample estimate odds ratio:  2.558731

    Alternative hypothesis: true odds ratio is not equal to 1
    p =  0.002556915
    95% confidence interval:  1.353408 5.025652

    Alternative hypothesis: true odds ratio is less than 1
    p =  0.9995208
    95% confidence interval:  0 4.523515

    Alternative hypothesis: true odds ratio is greater than 1
    p =  0.001330111
    95% confidence interval:  1.484226 Inf

Rather than recode the URBN variable, which has more than 2 categories, I told R to select only cases where the value of URBN was 1 (large central city) or 3 (suburb). Q1 is again a measure of approval of the President, where 1 is "approve". To make sure there are the same number of cases for both variables, you need to use the selection code ( `[URBN==1 | URBN==3]` ) when identifying both variables in the `CrossTable()` function.
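The selection step can be sketched as follows. This is an illustration under stated assumptions, not the original command: `cbs` is a hypothetical stand-in data frame rebuilt from the cell counts published on this page (the real CBS2011 file is not used), and the sketch assumes `CrossTable()` is the function of that name in the `gmodels` package, with argument settings guessed from the layout of the output. The base R `table()` and `chisq.test()` calls run without any extra package.

```r
# Hypothetical stand-in for the CBS2011 data, rebuilt from published cell counts.
# expand.grid() varies Q1 fastest, so the counts are listed Q1 = 1, then Q1 = 2,
# within each URBN category.
counts <- expand.grid(Q1 = c(1, 2), URBN = c(1, 2, 3, 5))
counts$n <- c(43, 17, 71, 72, 140, 142, 73, 108)
cbs <- counts[rep(seq_len(nrow(counts)), counts$n), c("Q1", "URBN")]

# Apply the same selection to BOTH variables so they keep the same length.
# Selecting on only one of them would leave vectors of different lengths,
# and the crosstab could not be built.
q1_sel   <- with(cbs, Q1[URBN == 1 | URBN == 3])
urbn_sel <- with(cbs, URBN[URBN == 1 | URBN == 3])

tab <- table(Q1 = q1_sel, URBN = urbn_sel)
print(tab)
print(chisq.test(tab, correct = FALSE))

# Optional: if gmodels is installed, print a table in the layout shown above.
# The argument choices here are assumptions, not the author's settings.
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(q1_sel, urbn_sel,
                      prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE,
                      chisq = TRUE, fisher = TRUE)
}
```

Using `with()` avoids attaching the data frame. With an attached dataset the same selection could be written directly as `Q1[URBN == 1 | URBN == 3]`, as in the table header above.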
If you want to test a relationship for variables with more than two categories, you can use the chi-squared test without the odds. (Odds are calculated in the

                  | URBN[URBN != 4]
    Q1[URBN != 4] |         1 |         2 |         3 |         5 | Row Total |
    --------------|-----------|-----------|-----------|-----------|-----------|
                1 |        43 |        71 |       140 |        73 |       327 |
                  |     0.717 |     0.497 |     0.496 |     0.403 |           |
    --------------|-----------|-----------|-----------|-----------|-----------|
                2 |        17 |        72 |       142 |       108 |       339 |
                  |     0.283 |     0.503 |     0.504 |     0.597 |           |
    --------------|-----------|-----------|-----------|-----------|-----------|
     Column Total |        60 |       143 |       282 |       181 |       666 |
                  |     0.090 |     0.215 |     0.423 |     0.272 |           |
    --------------|-----------|-----------|-----------|-----------|-----------|

    Statistics for All Table Factors

    Pearson's Chi-squared test
    ------------------------------------------------------------
    Chi^2 =  17.84538     d.f. =  3     p =  0.0004733554

In this case, I told R to exclude cases where URBN is 4 (other) because there were no cases.

All materials on this site are copyright © 2001, by Professor Timothy Shortell, except those retained by their original owner. No infringement is intended or implied. All rights reserved. Please let me know if you link to this site or use these materials.