
The Statistical Imagination

Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions
2008 McGraw-Hill

The Chi-Square Test


Chi-Square is a test for a relationship
between two nominal variables
Calculations are made using a cross-tabulation (or crosstab) table, which
reports frequencies of joint
occurrences of attributes

Crosstab Tables
Cross-tabulation or crosstab tables
are designed to compare the
frequencies of two nominal/ordinal
variables at once


Sample Crosstab Table


Spent night on streets in last 2 weeks by gender among homeless persons

On streets    Male    Female    Total
Yes             28        10       38
No              79        44      123
Total          107        54      161


Reading a Crosstab Table


The number in a cell is the frequency of
joint occurrences, where a joint occurrence
is the combination of categories of the two
variables for a single individual
From the cell, look up then look to the left
E.g., in the table above, the joint occurrence
of male and on-street is 28, the number in
the sample who are both male and spent a
night on the streets

Reading a Crosstab Table (cont.)

The numbers in the margins on the
right side and the bottom present
marginal totals, the total number of
subjects in a category
The grand total (n, the sample size) is
presented in the bottom right-hand
corner

Crosstab Tables and the Chi-Square Test
For the chi-square test, the categories
of the independent variable (X) go in
the columns of the table, and those of
the dependent variable (Y), in the rows
E.g.: Is gender a good predictor of who
among homeless persons is likely to
spend a night on the streets?


Calculating Expected
Frequencies
In addition to the observed joint
frequencies, the chi-square test involves
calculating the expected frequency of each
table cell
The expected frequency of a cell is equal
to the column marginal total for the cell
(look down) times the row marginal total for
the cell (look to the right), divided by the
grand total:
E = (column marginal total × row marginal total) / n
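As an illustration of the rule above, the following minimal Python sketch computes the expected frequency of every cell in the sample crosstab table (gender by spending a night on the streets). The cell counts come from that table; the variable names are chosen here for illustration only.

```python
# Observed frequencies from the sample crosstab table:
# rows = on streets (Yes, No), columns = gender (Male, Female).
observed = {
    "Yes": {"Male": 28, "Female": 10},
    "No": {"Male": 79, "Female": 44},
}

# Row marginal totals, column marginal totals, and the grand total (n).
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
n = sum(row_totals.values())

# Expected frequency = (column total * row total) / grand total.
expected = {
    row: {col: col_totals[col] * row_totals[row] / n for col in cells}
    for row, cells in observed.items()
}

for row, cells in expected.items():
    for col, value in cells.items():
        print(f"{row:>3} / {col:<6}: observed {observed[row][col]:>3}, "
              f"expected {value:.2f}")
```

The expected frequencies in each row and column add up to the same marginal totals as the observed frequencies, which is a quick check on the arithmetic.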

Using Expected Frequencies to Test the Hypothesis
The expected frequencies are those that
would occur if there is no relationship
between the two nominal/ordinal variables
The chi-square statistic measures the gap
between expected and observed
frequencies
If there is no relationship, then the expected
and observed frequencies are the same, give or
take some sampling error, and chi-square
computes to approximately zero

The Chi-Square Statistic


The sampling distribution is generated using the
chi-square equation:
$\chi^2 = \sum \left[ (O - E)^2 / E \right]$
where O is the observed frequency of a cell,
and E is the expected frequency
Chi-square tells us whether the summed squared
differences between the observed and expected
cell frequencies are so great that they are not
simply the result of sampling error
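The chi-square equation can be sketched in a few lines of Python. This example applies it to the sample crosstab table (gender by spending a night on the streets); the list layout and variable names are assumptions made for illustration.

```python
# Observed frequencies: rows = on streets (Yes, No),
# columns = gender (Male, Female), from the sample crosstab table.
observed = [
    [28, 10],
    [79, 44],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over every cell of the table.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected frequency
        chi_square += (o - e) ** 2 / e

print(f"chi-square = {chi_square:.2f}")
```

For this table the statistic comes to about 1.16. Whether a gap of that size is larger than sampling error would explain is decided by comparing it to the chi-square distribution.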

When to Use the Chi-Square Statistic
1) There is one population with a
representative sample from it
2) There are two variables, both of a
nominal/ordinal level of
measurement
3) The expected frequency of each cell
in the crosstab table is at least five

Features of the Chi-Square Hypothesis Test
Step 1. The H0 states that there is no
relationship between the two
variables. When this is the case, chi-square calculates to a value of zero,
give or take some sampling error
This null hypothesis asserts no
difference in observed and expected
frequencies

Features of the Chi-Square Hypothesis Test (cont.)
Step 2. The sampling distribution is the chi-square distribution. It describes all
possible outcomes of the chi-square
statistic with repeated sampling when there
is no relationship between X and Y
Degrees of freedom are determined by the
number of columns and rows in the
crosstab table: df = (r - 1)(c - 1)

Features of the Chi-Square Hypothesis Test (cont.)
Step 4. The test effects are the differences
between expected and observed
frequencies
The test statistic is the chi-square statistic
The p-value is obtained by comparing the
calculated chi-square value to the critical
values of the chi-square distribution in
Statistical Table G of Appendix B
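The slides obtain the p-value from a printed table of critical values. As a rough alternative for checking such a lookup, the sketch below computes the degrees of freedom and an exact p-value for the sample crosstab table. It relies on the fact that, for df = 1 only, the upper-tail probability of the chi-square distribution equals erfc(sqrt(chi-square / 2)). The function names are hypothetical, and the shortcut does not apply to larger tables.

```python
from math import erfc, sqrt


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of a crosstab table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total


def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def p_value_df1(chi_square):
    """Upper-tail probability of chi-square; valid for df = 1 only."""
    return erfc(sqrt(chi_square / 2))


# Sample crosstab table: rows = on streets (Yes, No),
# columns = gender (Male, Female).
observed = [[28, 10], [79, 44]]

df = degrees_of_freedom(observed)
chi_square = chi_square_statistic(observed)
if df == 1:
    print(f"df = {df}, chi-square = {chi_square:.2f}, "
          f"p = {p_value_df1(chi_square):.2f}")
```

For the sample table this gives a p-value of roughly .28, so the null hypothesis of no relationship would not be rejected at the conventional .05 level.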

The Existence of a Relationship for the Chi-Square Test
Existence: Test the H0 that $\chi^2 = 0$;
that is, there is no relationship
between X and Y
If the H0 is rejected, a relationship
exists


Direction and Strength of a Relationship for Chi-Square
Direction: Not applicable (because
the variables are nominal level)
Strength: These measures exist but
are seldom reported because they
are prone to misinterpretation


Nature of a Relationship for the Chi-Square Test
Nature: Report the differences
between the observed and expected
cell frequencies for a couple of
outstanding cells
Calculate column percentages for
selected cells

Column and Row Percentages
A column percentage is a cell's
frequency as a percentage of the
column marginal total
A row percentage is a cell's frequency
as a percentage of the row marginal
total
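A short Python sketch of both definitions, using the counts from the sample crosstab table (gender by spending a night on the streets). The helper names are hypothetical.

```python
# Observed frequencies: rows = on streets, columns = gender.
observed = {
    "Yes": {"Male": 28, "Female": 10},
    "No": {"Male": 79, "Female": 44},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {
    col: sum(observed[row][col] for row in observed)
    for col in observed["Yes"]
}


def column_percentage(row, col):
    """Cell frequency as a percentage of its column marginal total."""
    return 100 * observed[row][col] / col_totals[col]


def row_percentage(row, col):
    """Cell frequency as a percentage of its row marginal total."""
    return 100 * observed[row][col] / row_totals[row]


for gender in ("Male", "Female"):
    print(f"{gender}: {column_percentage('Yes', gender):.1f}% "
          "spent a night on the streets")
```

With gender in the columns, the column percentages compare men and women directly: about 26.2% of the men and 18.5% of the women in the sample spent a night on the streets.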


Chi-Square as a Difference
of Proportions Test
The chi-square test is frequently used to
compare proportions of categories of a
nominal/ordinal variable for two or more
groups of a second nominal/ordinal
variable
Thus, it may be viewed as a difference
of proportions test as illustrated in
Figure 13-2 in the text

The Binomial Distribution


The binomial distribution test is a small
single-sample proportions test. Contrast it
to the large single-sample proportions test
of Chapter 10
The test hinges on mathematically
expanding the binomial distribution
equation, $(P + Q)^n$


When to Use the Binomial Distribution
1) There is only one nominal variable and it is
dichotomous, with P = p [of success] and Q
= p [of failure]
2) There is a single, representative sample
from one population
3) Sample size is such that $(p_{smaller})(n) < 5$,
where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to
which we may compare the sample
proportion

Expansion of the Binomial Distribution Equation
Expansion of the binomial distribution
equation, $(P + Q)^n$, provides the sampling
distribution for dichotomous events. That
is, the equation describes all possible
sampling outcomes and the probability of
each, where there are only two possible
categories of a nominal variable

An Example of an
Expanded Binomial Equation
The equation reveals, for example, the
possible outcomes of the tossing of 4 coins
P = p [heads] = .5; Q = p [tails] = .5; n = 4
coins
$(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$
Add the coefficients to get the total number of
possible outcomes = 16
Because P and Q are equal, the probability of
3 heads and 1 tail is the coefficient of $P^3Q^1$
over the sum of coefficients = 4 over 16 = .25
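The coin-toss example can be reproduced with a minimal Python sketch. It uses the standard-library function `math.comb` (Python 3.8 or later) to obtain the coefficients; the variable names are illustrative. The coefficient-over-sum shortcut is valid here only because P and Q are both .5.

```python
from math import comb

n = 4  # number of coins tossed

# Coefficient of the term P^(n - tails) * Q^tails, for tails = 0..n.
coefficients = [comb(n, tails) for tails in range(n + 1)]
total_outcomes = sum(coefficients)

print("coefficients:", coefficients)
print("total possible outcomes:", total_outcomes)

# 3 heads and 1 tail corresponds to the term with tails = 1.
p_three_heads = coefficients[1] / total_outcomes
print("p [3 heads, 1 tail] =", p_three_heads)
```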

Pascal's Triangle
Pascal's Triangle provides a shortcut
method for expanding the binomial
equation
It provides the coefficients for small
samples and allows a quick computation of
the probabilities of all possible outcomes
when P and Q are equal to .5
See Table 13-7 in the text
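To show how the triangle yields the coefficients, here is a small Python sketch that builds a row by adding adjacent entries of the previous row, then converts the row into probabilities for the case P = Q = .5. The function name `pascal_row` is hypothetical and is not taken from the text.

```python
def pascal_row(n):
    """Return row n of Pascal's Triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # Each interior entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


# Print the first few rows of the triangle.
for size in range(5):
    print(size, pascal_row(size))

# With P = Q = .5, each outcome's probability is its coefficient
# divided by the sum of the coefficients in the row.
row = pascal_row(4)
probabilities = [coefficient / sum(row) for coefficient in row]
print(probabilities)
```

Row 4 holds the coefficients of the four-coin example, so the same row gives the probability of every possible split of heads and tails.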

Features of the
Binomial Distribution Test
Step 1. H0: $P_u$ = a target value
Step 2. The sampling distribution is
an expanded binomial equation for
the given sample size


Features of the Binomial Distribution Test (cont.)
Step 4. The effect is the observed
combination of successes and failures,
which corresponds to a term in the equation
(e.g., 3 heads and 1 tail is represented by
the term $4P^3Q^1$)
The test statistic is the expanded binomial
equation
The p-value is taken directly from the
equation (not from a statistical table)
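One way to read a p-value off the expanded equation is sketched below in Python. Each term's probability is its coefficient times the matching powers of P and Q. Summing the observed term and all terms with more successes is an assumption made here for illustration (a one-tailed test in the upper direction); the slides do not spell out which terms to add. Both function names are hypothetical.

```python
from math import comb


def term_probability(n, successes, p_success):
    """Probability of one term of (P + Q)^n: C(n, k) * P^k * Q^(n - k)."""
    q_failure = 1 - p_success
    return (comb(n, successes)
            * p_success ** successes
            * q_failure ** (n - successes))


def upper_tail_p_value(n, observed_successes, p_success):
    """Assumed one-tailed p-value: observed outcome or more successes."""
    return sum(
        term_probability(n, k, p_success)
        for k in range(observed_successes, n + 1)
    )


# Coin example: 3 heads in 4 tosses, H0 target value P = .5.
print(term_probability(4, 3, 0.5))    # probability of the term 4P^3Q^1
print(upper_tail_p_value(4, 3, 0.5))  # 3 heads or 4 heads
```

Unlike the coefficient-over-sum shortcut, this form also works when P and Q differ, because the powers of P and Q are applied explicitly.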

Statistical Follies:
Statistical Power and Sample Size
For a given level of significance,
statistical power is a test statistic's
probability of not incurring a Type II
error (i.e., unknowingly making the
incorrect decision of failing to reject a
false null hypothesis)
Low statistical power can result from
having too small a sample size
