The Statistical Imagination: Nominal Variables: The Chi-Square and Binomial Distributions

The Statistical Imagination
Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions
2008 McGraw-Hill
The Chi-Square Test

Chi-square is a test for a relationship between two nominal variables
Calculations are made using a cross-tabulation (or crosstab) table, which reports frequencies of joint occurrences of attributes
Crosstab Tables
Cross-tabulation or crosstab tables
are designed to compare the
frequencies of two nominal/ordinal
variables at once
Sample Crosstab Table

Spent night on streets in last 2 weeks by gender among homeless persons

On streets    Male    Female    Total
Yes             28        10       38
No              79        44      123
Total          107        54      161
Reading a Crosstab Table

The number in a cell is the frequency of
joint occurrences, where a joint occurrence
is the combination of categories of the two
variables for a single individual
From the cell, look up then look to the left
E.g., in the table above, the joint occurrence
of male and on-street is 28, the number in
the sample who are both male and spent a
night on the streets
Reading a Crosstab Table (cont.)
The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category
The grand total (n, the sample size) is presented in the bottom right-hand corner
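
As a minimal sketch of how the marginal totals and the grand total arise from the cell frequencies, the sample table's four cells can be summed across rows and down columns. The variable names here are illustrative and not from the text.

```python
# Observed joint frequencies from the sample crosstab table.
# Keys are (row category, column category) = (on streets, gender).
observed = {
    ("Yes", "Male"): 28, ("Yes", "Female"): 10,
    ("No", "Male"): 79, ("No", "Female"): 44,
}

rows = ["Yes", "No"]
cols = ["Male", "Female"]

# A row marginal total sums a row across the columns;
# a column marginal total sums a column down the rows.
row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}

# The grand total n is the sum of all cells (the sample size).
n = sum(observed.values())

print(row_totals)  # {'Yes': 38, 'No': 123}
print(col_totals)  # {'Male': 107, 'Female': 54}
print(n)           # 161
```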
Crosstab Tables and the Chi-Square Test
For the chi-square test, the categories
of the independent variable (X) go in
the columns of the table, and those of
the dependent variable (Y), in the rows
E.g.: Is gender a good predictor of who
among homeless persons is likely to
spend a night on the streets?
Calculating Expected Frequencies
In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell
The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right) divided by the grand total
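
The rule above can be sketched in a few lines, applied to the sample crosstab table (rows Yes/No, columns Male/Female). The function name `expected_frequencies` is a hypothetical helper, not something defined in the text.

```python
def expected_frequencies(observed):
    """Expected cell frequencies for a crosstab given as a list of rows.

    Each expected frequency is (row marginal total * column marginal total)
    divided by the grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: on streets Yes, No. Columns: Male, Female.
observed = [[28, 10], [79, 44]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(e, 2) for e in row])
# [25.25, 12.75]
# [81.75, 41.25]
```

For example, the expected frequency for the male, on-streets cell is 107 times 38 divided by 161, or about 25.25.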
Using Expected Frequencies to Test the Hypothesis
The expected frequencies are those that
would occur if there is no relationship
between the two nominal/ordinal variables
The chi-square statistic measures the gap
between expected and observed
frequencies
If there is no relationship, then the expected
and observed frequencies are the same
and chi-square computes to zero
The Chi-Square Statistic

The sampling distribution is generated using the chi-square equation:
$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$
where O is the observed frequency of a cell, and E is the expected frequency
Chi-square tells us whether the summed squared
differences between the observed and expected
cell frequencies are so great that they are not
simply the result of sampling error
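
A minimal sketch of the chi-square equation, applied to the sample crosstab table, may make the summation concrete. The function name `chi_square` is illustrative only, and the result is computed from the table's frequencies rather than quoted from the text.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a crosstab (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency: row total times column total over n.
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total


# Rows: on streets Yes, No. Columns: Male, Female.
observed = [[28, 10], [79, 44]]
chi2 = chi_square(observed)
print(round(chi2, 2))  # 1.16
```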
When to Use the Chi-Square Statistic
1) There is one population with a
representative sample from it
2) There are two variables, both of a
nominal/ordinal level of
measurement
3) The expected frequency of each cell
in the crosstab table is at least five
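
Criterion 3 can be checked mechanically. The sketch below, using the sample crosstab table, finds the smallest expected frequency and compares it with five; the variable names are illustrative.

```python
# Rows: on streets Yes, No. Columns: Male, Female.
observed = [[28, 10], [79, 44]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of each cell: row total times column total over n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# The criterion holds when even the smallest expected frequency is >= 5.
smallest_expected = min(min(row) for row in expected)
meets_criterion = smallest_expected >= 5

print(round(smallest_expected, 2), meets_criterion)  # 12.75 True
```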
Features of the Chi-Square Hypothesis Test
Step 1. The H0 states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error
This null hypothesis asserts no
difference in observed and expected
frequencies

Features of the Chi-Square Hypothesis Test (cont.)
Step 2. The sampling distribution is the chi-square distribution. It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y
Degrees of freedom are determined by the number of columns and rows in the crosstab table: df = (r - 1)(c - 1)

Features of the Chi-Square Hypothesis Test (cont.)
Step 4. The test effects are the differences
between expected and observed
frequencies
The test statistic is the chi-square statistic
The p-value is obtained by comparing the
calculated chi-square value to the critical
values of the chi-square distribution in
Statistical Table G of Appendix B
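
The comparison against a critical value can be sketched for the sample crosstab table. Several things here are assumptions and not from the text: the .05 level of significance, the helper names, and the use of the complementary error function, which gives the exact upper-tail probability of the chi-square distribution only when df = 1. The value 3.841 is the standard tabled critical value for df = 1 at the .05 level.

```python
from math import erfc, sqrt


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for a crosstab with r rows and c columns."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a crosstab (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )


def p_value_df1(chi2):
    """Upper-tail chi-square probability. Valid ONLY for df = 1."""
    return erfc(sqrt(chi2 / 2))


observed = [[28, 10], [79, 44]]  # rows: Yes, No; columns: Male, Female
df = degrees_of_freedom(len(observed), len(observed[0]))
chi2 = chi_square(observed)

CRITICAL_05_DF1 = 3.841  # assumed alpha = .05; tabled value for df = 1
reject_h0 = chi2 > CRITICAL_05_DF1
p_value = p_value_df1(chi2)

print(df, round(chi2, 2), reject_h0)  # 1 1.16 False
```

Under these assumptions the calculated chi-square falls short of the critical value, so the null hypothesis would not be rejected for this table.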
The Existence of a Relationship for the Chi-Square Test
Existence: Test the H0 that $\chi^2 = 0$; that is, there is no relationship between X and Y
If the H0 is rejected, a relationship
exists
Direction and Strength of a Relationship for Chi-Square
Direction: Not applicable (because
the variables are nominal level)
Strength: These measures exist but
are seldom reported because they
are prone to misinterpretation
Nature of a Relationship for the Chi-Square Test
Nature: Report the differences
between the observed and expected
cell frequencies for a couple of
outstanding cells
Calculate column percentages for
selected cells
Column and Row Percentages
A column percentage is a cell's frequency as a percentage of the column marginal total
A row percentage is a cell's frequency as a percentage of the row marginal total
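
As an illustrative sketch, column and row percentages for the sample crosstab table can be computed as follows. The names are hypothetical, and the percentages are calculated from the table's frequencies, not quoted from the text.

```python
# Observed joint frequencies, keyed by (on streets, gender).
observed = {
    ("Yes", "Male"): 28, ("Yes", "Female"): 10,
    ("No", "Male"): 79, ("No", "Female"): 44,
}
rows = ["Yes", "No"]
cols = ["Male", "Female"]

row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}

# Column percentage: cell frequency over its column marginal total.
col_pct = {k: 100 * v / col_totals[k[1]] for k, v in observed.items()}
# Row percentage: cell frequency over its row marginal total.
row_pct = {k: 100 * v / row_totals[k[0]] for k, v in observed.items()}

print(round(col_pct[("Yes", "Male")], 1))    # 26.2
print(round(col_pct[("Yes", "Female")], 1))  # 18.5
```

With gender in the columns, the column percentages answer the question of what share of each gender spent a night on the streets.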
Chi-Square as a Difference of Proportions Test
The chi-square test is frequently used to
compare proportions of categories of a
nominal/ordinal variable for two or more
groups of a second nominal/ordinal
variable
Thus, it may be viewed as a difference
of proportions test as illustrated in
Figure 13-2 in the text
The Binomial Distribution

The binomial distribution test is a small
single-sample proportions test. Contrast it
to the large single-sample proportions test
of Chapter 10
The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$
When to Use the Binomial Distribution
1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]
2) There is a single, representative sample from one population
3) Sample size is such that $(p_{smaller})(n) < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to which we may compare the sample proportion
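
Criterion 3 is a simple arithmetic check. The sketch below uses a hypothetical helper, `is_small_sample`, and made-up sample sizes purely for illustration.

```python
def is_small_sample(n, p_target):
    """True when (p_smaller)(n) < 5, where p_smaller is the smaller of
    the target proportion P and its complement Q = 1 - P."""
    p_smaller = min(p_target, 1 - p_target)
    return p_smaller * n < 5


# Hypothetical cases: 8 observations vs. 20, both with target P = .5.
print(is_small_sample(8, 0.5))   # True:  .5 * 8  = 4, below 5
print(is_small_sample(20, 0.5))  # False: .5 * 20 = 10, not below 5
```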
Expansion of the Binomial Distribution Equation
Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling
distribution for dichotomous events. That
is, the equation describes all possible
sampling outcomes and the probability of
each, where there are only two possible
categories of a nominal variable
An Example of an Expanded Binomial Equation
The equation reveals, for example, the possible outcomes of the tossing of 4 coins
P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins
$(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$
Add the coefficients to get the total number of possible outcomes = 16
The probability of 3 heads and 1 tail is the coefficient of $P^3Q^1$ over the sum of coefficients = 4 over 16 = .25
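
The four-coin example can be reproduced with a short sketch. It uses Python's `math.comb` for the binomial coefficients; the variable names are illustrative. The second calculation evaluates the term directly, which also works when P and Q are not equal.

```python
from math import comb

n = 4          # number of coins
P = Q = 0.5    # p [heads], p [tails]

# Coefficient of the term P^(n-k) Q^k, for k = 0..n tails.
coefficients = [comb(n, k) for k in range(n + 1)]
total_outcomes = sum(coefficients)

# 3 heads and 1 tail corresponds to the term 4 P^3 Q^1 (k = 1).
prob_by_counting = coefficients[1] / total_outcomes
prob_by_term = comb(n, 1) * P**3 * Q**1

print(coefficients)      # [1, 4, 6, 4, 1]
print(total_outcomes)    # 16
print(prob_by_counting)  # 0.25
print(prob_by_term)      # 0.25
```

Dividing a coefficient by the sum of coefficients gives the probability only because P and Q are both .5 here, which makes all 16 outcomes equally likely.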
Pascal's Triangle
Pascal's Triangle provides a shortcut method for expanding the binomial equation
It provides the coefficients for small
samples and allows a quick computation of
the probabilities of all possible outcomes
when P and Q are equal to .5
See Table 13-7 in the text
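
A minimal sketch of how the triangle's rows can be generated: each row starts and ends with 1, and every interior entry is the sum of the two entries above it. The function name `pascal_rows` is hypothetical, and this is not a reproduction of the text's table.

```python
def pascal_rows(max_n):
    """Rows 0..max_n of Pascal's Triangle; row n holds the coefficients
    of the expanded binomial equation for a sample of size n."""
    rows = [[1]]
    for _ in range(max_n):
        prev = rows[-1]
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows


rows = pascal_rows(4)
for row in rows:
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]

# With P = Q = .5, each probability is a coefficient over the row sum.
probabilities = [c / sum(rows[4]) for c in rows[4]]
print(probabilities)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```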
Features of the Binomial Distribution Test
Step 1. H0: $P_u$ = a target value
Step 2. The sampling distribution is an expanded binomial equation for the given sample size
Features of the Binomial Distribution Test (cont.)
Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tail is represented by the term $4P^3Q^1$)
The test statistic is the expanded binomial
equation
The p-value is taken directly from the
equation (not from a statistical table)
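
One way to read a p-value off the expanded equation is sketched below. It assumes a one-tailed test in which the p-value is the summed probability of the observed outcome and all outcomes more extreme in the same direction; that reading, the helper names, and the example data (7 successes in 8 trials against a target of .5) are all assumptions for illustration, not taken from the text.

```python
from math import comb


def binomial_term(k, n, p):
    """Probability of exactly k successes in n trials: one term of the
    expanded binomial equation, with P = p and Q = 1 - p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(successes, n, p_target):
    """Summed probability of `successes` or more in n trials under H0."""
    return sum(binomial_term(k, n, p_target) for k in range(successes, n + 1))


# Hypothetical data: 7 successes in 8 trials, H0 target value of .5.
p_value = upper_tail_p_value(7, 8, 0.5)
print(p_value)  # 0.03515625, i.e. (8 + 1) / 256
```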
Statistical Follies:
Statistical Power and Sample Size
For a given level of significance, statistical power is a test statistic's probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)
Low statistical power can result from
having too small a sample size
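
The link between sample size and power can be illustrated with a rough sketch based on the binomial distribution. Everything specific here is an assumption for illustration: a one-tailed test at the .05 level, a null target of .5, a true population proportion of .7, and the helper names.

```python
from math import comb


def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power_upper_tail(n, p_null, p_true, alpha=0.05):
    """Power of a one-tailed (upper) binomial test.

    Finds the smallest success count whose upper-tail probability under
    H0 is at most alpha, then returns the probability of reaching that
    count when the true proportion is p_true.
    """
    for k_crit in range(n + 1):
        if upper_tail(k_crit, n, p_null) <= alpha:
            return upper_tail(k_crit, n, p_true)
    return 0.0  # no rejection region exists at this alpha


power_n10 = power_upper_tail(10, 0.5, 0.7)
power_n40 = power_upper_tail(40, 0.5, 0.7)
print(round(power_n10, 3), round(power_n40, 3))
```

Under these assumptions the test with n = 10 detects the false null hypothesis far less often than the test with n = 40, which is the pattern the slide describes.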
