Chapter 2
Learning Objectives
Learn how to describe a set of nominal data.
Learn how to describe the relationship between two nominal variables.
Subset
The graphical & tabular methods presented here apply to both entire populations and samples drawn from populations.
Definitions
Variable: some characteristic of a population or sample.
E.g. student grades: {67, 74, 71, 83, 93, 55, 48}
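Variable: what you want to measure. Example: gas price.
Values: the possible observations of the variable. Example: a range from 2.90 to 5.00.
Data: the observed values. Example: actual gas prices.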
Interval Data
Nominal Data
Ordinal Data
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5.
We can say things like: excellent > poor, or fair < very good.
Interval: numerical/quantitative data; arithmetic is meaningful (if you can apply arithmetic, the data are interval). Ex: average price, average quantity, age, distance traveled.
Nominal & Ordinal: both qualitative/categorical.
Nominal: categories with no order. Ex: marital status (single is different from married, but you can't say one is better than the other), gender, race.
Ordinal: order matters (unlike nominal). Ex: grades.
Ordinal
- Values must represent the ranked order of the data.
- Calculations based on a ranking process are valid.
- Data may be treated as nominal but not as interval.
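As a rough illustration of the ordinal rules above, the sketch below codes ratings with the 1-5 scale from the example and uses only order-based operations: a comparison and a median. The list `responses` is made-up data for illustration, not data from the course.

```python
from statistics import median_low

# Ordinal coding from the example: order is meaningful, distances are not.
RATING_CODES = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
CODE_TO_RATING = {code: label for label, code in RATING_CODES.items()}

# Hypothetical responses, invented for this example.
responses = ["good", "poor", "excellent", "good", "fair", "very good", "good"]

# Sorting uses only the rank order of the codes.
codes = sorted(RATING_CODES[r] for r in responses)

# A ranking-based comparison is valid for ordinal data.
excellent_beats_poor = RATING_CODES["excellent"] > RATING_CODES["poor"]

# The median depends only on rank order, so it is a valid summary.
# median_low always returns an actual data value (it never averages codes).
median_rating = CODE_TO_RATING[median_low(codes)]

print(excellent_beats_poor)
print(median_rating)
```

An average of the codes is deliberately left out: it would treat the gaps between ratings as equal, which is an interval-data assumption.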
Nominal
- Values are arbitrary numbers that represent categories.
- Only calculations based on the counts/frequencies of occurrence are valid.
- Data may not be treated as ordinal or interval.
Your Turn
We can summarize the data in a table, called a frequency distribution, that presents the categories and their counts.
A relative frequency distribution lists the categories and the proportion with which each occurs.
Tabular description = table.
Frequency distribution: looks at counts only (e.g. own versus rent).
Relative frequency distribution: proportion/percentage (e.g. % renting vs. % buying).
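A minimal sketch of the two tables in Python, using the own-versus-rent example. The list `housing` is invented for illustration; only the technique (count each category, then divide by the total) comes from the definitions above.

```python
from collections import Counter

# Hypothetical nominal responses, invented for this example.
housing = ["own", "rent", "own", "own", "rent", "own", "rent", "own"]

# Frequency distribution: category -> count.
frequency = Counter(housing)

# Relative frequency distribution: category -> proportion of all responses.
total = sum(frequency.values())
relative_frequency = {cat: count / total for cat, count in frequency.items()}

for cat in sorted(frequency):
    print(f"{cat:<5} {frequency[cat]:>3} {relative_frequency[cat]:.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no response was dropped.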
Last week were you working full time, part time, going to school, keeping house, or what? The responses were:
1. Working full time
2. Working part time
3. Temporarily not working
4. Unemployed, laid off
5. Retired
6. School
7. Keeping house
8. Other
The responses were recorded using the codes 1, 2, 3, 4, 5, 6, 7, and 8.
Generally, a variable has a short name. Variable = Work Status; values: 1-8.
Frequency distribution: =COUNTIF(A1:A2023,1) = 1003 (bar charts).
Relative frequency distribution with pivot tables (pie charts).
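The COUNTIF formula counts how many cells in a range equal a given code. A rough Python equivalent is sketched below. The helper `countif` and the short list `work_status` are assumptions made for this example (the list stands in for spreadsheet column A and is not the survey data); the labels are the eight codes listed earlier.

```python
from collections import Counter

# Labels for the work-status codes 1-8.
WORK_STATUS_LABELS = {
    1: "Working full time",
    2: "Working part time",
    3: "Temporarily not working",
    4: "Unemployed, laid off",
    5: "Retired",
    6: "School",
    7: "Keeping house",
    8: "Other",
}


def countif(values, target):
    """Count entries equal to target, like =COUNTIF(range, target)."""
    return sum(1 for v in values if v == target)


# Hypothetical coded responses standing in for spreadsheet column A.
work_status = [1, 1, 5, 2, 1, 7, 1, 6, 5, 1, 2, 8]

full_time = countif(work_status, 1)

# One pass with Counter gives the whole frequency distribution at once;
# codes that never occur simply count as 0.
counts = Counter(work_status)
for code, label in WORK_STATUS_LABELS.items():
    share = counts[code] / len(work_status)
    print(f"{code} {label:<24} {counts[code]:>3} {share:6.1%}")
```

Calling `countif` once per code mirrors the spreadsheet approach, while `Counter` plays the role of the pivot table.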
Nominal Data
It's all the same information (based on the same data), just a different presentation.
Your Turn
In a major North American city there are four competing newspapers: the Post, Globe, Sun, and Star.
To help design advertising campaigns, the advertising managers of the newspapers need to know which segments of the newspaper market are reading their papers.
A survey was conducted to analyze the relationship between newspapers and occupation.
A sample of newspaper readers was asked to report which newspaper they read:
The readers were also asked to indicate whether they were a blue-collar worker (1), white-collar worker (2), or professional (3).
How many possible combinations of these two variables are there?
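The question can be checked by enumerating every pairing of one newspaper with one occupation. A short sketch, with the category names written out as plain strings:

```python
from itertools import product

newspapers = ["Globe", "Post", "Star", "Sun"]
occupations = ["Blue collar", "White collar", "Professional"]

# Every (newspaper, occupation) pair is one possible combination.
combinations = list(product(newspapers, occupations))

print(len(combinations))  # 4 newspapers x 3 occupations = 12
```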
As a first step we need to produce a cross-classification table, which lists the frequency of each combination of the values of the two variables.
By counting the number of times each of the 12 possible combinations occurs, we can produce the following cross-tabulation (cross-classification):

Newspaper   Blue Collar   White Collar   Professional   Total
Globe                27             29             33      89
Post                 18             43             51     112
Star                 38             21             22      81
Sun                  37             15             20      72
Total               120            108            126     354
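The counting step can be sketched in Python. The raw survey file is not part of these notes, so the sketch rebuilds one (newspaper, occupation) record per respondent from the cell counts in the table and then counts them again; the record order is arbitrary and the variable names are chosen for this example.

```python
from collections import Counter

newspapers = ["Globe", "Post", "Star", "Sun"]
occupations = ["Blue collar", "White collar", "Professional"]

# Cell counts from the cross-classification table (one row per newspaper).
table_counts = {
    "Globe": [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 21, 22],
    "Sun": [37, 15, 20],
}

# Rebuild one (newspaper, occupation) pair per respondent.
records = [
    (paper, occ)
    for paper in newspapers
    for occ, n in zip(occupations, table_counts[paper])
    for _ in range(n)
]

# Cross-classification: count each (newspaper, occupation) combination.
crosstab = Counter(records)

row_totals = {p: sum(crosstab[(p, o)] for o in occupations) for p in newspapers}
col_totals = {o: sum(crosstab[(p, o)] for p in newspapers) for o in occupations}

for p in newspapers:
    cells = " ".join(f"{crosstab[(p, o)]:>5}" for o in occupations)
    print(f"{p:<6}{cells} {row_totals[p]:>6}")
```

With real data, `records` would come straight from the two coded columns, and the `Counter` step would stay the same.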
Relative Frequencies
If occupation and newspaper are related, then there will be notable differences in the newspapers read by each occupation.
An easy way to see this is to convert the frequencies in each column to relative frequencies.
Newspaper   Blue Collar      White Collar     Professional
Globe       27/120 = 0.23    29/108 = 0.27    33/126 = 0.26
Post        18/120 = 0.15    43/108 = 0.40    51/126 = 0.40
Star        38/120 = 0.32    21/108 = 0.19    22/126 = 0.17
Sun         37/120 = 0.31    15/108 = 0.14    20/126 = 0.16
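The column conversion is just each count divided by its column total. A minimal sketch using the counts from the cross-classification table (variable names are chosen for this example):

```python
newspapers = ["Globe", "Post", "Star", "Sun"]

# Counts per occupation column, in the order Globe, Post, Star, Sun.
counts = {
    "Blue collar": [27, 18, 38, 37],
    "White collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}

# Divide every count by its own column total.
column_relative = {}
for occ, col in counts.items():
    total = sum(col)
    column_relative[occ] = [n / total for n in col]

for occ, props in column_relative.items():
    line = ", ".join(f"{p}: {x:.2f}" for p, x in zip(newspapers, props))
    print(f"{occ}: {line}")
```

Dividing by column totals (rather than row totals or the grand total) is what makes the three occupations comparable even though their sample sizes differ.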
Interpretation
The relative frequencies in columns 2 and 3 are similar, but there are large differences between columns 1 and 2 and between columns 1 and 3.
[Relative frequency table repeated from the previous slide: rows Globe, Post, Star, Sun; columns Blue Collar, White Collar, Professional; annotation: "dissimilar"]
This tells us that blue-collar workers tend to read different newspapers from both white-collar workers and professionals, and that white-collar workers and professionals are quite similar in their newspaper choices.
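One simple way to put a number on "similar" versus "different" is the largest absolute gap between two columns' relative frequencies. This measure is an illustrative choice made here, not one introduced in the chapter, and the helper names `profile` and `max_gap` are hypothetical.

```python
from itertools import combinations

# Counts per occupation column, in the order Globe, Post, Star, Sun.
counts = {
    "Blue collar": [27, 18, 38, 37],
    "White collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}


def profile(column):
    """Relative frequencies of one column."""
    total = sum(column)
    return [n / total for n in column]


def max_gap(col_a, col_b):
    """Largest absolute difference between two columns' relative frequencies."""
    return max(abs(a - b) for a, b in zip(profile(col_a), profile(col_b)))


# Compare every pair of occupations.
gaps = {
    (a, b): max_gap(counts[a], counts[b])
    for a, b in combinations(counts, 2)
}

for (a, b), gap in gaps.items():
    print(f"{a} vs {b}: {gap:.2f}")
```

The white-collar and professional columns differ by only about two percentage points at most, while each differs from the blue-collar column by roughly twenty-five points, which matches the interpretation above.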
Use the data from the cross-classification table to create bar charts
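The course builds these charts in a spreadsheet; as a stand-in, the sketch below draws one text bar chart per occupation using only the Python standard library. The function `text_bar_chart` is a hypothetical helper written for this example, and the counts come from the cross-classification table.

```python
newspapers = ["Globe", "Post", "Star", "Sun"]

# Counts per occupation column, in the order Globe, Post, Star, Sun.
counts = {
    "Blue collar": [27, 18, 38, 37],
    "White collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}


def text_bar_chart(labels, values):
    """Return one line per category, with one '#' per observation."""
    return [
        f"{label:<6} {'#' * value} {value}"
        for label, value in zip(labels, values)
    ]


# One chart per occupation, so the shapes can be compared side by side.
charts = {occ: text_bar_chart(newspapers, col) for occ, col in counts.items()}

for occ, lines in charts.items():
    print(occ)
    print("\n".join(lines))
    print()
```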
Interpretation
If the two variables are unrelated, the patterns exhibited in the bar charts should be approximately the same.
If some relationship exists, then some bar charts will differ from others.
The shapes of the bar charts for occupations 2 and 3 (white-collar and professional) are very similar. Both differ considerably from the bar chart for occupation 1 (blue-collar).
Your Turn
2.44
Homework
Pages 41-42: