Management
Day-2
Recap
Introduction
Definition
Terms and terminologies
Types of statistics
Types of data
Levels of measurements
Application of statistics in business
Data warehousing & data mining
Sources of data
Organizing /Classification of data
Qualitative
Quantitative
Geographical
Chronological
Time series: a set of observations collected at usually
discrete and equally spaced time intervals (e.g., the
daily closing price of a certain stock recorded over
the last six weeks)
Cross-sectional: observations from different individuals
or groups at a single point in time (e.g., the inventory
of all ice creams in stock at a particular store)
VISUALIZING/PRESENTATION OF DATA
TABULAR
DIAGRAMS
GRAPHS
TABULATION
SPECIMEN OF A TABLE
Total
Grand Total
Foot Note
Sources
Descriptive Statistics:
Tabular and Graphical Presentations
Summarizing Categorical Data
Summarizing Quantitative Data
Categorical Data
Chap 2-8
Summarizing Categorical Data
Frequency Distribution/ contingency table
Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn
Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100
Relative frequency, e.g. Excellent: 1/20 = .05
Percent frequency, e.g. Poor: .10(100) = 10
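The relative and percent frequency columns above follow directly from the frequency column. A minimal Python sketch, assuming the rating counts from the Marada Inn table (the variable names are illustrative only):

```python
# Rating frequencies copied from the Marada Inn frequency distribution.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequencies.values())  # 20 ratings in all

# Relative frequency = class frequency / total number of observations.
relative = {rating: count / total for rating, count in frequencies.items()}

# Percent frequency = relative frequency * 100.
percent = {rating: 100 * share for rating, share in relative.items()}

for rating in frequencies:
    print(f"{rating:<14} {frequencies[rating]:>3} "
          f"{relative[rating]:>5.2f} {percent[rating]:>4.0f}")
```

The relative frequencies always sum to 1.00 and the percent frequencies to 100, which is a quick check on any hand-built table.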
Bar Chart
A bar chart is a graphical device for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that each
class is a separate category.
Bar Chart
[Figure: bar chart; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent)]
Pareto Diagram
Stem-and-Leaf Display
Histogram
Polygon
Ogive
Example: Marada Inn
DISCRETE SERIES
EXCLUSIVE
BIVARIATE DATA
MORE THAN
Frequency Distribution
It is a tabular summary of data showing the
number of items in each of the non-overlapping
classes
A table that organises data into classes or
groups of values
The classes divide the range into equal intervals
Width of class interval (CI) = Range / Number of classes
Ages of a Sample of Managers from Urban Child Care
Centers in the United States
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution of Child Care Managers' Ages
Range = Largest - Smallest
      = 74 - 23
      = 51
Number of Classes and Class Width
The number of classes should be between 5 and 15.
Fewer than 5 classes cause excessive summarization.
More than 15 classes leave too much detail.
Class Width
Divide the range by the number of classes for an
approximate class width
Round up to a convenient number
Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
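The class-width rule above can be sketched in a few lines of Python. This is only an illustration: the choice of 6 classes comes from the slide, while the helper `round_up_to` and the rounding step of 10 are assumptions standing in for "a convenient number".

```python
import math

smallest, largest = 23, 74   # extremes of the managers' ages
number_of_classes = 6        # choice used in the example

data_range = largest - smallest                 # 74 - 23 = 51
approx_width = data_range / number_of_classes   # 51 / 6 = 8.5


def round_up_to(value, step):
    """Round value up to the next multiple of step (hypothetical helper)."""
    return math.ceil(value / step) * step


# Assumed "convenient" unit: multiples of 10.
class_width = round_up_to(approx_width, 10)

print(data_range, approx_width, class_width)
```

Rounding up rather than to the nearest value guarantees that the chosen number of classes still covers the whole range.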
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30          6             .12
30-under 40         18             .36
40-under 50         11             .22
50-under 60         11             .22
60-under 70          3             .06
70-under 80          1             .02
Total               50            1.00
LESS THAN CUMULATIVE FREQUENCY SERIES
HOURS            NO. OF WORKERS
LESS THAN 10           5
LESS THAN 30          15
LESS THAN 60          30
LESS THAN 90          50
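A "less than" cumulative series can be turned back into ordinary class frequencies by taking differences between successive cumulative counts. A small Python sketch using the figures above; the lower limit of 0 hours for the first class is an assumption, since the table does not state it.

```python
# "Less than" cumulative series: (upper limit in hours, cumulative workers).
less_than = [(10, 5), (30, 15), (60, 30), (90, 50)]

# Each class frequency is the increase in the cumulative count
# from the previous upper limit to the current one.
class_frequencies = []
previous_limit, previous_count = 0, 0   # assumed starting point: 0 hours
for limit, count in less_than:
    class_frequencies.append((previous_limit, limit, count - previous_count))
    previous_limit, previous_count = limit, count

for lower, upper, frequency in class_frequencies:
    print(f"{lower}-under {upper}: {frequency}")
```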
MORE THAN CUMULATIVE FREQUENCY SERIES
10 19 17
20 29 15
30 39 12
40 49 10
EXCLUSIVE CLASS INTERVAL
REVENUE (RS.)    NO. OF PRODUCTS
100 - 200             15
200 - 300             20
300 - 400             10
400 - 500              5
TOTAL                 50
OPEN END CLASS INTERVAL
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30          6          25            .12                    6
30-under 40         18          35            .36                   24
40-under 50         11          45            .22                   35
50-under 60         11          55            .22                   46
60-under 70          3          65            .06                   49
70-under 80          1          75            .02                   50
Total               50                       1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30          6             .12                    6                      .12
30-under 40         18             .36                   24                      .48
40-under 50         11             .22                   35                      .70
50-under 60         11             .22                   46                      .92
60-under 70          3             .06                   49                      .98
70-under 80          1             .02                   50                     1.00
Total               50            1.00
Frequency Distribution
Example: BMW manufactures racing cars and has gathered
the following information on the number of engine
models in different size categories used in the
racing market it serves.
Engine Size (cu inches)   # of models
101 - 150                      1
151 - 200                      7
201 - 250                      7
251 - 300                      8
301 - 350                     17
351 - 400                     16
401 - 450                     15
451 - 500                      7
Frequency Distribution
- Construct a cumulative relative frequency distribution.
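One way to approach this exercise is sketched below in Python, assuming the engine-size counts listed in the example. It is an illustration rather than a prescribed solution; the variable names are not from the original.

```python
from itertools import accumulate

# (engine size class in cubic inches, number of models) from the example.
engine_classes = [
    ("101-150", 1), ("151-200", 7), ("201-250", 7), ("251-300", 8),
    ("301-350", 17), ("351-400", 16), ("401-450", 15), ("451-500", 7),
]

frequencies = [count for _, count in engine_classes]
total = sum(frequencies)

# Running total of the frequencies, class by class.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency = cumulative frequency / total.
cumulative_relative = [c / total for c in cumulative]

for (label, _), cum, cum_rel in zip(engine_classes, cumulative, cumulative_relative):
    print(f"{label:<8} {cum:>3} {cum_rel:>6.3f}")
```

The last cumulative relative frequency must equal 1, which confirms that every model has been counted.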
[Figure: quarterly chart, 1Q 2003 through 1Q 2004; vertical axis: 0.0 to 1.5]
Pie Chart Calculations for Company A
Company   2d Quarter Truck Production   Proportion   Degrees
E                   12,747                 .014          5
Totals             920,190                1.000        360
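Each slice of a pie chart takes the same share of the 360 degrees as its category takes of the total. A short Python sketch, checked against the Company E row above (the function name `pie_slice` is illustrative):

```python
def pie_slice(value, total):
    """Return (proportion, degrees) for one category of a pie chart."""
    proportion = value / total
    degrees = proportion * 360   # a full circle is 360 degrees
    return proportion, degrees


# Company E: 12,747 trucks out of a total of 920,190.
proportion, degrees = pie_slice(12_747, 920_190)

print(round(proportion, 3), round(degrees))
```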
PIE DIAGRAM
Complaints by Amtrak Passengers
COMPLAINT NUMBER PROPORTION DEGREES
Complaints by Amtrak Passengers
[Figure: pie chart]
Stations, Etc.       40%
Train Performance    21%
Equipment            15%
Personnel            14%
Schedules, Etc.      10%
Histogram
A histogram is a chart made of bars of
different heights.
Widths and locations of bars
correspond to widths and locations of
data groupings
Heights of bars correspond to
frequencies or relative frequencies of
data groupings
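To make the link between class frequencies and bar heights concrete, here is a minimal text-only sketch in Python. It assumes the class frequencies of the child care managers' ages used elsewhere in these slides and draws each bar sideways with asterisks; it is not a substitute for a proper plotting tool.

```python
# (lower limit, upper limit, frequency) for each class of manager ages.
classes = [
    (20, 30, 6), (30, 40, 18), (40, 50, 11),
    (50, 60, 11), (60, 70, 3), (70, 80, 1),
]


def text_histogram(class_rows):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for lower, upper, frequency in class_rows:
        label = f"{lower}-under {upper}"
        lines.append(f"{label:<12}| {'*' * frequency} {frequency}")
    return lines


histogram_lines = text_histogram(classes)
print("\n".join(histogram_lines))
```

Because all classes here have the same width, the bar length alone carries the frequency; with unequal widths the bar area would have to do so instead.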
Frequency Histogram
Relative Frequency Histogram
Histogram
Class Interval   Frequency
20-under 30          6
30-under 40         18
40-under 50         11
50-under 60         11
60-under 70          3
70-under 80          1
[Figure: frequency histogram; horizontal axis: Years (0 to 80); vertical axis: Frequency (0 to 20)]
Histogram Construction
[Figure: histogram built from the class frequencies of the managers' ages; horizontal axis: Years (0 to 80); vertical axis: Frequency (0 to 20)]
Frequency Polygon
[Figure: frequency polygon built from the class frequencies of the managers' ages; horizontal axis: Years (0 to 80); vertical axis: Frequency (0 to 20)]
Ogive
Class Interval   Cumulative Frequency
20-under 30           6
30-under 40          24
40-under 50          35
50-under 60          46
60-under 70          49
70-under 80          50
[Figure: ogive; horizontal axis: Years (0 to 80); vertical axis: Frequency (0 to 60)]
Relative Frequency Ogive
[Figure: ogive of cumulative relative frequency]
Organizing Numerical Data:
Stem and Leaf Display
A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves)
branch out to the right on each row.
Age of College Students
Safety Examination Scores for Plant Trainees
Raw Data
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81
Stem   Leaf
2      3
3      9
4      79
5      569
6      07788
7      0245567789
8      11233689
9      11247
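The stem-and-leaf display for the safety examination scores can be reproduced with a short Python sketch. It assumes two-digit scores, so the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict

# Safety examination scores for plant trainees (raw data from the slide).
scores = [
    86, 77, 91, 60, 55,
    76, 92, 47, 88, 67,
    23, 59, 72, 75, 83,
    77, 68, 82, 97, 89,
    81, 75, 74, 39, 67,
    79, 83, 70, 78, 91,
    68, 49, 56, 94, 81,
]


def stem_and_leaf(values):
    """Group values by tens digit (stem); leaves are the sorted units digits."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    return {
        stem: "".join(str(leaf) for leaf in leaves)
        for stem, leaves in sorted(groups.items())
    }


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Unlike a histogram, the display keeps every original value recoverable: stem 4 with leaves 79 stands for the scores 47 and 49.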
Organizing Categorical Data:
Pareto Chart
Used to portray categorical data (nominal
scale)
A vertical bar chart, where categories are
shown in descending order of frequency
A cumulative polygon is shown in the same
graph
Used to separate the vital few from the
trivial many
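The ordering and the cumulative line of a Pareto chart can be sketched as follows in Python. The category names and counts below are hypothetical, invented purely for illustration; they are not data from these slides.

```python
# Hypothetical complaint counts by cause (illustrative values only).
defect_counts = {"Cause A": 12, "Cause B": 40, "Cause C": 8, "Cause D": 20}

total = sum(defect_counts.values())

# Step 1: sort categories in descending order of frequency.
ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)

# Step 2: compute each category's percent and the running cumulative percent.
pareto_rows = []
running = 0
for category, count in ordered:
    running += count
    pareto_rows.append(
        (category, count, 100 * count / total, 100 * running / total)
    )

for category, count, pct, cum_pct in pareto_rows:
    print(f"{category:<8} {count:>3} {pct:>5.1f}% {cum_pct:>6.1f}%")
```

In this made-up example the first two causes already account for 75% of all cases, which is the "vital few" pattern the chart is meant to expose.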
Organizing Categorical Data:
Pareto Chart
[Figure: Pareto chart; bar graph: % in each category; line graph: cumulative %; categories: In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone; both vertical axes: 0% to 100%]
Pareto Chart
[Figure: Pareto chart; categories: Poor Wiring, Short in Coil, Defective Plug, Other; left axis: Frequency (0 to 100); right axis: cumulative percent (0% to 100%)]
Scatter Plot
Registered Vehicles   Gasoline Sales
         5                  60
        15                 120
         9                  90
        15                 140
         7                  60
[Figure: scatter plot; horizontal axis: Registered Vehicles (0 to 20); vertical axis: Gasoline Sales]
Time Plot
[Figure: Monthly Steel Production; horizontal axis: Month; vertical axis: Millions of Tons (5.5 to 8.5)]
Cross Tabulations
Used to study patterns that may exist between
two or more categorical variables.
Cross Tabulations:
The Contingency Table
The cell is the intersection of the row and column and the
value in the cell represents the data corresponding to that
specific pairing of row and column categories.
Scatter Plots
Scatter plots are used for numerical data consisting of paired
observations taken from two numerical variables
Scatter Plot Example
Paired observations (first value: Volume per Day)
29   146
33   160
38   167
42   170
50   188
55   195
60   200
[Figure: scatter plot; horizontal axis: Volume per Day (20 to 70)]
Time Series Plot
Time Series Plot Example
Year   Number of Franchises
1996          43
1997          54
1998          60
1999          73
2000          82
2001          95
2002         107
2003          99
2004          95
[Figure: Number of Franchises, 1996-2004; horizontal axis: Year; vertical axis: Number of Franchises]
Principles of Excellent Graphs
Graphical Errors: Chart Junk
[Figure: bad presentation versus good presentation; labels include 1980: $3.10 and 1990: $3.80; horizontal axis: 1960 to 1990]
Graphical Errors:
No Relative Basis
[Figure: two bar charts by class year (FR, SO, JR, SR), one scaled in counts (0 to 200) and one in percentages (0% to 20%)]
Graphical Errors:
Compressing the Vertical Axis
[Figure: two charts of quarterly values (Q1 to Q4), one with a vertical axis from 0 to 100 and one from 0 to 25]
Graphical Errors: No Zero Point on the
Vertical Axis
Bad Presentation
Good Presentations
Chapter Summary
In this chapter, we have
Cross Tabulation
Understanding relationship between 2 variables
Example: Quality rating of meals of various prices at 12
restaurants
                       Price
Rating       10 - 19   20 - 29   30 - 39   Total
Good            1         2         1        4
               25%       50%       25%     100%
Very Good       2         2         0        4
               50%       50%        0%     100%
Excellent       2         0         2        4
               50%        0%       50%     100%
Total           5         4         3       12
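The row percentages and totals in a cross tabulation like this one can be computed from the cell counts alone. A minimal Python sketch, assuming the counts shown in the table (the dictionary layout is just one convenient representation):

```python
price_bands = ["10 - 19", "20 - 29", "30 - 39"]

# Cell counts: one list per quality rating, ordered by price band.
counts = {
    "Good": [1, 2, 1],
    "Very Good": [2, 2, 0],
    "Excellent": [2, 0, 2],
}

# Row totals and row percentages (each row sums to 100%).
row_totals = {rating: sum(row) for rating, row in counts.items()}
row_percent = {
    rating: [100 * cell / row_totals[rating] for cell in row]
    for rating, row in counts.items()
}

# Column totals and the grand total.
column_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(column_totals)

for rating, row in counts.items():
    shares = "  ".join(f"{p:.0f}%" for p in row_percent[rating])
    print(f"{rating:<10} {row}  {shares}")
print("Total     ", column_totals, grand_total)
```

Row percentages answer "within a rating, how are prices distributed?"; dividing by the column totals instead would answer the reverse question.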
Cross Tabulation
Problem - In a study of job satisfaction for 4
occupations, higher scores indicate higher satisfaction.
Provide a cross tab of occupation and satisfaction score.
Lawyer 44 Comp Analyst 54 Lawyer 53
CLASS INTERVAL   0 - 5   5 - 10   10 - 15   15 - 20
0 - 10             1       -         2         -
10 - 20            4       3         -         -
20 - 30            -       -         1         -
30 - 40            2       -         1         -
Tabular and Graphical Methods
Data