n
Note: $\bar{x}$ is referred to as x-bar
Attributes of the Arithmetic Mean
It is straightforward to calculate
It is easy to interpret the mean
It gives us a good estimate of where a set of
numbers is centred
This is referred to as the central tendency of
a sample
It is sensitive to outliers
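As a rough illustration of the last point, the sketch below uses the medal counts from the Year/Medals table in this document and recomputes the mean after dropping the single zero value (1980). Treating that value as the outlier is an assumption made only for this example.

```python
from statistics import mean

# Medal counts for 1964-2008, copied from the Year/Medals table in this document.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

# Assumption for illustration: treat the zero recorded for 1980 as an outlier.
without_zero = [m for m in medals if m != 0]

mean_all = mean(medals)
mean_without_zero = mean(without_zero)

print(f"mean with all 12 values:    {mean_all:.2f}")
print(f"mean without the 1980 zero: {mean_without_zero:.2f}")
```

Removing one extreme value out of twelve moves the mean by several medals, which is what "sensitive to outliers" means in practice.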
Other Measures of Central Tendency
Median: The middle value of an ordered set of
values, i.e. 50% higher and 50% lower
Mode: The most commonly occurring value in a
distribution
Calculating the Median
Year Medals
1964 90
1968 107
1972 94
1976 94
1980 0
1984 174
1988 94
1992 108
1996 101
2000 92
2004 103
2008 110
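Sort the data:
Medals (Sorted): 174, 110, 108, 107, 103, 101, 94, 94, 94, 92, 90, 0
With 12 values there is no single middle value, so the median is the
average of the two middle values (101 and 94):
Median = (101 + 94) / 2 = 97.5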
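The sort-then-pick-the-middle procedure can be sketched in a few lines of Python. The variable names are illustrative; the data are the medal counts from the table above.

```python
# Medal counts for 1964-2008, copied from the Year/Medals table.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

ordered = sorted(medals)
n = len(ordered)

if n % 2 == 0:
    # Even number of values: average the two middle values.
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd number of values: take the single middle value.
    manual_median = ordered[n // 2]

print(manual_median)
```

The standard library's `statistics.median` applies the same rule, so it can be used as a cross-check.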
Calculating the Mode
Count the frequencies of each value:
Medals Count
174 1
110 1
108 1
107 1
103 1
101 1
94 3
92 1
90 1
0 1
Mode = 94
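Counting frequencies by hand gets tedious for larger data sets. A minimal sketch using `collections.Counter` from the Python standard library, applied to the same medal counts:

```python
from collections import Counter

# Medal counts for 1964-2008, copied from the Year/Medals table.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

# Map each distinct value to the number of times it occurs.
frequencies = Counter(medals)

# most_common(1) returns a list holding the (value, count) pair with the highest count.
mode_value, mode_count = frequencies.most_common(1)[0]

print(f"Mode = {mode_value} (occurs {mode_count} times)")
```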
When to Use Each Central Tendency Value?
Question: When and why would you use the median
over the mean?
Let's Look at the Variation in our Data
[Figure: Distribution of the Total Olympic Medals won by any Country from 1964 - 2008. Vertical axis: Count (0 to 20). Annotations: Central Tendency / Location; Spread / Variation]
Measures of Spread or Variation
Range
Variance
Standard Deviation
Inter-quartile Range
Calculating the Range
The range is calculated by subtracting the
minimum value in a data set from the maximum
value
The main advantage to using the range is the
ease with which it is calculated
The major disadvantage of the range is that it
is highly sensitive to outliers
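A short sketch of the range calculation, using the medal counts from the earlier table. Dropping the two extreme values is an illustrative choice, made here only to show how strongly outliers drive the range.

```python
# Medal counts for 1964-2008, copied from the Year/Medals table.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

# Range = maximum - minimum.
data_range = max(medals) - min(medals)

# Illustration only: drop the smallest and largest values and recompute.
trimmed = sorted(medals)[1:-1]
trimmed_range = max(trimmed) - min(trimmed)

print(f"Range of all values:          {data_range}")
print(f"Range without the two extremes: {trimmed_range}")
```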
Calculating the Variance
As an example of variance, consider the following
data:
OBS   Data  Mean  Deviation  (Deviation)^2
1     3     5     -2         4
2     4     5     -1         1
3     8     5     3          9
Sum   15    15    0          14
Mean  5     5     0          4.67
Variance: The Formula
Square the deviations around the mean before
summing. For n data points $x_1, x_2, \ldots, x_n$:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Divide by n-1 (?) to get the average of squared
deviations:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
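To make the two divisors concrete, here is a small sketch on the three-point example (3, 4, 8). It computes the sum of squared deviations by hand and then divides once by n and once by n-1; variable names are illustrative.

```python
data = [3, 4, 8]
n = len(data)

x_bar = sum(data) / n                          # mean of the data
squared_devs = [(x - x_bar) ** 2 for x in data]
sum_sq = sum(squared_devs)                     # sum of squared deviations

variance_n = sum_sq / n                        # divide by n
variance_n_minus_1 = sum_sq / (n - 1)          # divide by n-1 (sample variance s^2)

print(f"sum of squared deviations: {sum_sq}")
print(f"divided by n:   {variance_n:.2f}")
print(f"divided by n-1: {variance_n_minus_1:.2f}")
```

In Python's standard library, `statistics.pvariance` divides by n and `statistics.variance` divides by n-1, so both conventions are available as a cross-check.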
Standard Deviation: The Formula
Take the square root of the variance. The value
is in the original units of the data
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
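Continuing with the same three data points (3, 4, 8), the standard deviation is one extra step on top of the n-1 variance. This is a minimal sketch; the names are illustrative.

```python
import math

data = [3, 4, 8]
n = len(data)

x_bar = sum(data) / n
# Sample variance: sum of squared deviations divided by n-1.
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)
# Standard deviation: square root of the variance, back in the units of the data.
s = math.sqrt(s_squared)

print(f"s^2 = {s_squared:.2f}, s = {s:.2f}")
```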
Standard Deviation
Question: Why might it be useful to have the
value in the original unit?
Percentiles
The nth percentile is a value that has a proportion $\frac{n}{100}$
of the sample taking values at or lower than it,
and a proportion $\frac{100 - n}{100}$ taking values larger than it
Example: if your grade in an industrial engineering
class was located at the 84th percentile, then 84%
of the grades were equal to or lower than your
grade and 16% were higher
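The definition can be turned around to ask where a given value sits in a sample. The helper below, `proportion_at_or_below`, is a hypothetical name introduced for this sketch; it is applied to the medal counts from the earlier table and the value 103.

```python
def proportion_at_or_below(values, x):
    """Fraction of the sample taking values at or lower than x."""
    return sum(1 for v in values if v <= x) / len(values)

# Medal counts for 1964-2008, copied from the Year/Medals table.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

at_or_below = proportion_at_or_below(medals, 103)
above = 1 - at_or_below

print(f"{100 * at_or_below:.1f}% of values are at or below 103")
print(f"{100 * above:.1f}% of values are above 103")
```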
Inter-quartile Range
The median is the 50th percentile
The 25th percentile and the 75th percentile are
called the lower quartile and upper quartile
respectively (or 1st and 3rd quartiles)
The difference between the lower and upper
quartile is called the inter-quartile range
Quartiles Example
Sort the data:
Medals (Sorted): 174, 110, 108, 107, 103, 101, 94, 94, 94, 92, 90, 0
25th Percentile = 1st Quartile = 93
50th Percentile = Median = 97.5
75th Percentile = 3rd Quartile = 107.5
Inter-quartile Range = 107.5 - 93 = 14.5
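The quartile values in this example are consistent with taking the median of the lower half and of the upper half of the sorted data. The sketch below assumes that convention; other interpolation rules exist and can give slightly different quartiles for the same data.

```python
from statistics import median

# Medal counts for 1964-2008, copied from the Year/Medals table.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

ordered = sorted(medals)
n = len(ordered)
half = n // 2

# Assumed convention: split the sorted data into halves,
# leaving out the middle value when n is odd.
lower_half = ordered[:half]
upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]

q1 = median(lower_half)   # 25th percentile / 1st quartile
q2 = median(ordered)      # 50th percentile / median
q3 = median(upper_half)   # 75th percentile / 3rd quartile
iqr = q3 - q1             # inter-quartile range

print(f"Q1 = {q1}, Median = {q2}, Q3 = {q3}, IQR = {iqr}")
```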
Proportions
The proportion, p, of items in a population that belong
to a certain class, for example:
The proportion of your customers that are female
The proportion of voters that will vote for Labour in the
next election
A proportion is calculated as:
$$p = \frac{C}{N}$$
where C is the number of items in a population of size N
that belong to the class of interest
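A minimal sketch of the calculation. The customer list is made-up data used only to show the arithmetic; it does not come from the slides.

```python
# Hypothetical data for illustration only: one label per customer.
customers = ["female", "male", "female", "female", "male"]

C = sum(1 for c in customers if c == "female")  # items in the class of interest
N = len(customers)                              # size of the population
p = C / N

print(f"p = {C}/{N} = {p}")
```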
Skew: The Shape of a Distribution
There are a number of ways of describing the
shape of a distribution.
We will consider only one: skew.
Skew is a measure of how asymmetric a
distribution is.
Symmetric distributions: skew is zero
Positive Skew
There are a few very large data points which create a
'tail' going to the right (i.e. up the number line)
Note: No axis of symmetry here - skew > 0 (i.e. it is positive)
Examples: Lifetime of people, house prices
Negative Skew
There are a few very small data points which create
a 'tail' going to the left (i.e. down the number line)
Note: No axis of symmetry here - skew < 0 (i.e. it is negative)
Examples: Examination scores, reaction times for drivers
Skew & Measures of Location - Symmetry
Mean, Median & Mode are the same and are found in the middle
      6
      6
    5 6 7
  4 5 6 7 8
3 4 5 6 7 8 9
Mean = 102/17 = 6
Median = 6
Mode = 6
Skew & Measures of Location - Positive Skew
  6
  6
5 6 7
5 6 7 8 9
5 6 7 8 9 10 11
Mean = 121/17 = 7.12
Median = 7
Mode = 6
In general: Mode < Median < Mean
Skew & Measures of Location - Negative Skew
          6
          6
        5 6 7
    3 4 5 6 7
1 2 3 4 5 6 7
Mean = 83/17 = 4.88
Median = 5
Mode = 6
In general: Mode > Median > Mean
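The three 17-point examples can be checked with Python's `statistics` module. The lists below are transcribed row by row from the stacked values in each example; the dictionary keys are just labels for this sketch.

```python
from statistics import mean, median, mode

examples = {
    "symmetric":     [6, 6, 5, 6, 7, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9],
    "positive skew": [6, 6, 5, 6, 7, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11],
    "negative skew": [6, 6, 5, 6, 7, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
}

summary = {}
for name, data in examples.items():
    # Store (mean, median, mode) so the ordering can be compared per example.
    summary[name] = (mean(data), median(data), mode(data))
    m, med, mo = summary[name]
    print(f"{name:<14} mean = {m:.2f}  median = {med}  mode = {mo}")
```

The mean is pulled furthest towards the tail, the mode stays at the peak, and the median sits between them.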
Section 3: Graphs and Visualisation
Graphical Displays
A way of letting people get a 'picture' of
relationships in the data set.
'The simpler the better' should be a rule in graphical
display.
People can remember pictures better.
A good graph should show something that is not
easy to see using tables.
Bar Charts
Used to display categorical data or discrete
data with a modest number of values.
A Bar is drawn to represent each category.
The Bar height represents the frequency or % in
each category.
Allows for visual comparison of relative
frequencies.
Need to draw up a frequency distribution table
first.
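As a rough sketch of that first step, the snippet below builds a frequency distribution table from the medal counts used earlier in this document and prints a text-only bar for each value. Using `#` characters in place of a plotting library is a simplification chosen for this example.

```python
from collections import Counter

# Medal counts for 1964-2008, copied from the Year/Medals table.
medals = [90, 107, 94, 94, 0, 174, 94, 108, 101, 92, 103, 110]

freq = Counter(medals)
total = len(medals)

# Frequency distribution table: (value, count, percentage of all observations).
rows = []
for value in sorted(freq):
    count = freq[value]
    pct = 100 * count / total
    rows.append((value, count, pct))

# Text bar chart: bar length is the frequency of each value.
for value, count, pct in rows:
    print(f"{value:>4} | {'#' * count:<3} {count} ({pct:.1f}%)")
```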
Core Statistical Plots
[Figure: Points Scored by any Team in Six Nations Championship 2000 - 2011]
Comparisons: Column Charts, Box Plots
Correlations: Scatter Plots
Trends (time): Line Charts
Proportions: Pie Chart, Column Chart
Some Hans Inspiration to Finish Up
http://www.youtube.com/watch?v=fTznEIZRkLg