Chapter 2 Part 2

Chapter 2
Summation Notation & Central Tendency

(Sections 2.3-2.5, 2.7 and 2.8)
mean
mode
median
Summation Notation (2.3)
Individual observations in a data set are denoted

\( x_1, x_2, x_3, x_4, \ldots, x_n \)
We often use a summation symbol:

\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]
Summation Notation
The summation symbol adds all the values of variable x from the first (\( x_1 \)) to the last (\( x_n \)),
so if \( x_1 = 1 \), \( x_2 = 2 \), \( x_3 = 3 \) and \( x_4 = 4 \), then

\[ \sum_{i=1}^{4} x_i = 1 + 2 + 3 + 4 = 10 \]
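Notation (continued)
Sometimes we will have to square the values before we add them:

\[ \sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2 \]

\[ \sum_{i=1}^{4} x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30 \]

Other times we will add them and then square the sum:

\[ \left( \sum_{i=1}^{n} x_i \right)^2 = (x_1 + x_2 + x_3 + \cdots + x_n)^2 \]

\[ \left( \sum_{i=1}^{4} x_i \right)^2 = (1 + 2 + 3 + 4)^2 = 10^2 = 100 \]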
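The sum of squares and the square of the sum are easy to confuse, so a short Python sketch may help. It reuses the values 1, 2, 3, 4 from the example; the variable names are illustrative only.

```python
# Values from the example: x1 = 1, x2 = 2, x3 = 3, x4 = 4
x = [1, 2, 3, 4]

total = sum(x)                               # add the values
sum_of_squares = sum(v ** 2 for v in x)      # square first, then add
square_of_sum = sum(x) ** 2                  # add first, then square

print(total, sum_of_squares, square_of_sum)  # 10 30 100
```

The two results differ (30 versus 100), which is why the placement of the exponent relative to the summation symbol matters.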
Numerical Measures of Center and Spread
Central tendency is the value or values around which the data tend to cluster (the center).
Variability shows how strongly the data cluster around those value(s) (the spread).
Describing the Center of a Data Set (2.4)
Mean
Median
Mode

Mean
The mean of a set of quantitative data is the sum of the observed values divided by the number of values.
The sample mean for a sample dataset \( x_1, x_2, \ldots, x_n \) is denoted by \( \bar{x} \) (x-bar), and is calculated by

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

Note that the population mean is denoted by \( \mu \) (mu), and is calculated by

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]
Example of Calculating a Mean

House Price in Fancytown
231,000
313,000
299,000
312,000
285,000
317,000
294,000
297,000
315,000
287,000
Sum = 2,950,000

\[ \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{2{,}950{,}000}{10} = 295{,}000 \]

The average or mean price for this sample of 10 houses in Fancytown is $295,000.
Example of Calculating a Mean

House Price in Lowtown
97,000
93,000
110,000
121,000
113,000
95,000
100,000
122,000
99,000
2,000,000 (outlier)
Sum = 2,950,000

\[ \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{2{,}950{,}000}{10} = 295{,}000 \]

The average or mean price for this sample of 10 houses in Lowtown is also $295,000.
Comparing the two examples
The mean for both Fancytown and Lowtown is $295,000.
This accurately represents the center of the data for Fancytown, but not Lowtown.
[Figure: dotplots for Fancytown and Lowtown on a price axis running to 2,000,000; the mean of $295,000 is marked and the Lowtown outlier sits far to the right]
The mean can be very sensitive to a few extreme values.
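To make the sensitivity of the mean concrete, the sketch below recomputes both means from the house prices in the two examples and then drops the single Lowtown outlier. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean

# House prices from the Fancytown and Lowtown examples
fancytown = [231_000, 313_000, 299_000, 312_000, 285_000,
             317_000, 294_000, 297_000, 315_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]

print(mean(fancytown))  # 295000
print(mean(lowtown))    # 295000

# Removing the one extreme value shows how far it pulls the Lowtown mean
lowtown_without_outlier = [p for p in lowtown if p != 2_000_000]
print(round(mean(lowtown_without_outlier)))  # 105556
```

Without the 2,000,000 house, the Lowtown mean falls to roughly $105,556, much closer to where most of its prices sit.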

Median
The median of a set of quantitative data is the value
which is located in the middle of the data, arranged
from lowest to highest values (or vice versa), with 50%
of the observations above and 50% below.
Finding the Median, M:

Arrange the n measurements from smallest to largest
If n is odd, M is the middle number
If n is even, M is the average of the middle two numbers
[Figure: ordered data from lowest value to highest value, with the median splitting it into 50% below and 50% above]
Example of Calculating a Median

House Price in Fancytown (ordered)
231,000
285,000
287,000
294,000
297,000
299,000
312,000
313,000
315,000
317,000
The median is between the two middle values, 297,000 and 299,000:

\[ M = \frac{297{,}000 + 299{,}000}{2} = 298{,}000 \]

Median, M = $298,000
Example of Calculating a Median

House Price in Lowtown (ordered)
93,000
95,000
97,000
99,000
100,000
110,000
113,000
121,000
122,000
2,000,000
The median is between the two middle values, 100,000 and 110,000:

\[ M = \frac{100{,}000 + 110{,}000}{2} = 105{,}000 \]

Median, M = $105,000
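The odd/even rule for the median can be sketched in a few lines of Python. The function name `median` and the variable names are illustrative; the data are the house prices from the two examples.

```python
def median(values):
    """Median by the rule above: middle number, or average of the middle two."""
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # n odd: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average of middle two

fancytown = [231_000, 313_000, 299_000, 312_000, 285_000,
             317_000, 294_000, 297_000, 315_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]

print(median(fancytown))  # 298000.0
print(median(lowtown))    # 105000.0
```

Unlike the mean, the Lowtown median is barely affected by the 2,000,000 house, because only the middle of the ordered list matters.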

Mode
The mode is the most frequently observed value.
For grouped data, the modal class is the class with the highest relative frequency; its midpoint is often taken as the mode.
Finding the Mode

Arrange the n measurements from smallest to largest
Count the times each number occurs.
1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10
Mode = 3 (it occurs four times, more than any other value)
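Counting occurrences is straightforward to sketch with `collections.Counter` from the Python standard library. The list is the one from the slide; returning every tied value in `modes` is a choice made here, not something the slide specifies.

```python
from collections import Counter

data = [1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10]

counts = Counter(data)                  # how many times each number occurs
highest = max(counts.values())          # the largest count
modes = [value for value, c in counts.items() if c == highest]

print(modes, highest)  # [3] 4
```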
The Center of a Data Set & Distribution

A data set is symmetric if the left half of the
distribution is exactly (or at least approximately) a
mirror image of the right half.
For a symmetric distribution with a single peak: Mean = Median = Mode
The Center of a Data Set & Distribution

Negative (left) skew: Mean < Median < Mode
Positive (right) skew: Mean > Median > Mode
Numerical Measures of Variability
Range
Variance
Standard Deviation
Describing the Variability of a Data Set
Range
Equal to the largest measurement minus the smallest
measurement.
Easy to compute, but not very informative
Considers only two observations (smallest / largest)
Monthly Salaries
3,000
2,000
5,000
8,000
5,000
4,000
9,000
Range (R) = 9,000 - 2,000 = 7,000
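Sample Variance (\( s^2 \))

For a sample of n measurements, the sample variance is equal to the sum of the squared distances from the mean, divided by (n - 1):

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} \]

The population variance is denoted by \( \sigma^2 \).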
Sample Standard Deviation (s)

For a sample of n measurements, the sample standard deviation is equal to the square root of the sample variance.
It tells us, roughly, how far a typical data point deviates from the mean.

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

The population standard deviation is denoted by \( \sigma \).
Example of calculating a variance and standard deviation
Sample dataset: 1, 2, 3
Sample mean:

\[ \bar{x} = 2 \]

Sample variance:

\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{n - 1} = \frac{(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2}{3 - 1} = 1 \]

Sample standard deviation:

\[ s = \sqrt{s^2} = \sqrt{1} = 1 \]
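The definition of the sample variance translates almost directly into code. The following is a minimal sketch on the dataset 1, 2, 3 from the example; `sample_variance` is an illustrative name, not a library function.

```python
import math

def sample_variance(values):
    """Sum of squared distances from the mean, divided by (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [1, 2, 3]
s2 = sample_variance(data)   # sample variance
s = math.sqrt(s2)            # sample standard deviation

print(s2, s)  # 1.0 1.0
```

Python's standard `statistics.variance` and `statistics.stdev` also use the (n - 1) divisor, so they can serve as a cross-check.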

Example of calculating a variance and standard deviation
Calculate the variance and standard deviation for the information provided:

\[ n = 17, \quad \sum x = 12, \quad \sum x^2 = 13 \]
Sample variance:

\[ s^2 = \frac{\sum x^2 - \frac{\left( \sum x \right)^2}{n}}{n - 1} = \frac{13 - \frac{(12)^2}{17}}{17 - 1} \approx 0.283 \]

Sample standard deviation:

\[ s = \sqrt{s^2} = \sqrt{0.283} \approx 0.532 \]
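When only the summary totals are available, the shortcut form of the variance can be checked numerically. This sketch assumes the given quantities are n = 17, the sum of x = 12 and the sum of x squared = 13; the function name `variance_from_sums` is hypothetical.

```python
import math

def variance_from_sums(n, sum_x, sum_x_sq):
    """Shortcut formula: s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)

s2 = variance_from_sums(n=17, sum_x=12, sum_x_sq=13)
s = math.sqrt(s2)

print(round(s2, 3), round(s, 3))  # 0.283 0.532
```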
Describing Relative Standing
Percentile
For any set of n measurements (arranged in ascending or descending order), the pth percentile is a number such that p% of the measurements fall below that number and (100 - p)% fall above it.
Upper Quartile (QU or Q3) = 75th percentile
Median (Q2) = 50th percentile
Lower Quartile (QL or Q1) = 25th percentile
Example of Calculating a Percentile
Sample dataset: 3, 5, 0, 1, 9, 2, 7
Order the dataset: 0, 1, 2, 3, 5, 7, 9
To calculate the 50th percentile:
Step 1: n x p = 7 x (.5) = 3.5, which rounds up to 4 (round up to the next integer)
Step 2: The value in the 4th location of the ordered list is the 50th percentile = 3
Example of Calculating a Percentile
Sample dataset: 3, 5, 0, 1, 9, 2, 7, 10
Order the dataset: 0, 1, 2, 3, 5, 7, 9, 10
To calculate the 50th percentile:
Step 1: n x p = 8 x (.5) = 4, a whole number, so take the mean of the 4th and 5th values
Step 2: The mean of the 4th location value and the 5th location value of the ordered list is the 50th percentile = (3 + 5)/2 = 4
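The two percentile examples follow one location rule: round up when n x p is not a whole number, otherwise average that location and the next. A minimal sketch of that rule follows; the function name `percentile` is illustrative, and other textbooks and libraries use different interpolation rules that can give slightly different answers.

```python
import math

def percentile(values, p):
    """pth percentile (0 < p < 100) by the location rule in the examples."""
    ordered = sorted(values)
    position = len(ordered) * p / 100
    if not position.is_integer():
        # Not a whole number: round up to the next integer location
        return ordered[math.ceil(position) - 1]
    # Whole number k: average the kth and (k+1)th ordered values
    k = int(position)
    return (ordered[k - 1] + ordered[k]) / 2

print(percentile([3, 5, 0, 1, 9, 2, 7], 50))      # 3
print(percentile([3, 5, 0, 1, 9, 2, 7, 10], 50))  # 4.0
```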
Describing the Relative Standing
Sample z-score
Tells the distance between a measurement x and the mean (\( \bar{x} \)), expressed in terms of standard deviations.
The sample z-score for a measurement x is

\[ z = \frac{x - \bar{x}}{s} \]
Example of Calculating a z-score
Given verbal SAT scores for 2,000 high school seniors:
Mean (\( \bar{x} \)) = 550
Standard Deviation (s) = 75
Joe Smith's score = 475

\[ z = \frac{x - \bar{x}}{s} = \frac{475 - 550}{75} = -1 \]

Joe's score is one standard deviation below the mean.
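The z-score calculation is a one-line function. The sketch below uses the SAT figures from the example; the function and parameter names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, in units of standard deviations."""
    return (x - mean) / std_dev

print(z_score(475, 550, 75))  # -1.0
```

The negative sign indicates a score below the mean; a score of 625 would give +1.0 by the same formula.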
Methods for Determining Outliers
Outlier
A measurement that is unusually large or
small relative to the other values.
Possible causes:
1. Observation, recording or data entry error
2. Item is from a different population
3. A rare, chance event
Using a boxplot to identify outliers

The box plot is a graph representing information
about certain percentiles for a data set and can be
used to identify outliers. Boxplots
plot the five-number summary
show the spread of the data
detect outliers
The Five-number summary
Minimum Value
Lower Quartile (QL)
Median
Upper Quartile (QU)
Maximum Value
[Figure: boxplot on an axis from 30 to 55 with the five values labeled]
Quartiles and the Interquartile Range
Lower Quartile (QL) = median of the lower half of data set.
Upper Quartile (QU) = median of the upper half of data set.
Interquartile Range (IQR) = upper quartile - lower quartile

IQR = QU - QL
[Figure: boxplot on an axis from 30 to 55]
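The "median of each half" definition of the quartiles can be sketched as follows, reusing the monthly salaries from the range example. One assumption is made here that the slides do not state: when n is odd, the middle value is left out of both halves. The function names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (QL, QU) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2          # assumption: drop the middle value if n is odd
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

# Monthly salaries from the range example
salaries = [3_000, 2_000, 5_000, 8_000, 5_000, 4_000, 9_000]
ql, qu = quartiles(salaries)
iqr = qu - ql

print(ql, qu, iqr)  # 3000 8000 5000
```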
Outliers
An observation is an outlier if it is
< Lower inner fence = QL - 1.5 x IQR
> Upper inner fence = QU + 1.5 x IQR
An outlier is extreme if it is
< Lower outer fence = QL - 3 x IQR
> Upper outer fence = QU + 3 x IQR
Outliers Example
Student Ages (79 students, sorted down each column)
17  19  19  20  21  22  22  25
18  19  19  20  21  22  23  26
18  19  19  20  21  22  23  28
18  19  19  20  21  22  23  28
18  19  19  20  21  22  23  30
18  19  19  20  21  22  23  37
19  19  20  21  21  22  23  38
19  19  20  21  21  22  24  44
19  19  20  21  21  22  24  47
19  19  20  21  21  22  24
Lower Quartile = 19, Median = 21, Upper Quartile = 22
IQR = 22 - 19 = 3
Outliers Example
For the same student ages, with QL = 19, QU = 22 and IQR = 3:
Lower inner fence = 19 - (1.5 x 3) = 14.5
Upper inner fence = 22 + (1.5 x 3) = 26.5
Outliers Example
For the same student ages, with QL = 19, QU = 22 and IQR = 3:
Lower outer fence = 19 - (3 x 3) = 10
Upper outer fence = 22 + (3 x 3) = 31
Moderate outliers (between the inner and outer fences): 28, 28, 30
Extreme outliers (beyond the outer fences): 37, 38, 44, 47
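The fence rules can be applied mechanically to the student ages. The sketch below takes QL = 19 and QU = 22 as given on the slides rather than recomputing them, and places the outer fences at 3 x IQR; the `classify` helper and its labels are illustrative.

```python
# The 79 student ages from the example, written compactly by value
ages = ([17] + [18] * 5 + [19] * 20 + [20] * 10 + [21] * 14 + [22] * 11
        + [23] * 6 + [24] * 3 + [25, 26, 28, 28, 30, 37, 38, 44, 47])

ql, qu = 19, 22                             # quartiles as given on the slides
iqr = qu - ql                               # 3
inner = (ql - 1.5 * iqr, qu + 1.5 * iqr)    # (14.5, 26.5)
outer = (ql - 3 * iqr, qu + 3 * iqr)        # (10, 31)

def classify(x):
    if x < outer[0] or x > outer[1]:
        return "extreme"
    if x < inner[0] or x > inner[1]:
        return "moderate"
    return "not an outlier"

moderate = [a for a in ages if classify(a) == "moderate"]
extreme = [a for a in ages if classify(a) == "extreme"]

print(moderate)  # [28, 28, 30]
print(extreme)   # [37, 38, 44, 47]
```

No age falls below the lower inner fence of 14.5, so all outliers in this dataset are on the high side.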
Outliers Example
Student ages on a boxplot
[Figure: boxplot of student ages; the whiskers end at the smallest and largest data values that are not outliers, with mild outliers and extreme outliers plotted individually beyond them]
Comparative Boxplot Example
By putting boxplots of two separate groups or subgroups side by side, we can compare their distributional behaviors.
