
Central Tendency and Dispersion of a Distribution
Review Topics

Measures of Central Tendency
Mean, Median, Mode
Quartile

Measures of Variation
The Range, Variance and Standard Deviation, Coefficient of Variation

Shape
Symmetric, Skewed
Important Summary Measures
Summary measures for one sample:
Central Tendency: Mean, Median, Mode, Quartile
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Central Tendency: Mean, Median, Mode
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Data: You can access practice sample data on
HMO premiums here.
Measures of Central Location (Tendency)

Usually, we focus our attention on two aspects when summarizing a data set:
Measure of the central data point (the average).
Measure of dispersion of the data about the average.

With one data point, clearly the central location is at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If the third data point appears exactly in the middle of the current range, the central location should not change (because it is currently residing in the middle).
But if the third data point appears on the left-hand side of the midrange, it should pull the central location to the left.
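The reasoning about one, two, and three data points can be checked numerically. The sketch below assumes the arithmetic mean is the measure of central location, and the positions 2, 5, and 8 are hypothetical values chosen only for illustration.

```python
from statistics import mean

# Hypothetical positions on a number line (not taken from the slides).
two_points = [2.0, 8.0]
center_two = mean(two_points)           # midpoint of the two points

# Third point exactly at the current center: the mean does not move.
center_same = mean(two_points + [5.0])

# Third point to the left of the center: the mean is pulled to the left.
center_left = mean(two_points + [2.0])

print(center_two, center_same, center_left)
```

The same check with a third point on the right-hand side would move the mean to the right by the same logic.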
The arithmetic mean
This is the most popular and useful measure of central location.
$$\text{Mean} = \frac{\text{Sum of the measurements}}{\text{Number of measurements}}$$
Sample mean (sample size $n$):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean (population size $N$):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example 4.1
The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$$
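As a quick check of Example 4.1, the definition "sum of the measurements divided by the number of measurements" can be written directly. The function name `sample_mean` is an illustrative choice, not something the slides define.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the measurements / number of measurements."""
    return sum(values) / len(values)

measurements = [7, 3, 9, -2, 4, 6]  # the six measurements of Example 4.1
x_bar = sample_mean(measurements)
print(x_bar)  # 4.5
```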

Example 4.2
Suppose the telephone bills of Example 2.1 represent a population of measurements. The population mean is
$$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 15.30 + \cdots + 53.21}{200} = 43.59$$
The median
The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.

Example 4.4
Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29.
Find the median salary.
Odd number of observations
First, sort the salaries: 26, 26, 28, 29, 30, 32, 60
Then, locate the value in the middle: 29

Suppose one employee's salary of $31,000 was added to the group recorded before.
Find the median salary.
Even number of observations
First, sort the salaries: 26, 26, 28, 29, 30, 31, 32, 60
Then, locate the values in the middle. There are two middle values, 29 and 30, so the median is their average, 29.5.
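The sort-then-pick-the-middle procedure can be sketched as follows. The function below is a minimal illustration written for this example; it averages the two middle values when the number of observations is even.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

salaries = [28, 60, 26, 32, 30, 26, 29]   # Example 4.4, in $1000s
print(median(salaries))                   # 29   (odd number of observations)
print(median(salaries + [31]))            # 29.5 (even number of observations)
```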
The mode
The mode of a set of measurements is the value that occurs most frequently.
A set of data may have one mode (or modal class), or two or more modes.
The modal class: for large data sets, the modal class is much more relevant than a single-value mode.
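A small sketch of finding the mode, allowing for more than one. The helper name `modes` and the `bimodal` list are hypothetical; the salary data is the one from Example 4.4.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

salaries = [28, 60, 26, 32, 30, 26, 29]   # Example 4.4, in $1000s
print(modes(salaries))                    # [26]

bimodal = [1, 1, 2, 3, 3]                 # hypothetical data with two modes
print(modes(bimodal))                     # [1, 3]
```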
Example 4.6
A professor of statistics wants to report the results of a midterm exam, taken by 100 students. The data appear in file XM04-06.
Find the mean, median, and mode, and describe the information they provide.

Excel Results: Marks
Mean                 73.98
Standard Error       2.1502163
Median               81
Mode                 84
Standard Deviation   21.502163
Sample Variance      462.34303
Kurtosis             0.3936606
Skewness             -1.073098
Range                89
Minimum              11
Maximum              100
Sum                  7398
Count                100

The mean provides information about the overall performance level of the class.
The median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%.
The mode must be used when data is qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then, the mode becomes a logical measure to compute.
Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide.
If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution (skewed to the right): typically Mode < Median < Mean.
A negatively skewed distribution (skewed to the left): typically Mean < Median < Mode.
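The seven salaries of Example 4.4 give a small illustration of a right-skewed data set: one large value (60) stretches the right tail and pulls the mean above the median and mode. This is a sketch on one data set, not a general proof of the ordering.

```python
from statistics import mean, median, mode

salaries = [26, 26, 28, 29, 30, 32, 60]   # Example 4.4, in $1000s

m_mean = mean(salaries)      # 231 / 7 = 33
m_median = median(salaries)  # middle value of the sorted data = 29
m_mode = mode(salaries)      # most frequent value = 26

print(m_mode, m_median, m_mean)  # 26 29 33
```

For a left-skewed data set the ordering tends to reverse, as in the exam marks of Example 4.6 (mean 73.98, median 81, mode 84).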
Measures of Variation
Variation:
Range
Interquartile Range
Variance (Population Variance, Sample Variance)
Standard Deviation (Population Standard Deviation, Sample Standard Deviation)
Coefficient of Variation
$$CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$$
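The coefficient of variation expresses the sample standard deviation as a percentage of the mean, CV = (S / X̄) · 100%. The sketch below applies it to the eight-value data set used on the standard deviation slides; the function name is an illustrative choice, and the formula only makes sense when the mean is not zero.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (mean must be nonzero)."""
    return stdev(values) / mean(values) * 100

data = [10, 12, 14, 15, 17, 18, 18, 24]   # mean is 16
cv = coefficient_of_variation(data)
print(f"{cv:.1f}%")
```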
Measures of variability
(Looking beyond the average)

Measures of central location fail to tell the whole story about the distribution.
A question of interest still remains unanswered:
How typical is the average value of all the measurements in the data set?
or
How spread out are the measurements about the average value?
Observe two hypothetical data sets
Low variability data set: the average value provides a good representation of the values in the data set.
High variability data set: the same average value does not provide as good a representation of the values in the data set as before.
The range
The range of a set of measurements is the difference between the largest and smallest measurements.
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
[Figure: a line from the smallest measurement to the largest measurement, labeled "Range", with question marks in between.]
But how do all the measurements spread out? The range cannot assist in answering this question.
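A short sketch of the range and of its shortcoming. The salary data comes from Example 4.4; the lists `clustered` and `spread_out` are hypothetical and were chosen so that both have the same end points.

```python
def value_range(values):
    """Largest measurement minus smallest measurement."""
    return max(values) - min(values)

salaries = [28, 60, 26, 32, 30, 26, 29]   # Example 4.4, in $1000s
print(value_range(salaries))              # 34

# Hypothetical data: same end points, very different spread in between.
clustered = [0, 5, 5, 5, 5, 10]
spread_out = [0, 2, 4, 6, 8, 10]
print(value_range(clustered), value_range(spread_out))  # 10 10
```

Both hypothetical lists have a range of 10, even though the values of the first sit almost entirely at the center.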
The variance
This measure of dispersion reflects the values of all the measurements.
The variance of a population of N measurements $x_1, x_2, \ldots, x_N$ having a mean $\mu$ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
The variance of a sample of n measurements $x_1, x_2, \ldots, x_n$ having a mean $\bar{x}$ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but measurements in B are much more dispersed than those in A.
Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2. Sum = 0
B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6. Sum = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.
The sum of squared deviations is used in calculating the variance. See the example next.
Let us calculate the variance of the two populations
$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$
$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$
Why is the variance defined as
the average squared deviation?
Why not use the sum of squared
deviations as a measure of
dispersion instead?
After all, the sum of squared
deviations increases in
magnitude when the dispersion
of a data set increases!!
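One possible way to explore this question numerically is sketched below. It computes the population variance of A and B, then compares the sum of squared deviations with the average squared deviation when every value of A is simply listed twice. The doubled list is a hypothetical construction for illustration, not necessarily the argument the slides go on to make.

```python
def sum_squared_deviations(values):
    """Sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)

def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    return sum_squared_deviations(values) / len(values)

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]
print(population_variance(pop_a), population_variance(pop_b))  # 2.0 18.0

# Hypothetical: the same values as A, each listed twice. The spread is unchanged.
pop_a_twice = pop_a * 2
print(sum_squared_deviations(pop_a), sum_squared_deviations(pop_a_twice))  # 10.0 20.0
print(population_variance(pop_a_twice))                                    # 2.0
```

The sum of squared deviations doubles just because there are more measurements, while the average squared deviation stays at 2.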
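Example 4.8
Find the mean and the variance of the following sample of measurements (in years).
3.4, 2.5, 4.1, 1.2, 2.8, 3.7

Solution
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$$
A shortcut formula
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$$
$$s^2 = \frac{1}{6 - 1} \left[ \left( 3.4^2 + 2.5^2 + \cdots + 3.7^2 \right) - \frac{(17.7)^2}{6} \right] = 1.075 \text{ (years)}^2$$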
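Example 4.8 can be verified both from the definition and from the shortcut formula. The two function names are illustrative; both divide by n - 1 because the data is treated as a sample.

```python
def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut: (sum of squares - (sum)^2 / n) divided by n - 1."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)

years = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]    # sample from Example 4.8
s2 = sample_variance(years)
s2_short = sample_variance_shortcut(years)
print(round(s2, 3), round(s2_short, 3))   # 1.075 1.075
```

The shortcut avoids computing each deviation, but with floating-point numbers it can lose precision when the mean is large relative to the spread, so the definition form is often the safer choice in code.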
Sample Standard Deviation
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
For the sample: use n - 1 in the denominator.
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$
Interpreting Standard Deviation
The standard deviation can be used to
compare the variability of several distributions
make a statement about the general shape of a distribution.
The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
$(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
$(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
$(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements
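The empirical rule can be checked on simulated data. The sketch below assumes a hypothetical mound-shaped sample drawn from a normal distribution with mean 50 and standard deviation 10; these parameters, the sample size, and the seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical mound-shaped sample: 20,000 draws from a normal distribution.
sample = [random.gauss(50, 10) for _ in range(20_000)]
x_bar, s = mean(sample), stdev(sample)

def share_within(k):
    """Fraction of the sample inside the interval (x_bar - k*s, x_bar + k*s)."""
    lo, hi = x_bar - k * s, x_bar + k * s
    return sum(lo < x < hi for x in sample) / len(sample)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} s of the mean: {share:.1%}")
```

The printed shares should land close to 68%, 95%, and nearly 100%. For data that is not mound-shaped, the rule need not hold.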
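Comparing Standard Deviations
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
Sample standard deviation:
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$$
The value of the standard deviation is larger when the data is considered as a sample.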
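The difference between the two denominators can be seen by computing both versions on the same data. This is a minimal sketch; the function name and the `sample` flag are illustrative choices, and all eight observations (including both values of 18) enter the sum of squared deviations.

```python
import math

def std_dev(values, sample=True):
    """Standard deviation: divide by n - 1 for a sample, by N for a population."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))

data = [10, 12, 14, 15, 17, 18, 18, 24]      # mean is 16
s = std_dev(data, sample=True)               # sqrt(130 / 7)
sigma = std_dev(data, sample=False)          # sqrt(130 / 8)
print(round(s, 4), round(sigma, 4))
```

Python's standard library offers the same pair as `statistics.stdev` (sample) and `statistics.pstdev` (population).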
Comparing Standard Deviations
[Figure: three data sets plotted on the same scale, 11 to 21.]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Measures of Association
Two numerical measures are presented for the description of the linear relationship between two variables depicted in the scatter diagram.
Covariance: is there any pattern to the way two variables move together?
Correlation coefficient: how strong is the linear relationship between two variables?
The covariance
$$\text{Population covariance} = COV(X, Y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y).
N is the population size. n is the sample size.
$$\text{Sample covariance} = cov(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
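Both covariance formulas differ only in the denominator, which the sketch below makes explicit. The paired data `x` and `y` is hypothetical, and the function name and `sample` flag are illustrative choices.

```python
def covariance(xs, ys, sample=True):
    """Average cross-product of deviations: divide by n - 1 (sample) or N (population)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)

# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y, sample=True))    # 1.5
print(covariance(x, y, sample=False))   # 1.2
```

A positive result indicates that the two variables tend to move in the same direction. The magnitude depends on the units of X and Y, which is why the correlation coefficient is used to judge strength.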
The coefficient of correlation
This coefficient answers the question: how strong is the association between X and Y?
$$\text{Population coefficient of correlation: } \rho = \frac{COV(X, Y)}{\sigma_x \sigma_y}$$
$$\text{Sample coefficient of correlation: } r = \frac{cov(X, Y)}{s_x s_y}$$
COV(X,Y) > 0, or r close to +1: strong positive linear relationship
COV(X,Y) = 0, or r = 0: no linear relationship
COV(X,Y) < 0, or r close to -1: strong negative linear relationship
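The three cases can be reproduced with the sample coefficient of correlation, r = cov(X, Y) / (s_x s_y). In the sketch below the n - 1 factors of the covariance and the two standard deviations cancel, so only sums of deviations are needed. All data is hypothetical and the function name is an illustrative choice.

```python
import math

def correlation(xs, ys):
    """Sample coefficient of correlation r = cov(X, Y) / (s_x * s_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for the three cases.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])     # points on a rising line
r_down = correlation(x, [10, 8, 6, 4, 2])   # points on a falling line
r_none = correlation(x, [2, 1, 0, 1, 2])    # V shape: related, but not linearly
print(r_up, r_down, round(r_none, 10))
```

The V-shaped case shows that r near 0 rules out a linear relationship only; the variables can still be related in another way.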
