
The School of Economics and Business

Statistics in Economics and Management


Exercise 2

Content of exercises for the course Statistics in Economics and Management


II week

Measures of central tendency and measures of variation (dispersion)


Example 1:
The data on annual gross profit (in million KM) were collected for the 15 commercial companies in one region:

5, 3, 2, 4, 3, 6, 6, 5, 4, 4, 5, 4, 4, 5, 3

Determine and explain the measures of central tendency and the measures of variation (dispersion).
Solution:
Since the series has only a few distinct values (modalities), we form an ungrouped (non-interval) frequency distribution:
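x_i | f_i | x_i - X̄ | Relative frequency p_i | (x_i - X̄)²·f_i | Cumulative absolute frequency CAF | Cumulative relative frequency CRF | x_i·f_i | x_i²·f_i
--- | --- | --- | --- | --- | --- | --- | --- | ---
2 | 1 | -2,2 | 0,0667 | 4,84 | 1 | 0,0667 | 2 | 4
3 | 3 | -1,2 | 0,2000 | 4,32 | 4 | 0,2667 | 9 | 27
4 | 5 | -0,2 | 0,3333 | 0,2 | 9 | 0,6 | 20 | 80
5 | 4 | 0,8 | 0,2667 | 2,56 | 13 | 0,8667 | 20 | 100
6 | 2 | 1,8 | 0,1333 | 6,48 | 15 | 1 | 12 | 72
Σ | 15 | | 1 | 18,4 | | | 63 | 283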
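As a cross-check, the table can be rebuilt from the raw data. The following is a minimal Python sketch; the language and all names in it (`data`, `rows`, `sum_xf`, and so on) are our own illustrative choices, not part of the course material.

```python
from collections import Counter

# Annual gross profit (million KM) for the 15 companies in Example 1
data = [5, 3, 2, 4, 3, 6, 6, 5, 4, 4, 5, 4, 4, 5, 3]

N = len(data)
counts = Counter(data)          # absolute frequencies f_i
values = sorted(counts)         # distinct modalities x_i in ascending order
mean = sum(data) / N            # needed for the (x_i - mean)^2 * f_i column

cumulative = 0
rows = []
for x in values:
    f = counts[x]
    cumulative += f             # cumulative absolute frequency (CAF)
    # x_i, f_i, p_i, CAF, CRF, x_i*f_i, x_i^2*f_i, (x_i - mean)^2 * f_i
    rows.append((x, f, f / N, cumulative, cumulative / N,
                 x * f, x * x * f, (x - mean) ** 2 * f))

for row in rows:
    print(*[round(v, 4) for v in row])

# Column totals used later in the exercise
sum_xf = sum(r[5] for r in rows)
sum_x2f = sum(r[6] for r in rows)
sum_dev2f = sum(r[7] for r in rows)
print(sum_xf, sum_x2f, round(sum_dev2f, 2))  # 63 283 18.4
```

Each printed row lists x_i, f_i, p_i, CAF, CRF, x_i·f_i, x_i²·f_i and (x_i - X̄)²·f_i, in that order.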

Since the variable is discrete, the series is presented with a bar chart:

[Figure: Bar chart. x-axis: annual gross profit (in million KM); y-axis: number of commercial companies.]

To complete the task, we use the absolute frequencies.


Measures of central tendency:
1. The average (Arithmetic mean)
$$\bar{X} = \frac{\sum_{i=1}^{k} x_i f_i}{N} = \frac{63}{15} = 4{,}2$$

The average annual gross profit of the 15 observed companies is 4.2 million KM.
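The same weighted mean can be sketched in Python, assuming the frequency table above is stored as a dictionary from modality to absolute frequency (a representation we chose for illustration):

```python
# Frequency table from Example 1: modality x_i -> absolute frequency f_i
freq = {2: 1, 3: 3, 4: 5, 5: 4, 6: 2}

N = sum(freq.values())                        # N = 15
sum_xf = sum(x * f for x, f in freq.items())  # sum of x_i * f_i = 63
mean = sum_xf / N                             # weighted arithmetic mean
print(N, sum_xf, mean)  # 15 63 4.2
```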
2. Mode (the most frequent value)
The mode of a data set is the value that occurs with the greatest frequency.
$$f_{max} = 5 \;\Rightarrow\; M_o = 4$$

The most frequent gross profit among the 15 observed companies is 4 million KM.
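A short Python sketch of this rule, using the same illustrative dictionary of frequencies; it collects every modality that reaches f_max, so it would also report several modes if the series were multimodal:

```python
freq = {2: 1, 3: 3, 4: 5, 5: 4, 6: 2}

f_max = max(freq.values())
# Every modality that reaches the highest frequency
modes = [x for x, f in freq.items() if f == f_max]
print(f_max, modes)  # 5 [4]
```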
3. Median

To find the median Me, we first compute its position (location): N/2 = 7,5. Then we look for the smallest cumulative absolute frequency that is greater than or equal to the calculated position. The corresponding modality is the median:
$$\frac{N}{2} = 7{,}5 \;\Rightarrow\; CAF_{M_e} = 9 \;\Rightarrow\; M_e = 4$$

Because of the large difference between the actual cumulative relative frequency, $\frac{CAF_{M_e}}{N} = \frac{9}{15} = 0{,}60$ (60%), and the theoretical one (0,5 or 50%), we use the actual cumulative frequency in our interpretation. Therefore, 60% of the companies have a gross profit of 4 million KM or less, while 40% of the companies have more than 4 million KM.
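The positional rule described above (the first modality whose cumulative absolute frequency reaches N/2) can be sketched in Python as follows; `positional_value` is a hypothetical helper name of our own:

```python
# Frequency table from Example 1, modality -> absolute frequency
freq = {2: 1, 3: 3, 4: 5, 5: 4, 6: 2}

def positional_value(freq, position):
    """Return the first modality whose cumulative absolute frequency (CAF)
    is greater than or equal to the given position, together with that CAF."""
    caf = 0
    for x in sorted(freq):
        caf += freq[x]
        if caf >= position:
            return x, caf
    raise ValueError("position exceeds N")

N = sum(freq.values())
median, caf_me = positional_value(freq, N / 2)   # position N/2 = 7.5
print(median, caf_me, round(caf_me / N, 4))      # 4 9 0.6
```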
4. Quartiles
$$\frac{N}{4} = 3{,}75 \;\Rightarrow\; CAF_{Q_1} = 4 \;\Rightarrow\; Q_1 = 3$$
In this case, there is no great difference between the actual (4/15 = 26,67%) and the theoretical (25%) cumulative frequency, so in our interpretation we use the theoretical cumulative frequency. Therefore, 25% of the companies have a gross profit of 3 million KM or less, while 75% of the companies have more than 3 million KM.
$$\frac{3N}{4} = 11{,}25 \;\Rightarrow\; CAF_{Q_3} = 13 \;\Rightarrow\; Q_3 = 5$$
Because of the large difference between the actual cumulative relative frequency, $\frac{CAF_{Q_3}}{N} = \frac{13}{15} = 0{,}8667$ (86,67%), and the theoretical one (75%), we use the actual cumulative frequency in our interpretation. Therefore, 86.67% of the companies have a gross profit of 5 million KM or less, while 13.33% of the companies have more than 5 million KM.
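The same cumulative-frequency rule gives both quartiles, with positions N/4 and 3N/4. In this Python sketch (again with the hypothetical helper `positional_value`), the actual cumulative relative frequency is printed next to the theoretical share, so the two can be compared as in the interpretations above:

```python
freq = {2: 1, 3: 3, 4: 5, 5: 4, 6: 2}

def positional_value(freq, position):
    # First modality whose cumulative absolute frequency reaches the position
    caf = 0
    for x in sorted(freq):
        caf += freq[x]
        if caf >= position:
            return x, caf
    raise ValueError("position exceeds N")

N = sum(freq.values())
results = {}
for label, p in (("Q1", 0.25), ("Me", 0.5), ("Q3", 0.75)):
    value, caf = positional_value(freq, p * N)
    # (value, actual cumulative relative frequency)
    results[label] = (value, round(caf / N, 4))
    print(label, "position", p * N, "value", value,
          "actual CRF", round(caf / N, 4), "theoretical", p)
```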
Measures of dispersion:

1. The range of variation


The range of a data set is the difference between the largest and the smallest data values. It is the
simplest measure of variability. It is very sensitive to the smallest and the largest data values.
$$RV = x_{max} - x_{min} = 6 - 2 = 4$$

2. The interquartile range
The interquartile range of a data set is the difference between the third quartile and the first
quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme
data values.
$$I_Q = Q_3 - Q_1 = 5 - 3 = 2$$
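Both measures are simple differences. A Python sketch with the raw data and the quartiles found above (hard-coded here for illustration):

```python
data = [5, 3, 2, 4, 3, 6, 6, 5, 4, 4, 5, 4, 4, 5, 3]
Q1, Q3 = 3, 5   # quartiles found above with the cumulative-frequency rule

RV = max(data) - min(data)   # range of variation: uses only the two extremes
IQ = Q3 - Q1                 # interquartile range: spread of the middle 50%
print(RV, IQ)  # 4 2
```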
3. Variance and standard deviation
The variance is a measure of variability that uses all the data. It is based on the difference between the value of each observation (x_i) and the mean. The variance is the average of the squared differences between each data value and the mean. If the data set is a sample, the variance is denoted by s²; if the data set is a population, it is denoted by σ².
The standard deviation measures how far the data values are from their mean, on average. The standard deviation of a data set is the positive square root of the variance, and it is measured in the same units as the data. If the data set is a sample, the standard deviation is denoted by s; if the data set is a population, it is denoted by σ (sigma).

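$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{5} (x_i - \bar{X})^2 f_i = \frac{18{,}4}{15} = 1{,}23$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{5} x_i^2 f_i - \bar{X}^2 = \frac{283}{15} - 4{,}2^2 = 1{,}23$$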

The average squared deviation of the individual gross profits from the average gross profit is 1.23. (Note that the variance is expressed in the squared unit of measurement of the variable, here (million KM)², which has no practical meaning, so the variance is not interpreted in that unit.)
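The two formulas for σ² are algebraically equivalent. A small Python sketch (using our own dictionary representation of the frequency table) confirms that they give the same value before rounding, about 1.2267, which the text rounds to 1.23:

```python
freq = {2: 1, 3: 3, 4: 5, 5: 4, 6: 2}
N = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / N

# Definition: average squared deviation from the mean
var_def = sum((x - mean) ** 2 * f for x, f in freq.items()) / N
# Shortcut: mean of the squares minus the square of the mean
var_short = sum(x * x * f for x, f in freq.items()) / N - mean ** 2

print(round(var_def, 4), round(var_short, 4))  # 1.2267 1.2267
```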

$$\sigma = \sqrt{\sigma^2} = \sqrt{1{,}23} = 1{,}109$$
The average linear deviation of the individual gross profits from the average gross profit amounts to 1.109 million KM.
4. Coefficient of variation

$$V = \frac{\sigma}{\bar{X}} \cdot 100 = \frac{1{,}109}{4{,}2} \cdot 100 = 26{,}4\%$$
Relative variation of data around the arithmetic mean is 26.4%.
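A Python sketch of the standard deviation and the coefficient of variation. Note that it does not round σ² before taking the square root, so it gives σ ≈ 1.108 rather than the 1.109 obtained from the rounded σ² = 1,23; the coefficient of variation still comes out at about 26.4%:

```python
import math

freq = {2: 1, 3: 3, 4: 5, 5: 4, 6: 2}
N = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / N
variance = sum((x - mean) ** 2 * f for x, f in freq.items()) / N

sigma = math.sqrt(variance)     # population standard deviation
V = sigma / mean * 100          # coefficient of variation, in percent
print(round(sigma, 3), round(V, 1))
```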
5. Coefficient of interquartile deviation
$$V_Q = \frac{I_Q}{Q_3 + Q_1} \cdot 100 = \frac{2}{5 + 3} \cdot 100 = 25\%$$

Relative variation of data around the median is 25%.
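The coefficient of interquartile deviation as a Python sketch, using the quartiles Q1 = 3 and Q3 = 5 from above:

```python
Q1, Q3 = 3, 5                   # quartiles of Example 1 (million KM)
IQ = Q3 - Q1                    # interquartile range
VQ = IQ / (Q3 + Q1) * 100       # coefficient of interquartile deviation, %
print(IQ, VQ)  # 2 25.0
```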
Example 2:

The company employs 50 workers. The following data represent the length of service (in years) of each employee:

1, 4, 2, 5, 6, 7, 8, 7, 9, 10, 18, 11, 12, 13, 12, 14, 14, 11, 13, 15, 11, 19, 7, 21, 15, 8, 19, 13, 9, 10, 17, 22, 20, 11, 14, 4, 5, 17, 23, 10, 12, 21, 30, 4, 8, 8, 10, 24, 3, 12

Determine and explain the measures of central tendency and measures of dispersion.
Solution:
Population: workers (N = 50); variable: length of service (quantitative; discrete if it is expressed as the completed years of service, or continuous if it is expressed as the length of time in service at the time of observation). Here we have a discrete series with many distinct values (modalities), so we form an interval (grouped) frequency distribution.
First, we form the frequency distribution: x_min = 1, x_max = 30.
We take intervals of width (amplitude) 5, which gives k = 6 intervals. As required, each interval is closed at the lower boundary and open at the upper boundary, $R_i = [L_{1i}, L_{2i})$, also written $[L_{1i}, L_{2i}[$, so that the intervals have exact (real) boundaries; the only exception is the last interval, which is closed on both sides. In this way, the upper boundary of each interval (except the last) equals the lower boundary of the next interval.
Length of service - intervals with nominal boundaries | Length of service R_i (real boundaries) | Number of employees f_i | Center of the interval c_i | Cumulative absolute frequency CAF | c_i·f_i | c_i²·f_i
--- | --- | --- | --- | --- | --- | ---
[0-4] | [0-5[ | 6 | 2,5 | 6 | 15 | 37,5
[5-9] | [5-10[ | 12 | 7,5 | 18 | 90 | 675
[10-14] | [10-15[ | 18 | 12,5 | 36 | 225 | 2812,5
[15-19] | [15-20[ | 7 | 17,5 | 43 | 122,5 | 2143,75
[20-24] | [20-25[ | 6 | 22,5 | 49 | 135 | 3037,5
[25-30] | [25-30] | 1 | 27,5 | 50 | 27,5 | 756,25
Σ | | 50 | | | 615 | 9462,5
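The grouping can be reproduced from the raw data with a short Python sketch. The function name `interval_index` is our own; it assigns each value to an interval of width 5 that is closed on the left and open on the right, except that the last interval [25, 30] is closed on both sides:

```python
# Length of service (years) of the 50 employees in Example 2
data = [1, 4, 2, 5, 6, 7, 8, 7, 9, 10, 18, 11, 12, 13, 12, 14, 14, 11, 13, 15,
        11, 19, 7, 21, 15, 8, 19, 13, 9, 10, 17, 22, 20, 11, 14, 4, 5, 17, 23, 10,
        12, 21, 30, 4, 8, 8, 10, 24, 3, 12]

width, k = 5, 6
lower = [i * width for i in range(k)]          # 0, 5, ..., 25

def interval_index(x):
    # [L1, L2) for every interval except the last, which is closed: [25, 30]
    return min(int(x // width), k - 1)

f = [0] * k
for x in data:
    f[interval_index(x)] += 1

c = [L + width / 2 for L in lower]             # interval centers c_i
caf, running = [], 0
for fi in f:
    running += fi
    caf.append(running)                        # cumulative absolute frequency

sum_cf = sum(ci * fi for ci, fi in zip(c, f))
sum_c2f = sum(ci ** 2 * fi for ci, fi in zip(c, f))
print(f)               # [6, 12, 18, 7, 6, 1]
print(c)               # [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
print(caf)             # [6, 18, 36, 43, 49, 50]
print(sum_cf, sum_c2f) # 615.0 9462.5
```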

We present the series with a histogram and a polygon of cumulative frequencies (ogive); for the ogive, we first need the cumulative frequencies (column CAF in the table above).


[Figure: Histogram. x-axis: intervals [0-5), [5-10), [10-15), [15-20), [20-25), [25-30); y-axis: number of employees.]

[Figure: Polygon of cumulative frequencies (ogive). x-axis: upper boundaries; y-axis: cumulative frequency.]

Measures of central tendency:

1. Arithmetic mean
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{k} c_i f_i = \frac{1}{50} \cdot 615 = 12{,}3$$

The average length of service in a population of 50 workers is 12.3 years.
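For grouped data, each interval is represented by its center c_i, so the grouped mean approximates the mean of the raw values. A Python sketch, with the frequencies and centers copied from the table above:

```python
f = [6, 12, 18, 7, 6, 1]                      # number of employees per interval
c = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]        # interval centers c_i

N = sum(f)
mean = sum(ci * fi for ci, fi in zip(c, f)) / N
print(N, mean)  # 50 12.3
```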


2. Mode

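$$f_{max} = 18 \;\Rightarrow\; M_o \in [10, 15[$$
$$M_o = L_{1M_o} + l_{M_o} \cdot \frac{f_{M_o} - f_{M_o-1}}{(f_{M_o} - f_{M_o-1}) + (f_{M_o} - f_{M_o+1})} = 10 + 5 \cdot \frac{18 - 12}{(18 - 12) + (18 - 7)} = 11{,}765$$
where $L_{1M_o}$ is the lower boundary and $l_{M_o}$ the width of the modal interval, and $f_{M_o-1}$, $f_{M_o+1}$ are the frequencies of the preceding and following intervals.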

The most frequent length of service for the 50 observed employees is 11.765 years.
Graphically, the mode can be determined from the histogram.

[Figure: Histogram with the mode (Mo) marked inside the interval [10-15). x-axis: intervals; y-axis: number of employees.]
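The interpolation formula for the mode of grouped data, as a Python sketch. `grouped_mode` is a hypothetical helper of our own, and the sketch assumes the modal interval has a neighbouring interval on both sides (true here, since it is the third of six):

```python
def grouped_mode(L1, width, f_prev, f_mo, f_next):
    d1 = f_mo - f_prev          # excess over the preceding interval
    d2 = f_mo - f_next          # excess over the following interval
    return L1 + width * d1 / (d1 + d2)

f = [6, 12, 18, 7, 6, 1]
lower = [0, 5, 10, 15, 20, 25]
width = 5

m = f.index(max(f))             # modal interval: the one with f_max
Mo = grouped_mode(lower[m], width, f[m - 1], f[m], f[m + 1])
print(lower[m], round(Mo, 3))   # 10 11.765
```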

3. Median and quartiles


$$\frac{N}{2} = 25 \;\Rightarrow\; CAF_{M_e} = 36 \;\Rightarrow\; M_e \in [10, 15[$$
Within this interval, the median is determined by linear interpolation:
$$M_e = L_{1M_e} + l_{M_e} \cdot \frac{\frac{N}{2} - CAF_{M_e-1}}{f_{M_e}} = 10 + 5 \cdot \frac{25 - 18}{18} = 11{,}94$$
50% of the workers have been in service for 11.94 years or less, while 50% of the workers have been in service longer than 11.94 years.
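$$\frac{N}{4} = 12{,}5 \;\Rightarrow\; CAF_{Q_1} = 18 \;\Rightarrow\; Q_1 \in [5, 10[$$
From the interval, using linear interpolation, we determine the first quartile:
$$Q_1 = L_{1Q_1} + l_{Q_1} \cdot \frac{\frac{N}{4} - CAF_{Q_1-1}}{f_{Q_1}} = 5 + 5 \cdot \frac{12{,}5 - 6}{12} = 7{,}71$$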

25% of the workers have been in service for 7.71 years or less, while 75% of the workers have been in service longer than 7.71 years.
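$$\frac{3N}{4} = 37{,}5 \;\Rightarrow\; CAF_{Q_3} = 43 \;\Rightarrow\; Q_3 \in [15, 20[$$
From the interval, using linear interpolation, we determine the third quartile:
$$Q_3 = L_{1Q_3} + l_{Q_3} \cdot \frac{\frac{3N}{4} - CAF_{Q_3-1}}{f_{Q_3}} = 15 + 5 \cdot \frac{37{,}5 - 36}{7} = 16{,}07$$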

75% of workers have been in service 16.07 years or less, while 25% of workers have been in
service longer than 16.07 years.
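The median and both quartiles use the same linear interpolation, differing only in the position p·N (p = 0.5, 0.25, 0.75). A Python sketch with a hypothetical helper `grouped_quantile`:

```python
f = [6, 12, 18, 7, 6, 1]
lower = [0, 5, 10, 15, 20, 25]
width = 5
N = sum(f)

def grouped_quantile(p):
    """Linear interpolation inside the first interval whose CAF reaches p*N."""
    position = p * N
    caf_before = 0                      # CAF of the preceding interval
    for L1, fi in zip(lower, f):
        if caf_before + fi >= position:
            return L1 + width * (position - caf_before) / fi
        caf_before += fi
    raise ValueError("p must be in (0, 1]")

Q1, Me, Q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
print(round(Q1, 2), round(Me, 2), round(Q3, 2))  # 7.71 11.94 16.07
```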
To determine whether the calculated measures of central tendency are representative, we calculate the measures of variability:

1. The range of variation


$$RV = x_{max} - x_{min} = 30 - 1 = 29$$
The range is a very crude and unreliable measure, because it depends only on the two extreme values.
2. The interquartile range
$$I_Q = Q_3 - Q_1 = 16{,}07 - 7{,}71 = 8{,}36$$
3. Variance and standard deviation

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{k} c_i^2 f_i - \bar{X}^2 = \frac{1}{50} \cdot 9462{,}5 - 12{,}3^2 = 37{,}96$$

The average squared deviation of individual data from the arithmetic mean is 37.96.

$$\sigma = \sqrt{\sigma^2} = \sqrt{37{,}96} = 6{,}16$$
The average linear deviation of individual data from the arithmetic mean is 6.16 years.
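The shortcut formula for the variance of grouped data, sketched in Python with the interval centers and frequencies from the table:

```python
import math

f = [6, 12, 18, 7, 6, 1]
c = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
N = sum(f)

mean = sum(ci * fi for ci, fi in zip(c, f)) / N
# Shortcut formula: mean of squared centers minus the squared mean
variance = sum(ci ** 2 * fi for ci, fi in zip(c, f)) / N - mean ** 2
sigma = math.sqrt(variance)
print(round(variance, 2), round(sigma, 2))  # 37.96 6.16
```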

4. Coefficient of variation

$$V = \frac{\sigma}{\bar{X}} \cdot 100 = \frac{6{,}16}{12{,}3} \cdot 100 = 50{,}09\%$$

Relative variation of the data around the arithmetic mean amounts to 50.09%.


5. Coefficient of interquartile deviation
$$V_Q = \frac{I_Q}{Q_3 + Q_1} \cdot 100 = \frac{8{,}36}{7{,}71 + 16{,}07} \cdot 100 = 35{,}16\%$$

Relative variation of the data around the median amounts to 35.16%.


The relative measure of variation that uses the median as the representative of the series (V_Q) is lower than the relative measure that uses the arithmetic mean (V). Therefore, the median is a better representative of the data than the arithmetic mean.
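The comparison above can be sketched in Python. The quartiles are entered as the rounded values obtained by interpolation (Q1 = 7,71 and Q3 = 16,07), which is our own choice for the illustration; with them, V_Q comes out at about 35.2%, still well below V ≈ 50.1%, so the conclusion in favour of the median holds:

```python
import math

f = [6, 12, 18, 7, 6, 1]
c = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
N = sum(f)
mean = sum(ci * fi for ci, fi in zip(c, f)) / N
sigma = math.sqrt(sum(ci ** 2 * fi for ci, fi in zip(c, f)) / N - mean ** 2)

Q1, Q3 = 7.71, 16.07                  # quartiles from the interpolation step, rounded

V = sigma / mean * 100                # relative spread around the arithmetic mean
VQ = (Q3 - Q1) / (Q3 + Q1) * 100      # relative spread around the median
print(round(V, 2), round(VQ, 2))      # 50.09 35.16
print("median" if VQ < V else "arithmetic mean", "is the better representative")
```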
