Description of Data
Learning Objectives:
Given the learning materials and activities of this chapter, students will be able to:
Calculate the mean, weighted mean, and median, and determine the mode, to describe the
center of a set of data.
Calculate the range, variance, mean deviation, and standard deviation.
Describe the shape of the distribution using measures of skewness and kurtosis.
Distinguish the appropriate descriptive measures of data and determine their usability and
limitations.
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{n}$$ – sample mean
Example 1: Five judges give their scores on the performance of a gymnast as follows: 8, 9, 9, 9,
and 10. Find the mean score of the gymnast.
Solution: Let $X_i$ be the score given by the ith judge, and treat the 5 judges as the population.
Add the scores given by the 5 judges and divide by 5. Thus, the mean score is
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{8 + 9 + 9 + 9 + 10}{5} = \frac{45}{5} = 9.$$
Therefore, the mean score of the gymnast is 9.
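The arithmetic in Example 1 can be checked with a minimal Python sketch. The variable names are illustrative, and the standard-library `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

# Judges' scores from Example 1
scores = [8, 9, 9, 9, 10]

# Mean = sum of the observations divided by the number of observations
mean_score = sum(scores) / len(scores)

print(mean_score)
# Cross-check against the standard library
print(mean(scores) == mean_score)
```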
Mean from Grouped Data
We can estimate the mean from data presented as a grouped frequency distribution
table. If the original raw data are not accessible, it is still possible to approximate the
mean using the formula given below.
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
where;
$f_i$ – the frequency of the ith class
$x_i$ – the class mark of the ith class
k – the total number of classes/class intervals
n – the total frequency
Example 2: The following table presents the frequency distribution of the weight of 75 pieces of
luggage in pounds. Approximate the sample mean weight of the luggage.
Weight (pounds)    No. of Luggage (f)    Class Mark (x)    fx
31.5 – 41.4         9                    36.45              328.05
41.5 – 51.4         8                    46.45              371.60
51.5 – 61.4         4                    56.45              225.80
61.5 – 71.4        32                    66.45             2126.40
71.5 – 81.4        14                    76.45             1070.30
81.5 – 91.4         5                    86.45              432.25
91.5 – 101.4        3                    96.45              289.35
Total              75                                      4843.75
Substitute the values into the formula. The mean weight is
$$\bar{x} = \frac{\sum_{i=1}^{7} f_i x_i}{n} = \frac{4843.75}{75} = 64.58 \text{ pounds}$$
Thus, the mean weight of the luggage is approximately 64.58 pounds.
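The grouped-mean computation in Example 2 can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for illustration. Each class mark is taken as the midpoint of the class limits, which reproduces the class marks in the table.

```python
# (lower limit, upper limit, frequency) for each class in Example 2
classes = [
    (31.5, 41.4, 9),
    (41.5, 51.4, 8),
    (51.5, 61.4, 4),
    (61.5, 71.4, 32),
    (71.5, 81.4, 14),
    (81.5, 91.4, 5),
    (91.5, 101.4, 3),
]

# Total frequency n
n = sum(f for _, _, f in classes)

# Sum of f * x, where x is the class mark (midpoint of the class limits)
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

grouped_mean = total_fx / n
print(n, round(total_fx, 2), round(grouped_mean, 2))
```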
Weighted Mean
The weighted mean is used when the individual observed values vary in their degree of
importance. Each observation is assigned a weight according to its relative importance.
If we assign a weight w to each observation x, then the weighted mean is given by
$$\text{weighted mean} = \frac{\sum wx}{\sum w}$$
Example 3: A government agency gives scholarship grants to employees taking graduate studies.
Courses in graduate studies earn credits of 1, 2, 3, 4, or 5 units. They can get a partial scholarship
for the next semester if they get an average of 1.5 to 1.75 and a full scholarship if the average is
better than 1.5. What kind of scholarship will the 2 employees get given their grades for the
previous semester? The data is given below:
Employee A                    Employee B
Subject   Units   Grade       Subject   Units   Grade
A         1       1.0         A         1       2.0
B         2       1.25        B         2       1.75
C         3       1.5         C         3       1.5
D         4       1.75        D         4       1.25
E         5       2.0         E         5       1.0
The average grade of employee A is:
$$\text{weighted mean} = \frac{1(1.0) + 2(1.25) + 3(1.5) + 4(1.75) + 5(2.0)}{1 + 2 + 3 + 4 + 5} = \frac{25}{15} = 1.67$$
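The average grade of employee B is:
$$\text{weighted mean} = \frac{1(2.0) + 2(1.75) + 3(1.5) + 4(1.25) + 5(1.0)}{1 + 2 + 3 + 4 + 5} = \frac{20}{15} = 1.33$$
Employee A's average of 1.67 falls between 1.5 and 1.75, so employee A gets a partial scholarship.
Employee B's average of 1.33 is better than 1.5, so employee B gets a full scholarship.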
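The weighted means for both employees in Example 3 can be checked with a short Python sketch. The function name `weighted_mean` is illustrative and not part of the original material. The units act as the weights and the grades as the observations.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

units = [1, 2, 3, 4, 5]                      # weights (course units)
grades_a = [1.0, 1.25, 1.5, 1.75, 2.0]       # Employee A
grades_b = [2.0, 1.75, 1.5, 1.25, 1.0]       # Employee B

avg_a = weighted_mean(grades_a, units)       # 25 / 15
avg_b = weighted_mean(grades_b, units)       # 20 / 15

print(round(avg_a, 2), round(avg_b, 2))
```

Although both employees have the same set of grades, the weighted means differ because the grades are attached to different numbers of units.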
Therefore, it means that companies with total receipt of less than 7.3 million pesos belong in the
lower half of the ordered observations.
Median from Grouped Data
The approximate median from a frequency distribution is obtained using the
formula below:
$$\text{median} = l_{mc} + w\left[\frac{\frac{n}{2} - cf_{<mc}}{f_{mc}}\right]$$
Where;
n – the total frequency
$cf_{<mc}$ – cumulative frequency of the class preceding the median class
$f_{mc}$ – frequency of the median class
$l_{mc}$ – lower boundary of the median class
w – the class size/class width
Example 5: The following table presents the frequency distribution of the weight of 75 pieces of
luggage in pounds. Approximate the sample median weight of the luggage.
Weight (pounds)    No. of Luggage (f)    Less-than cumulative frequency (<cf)
31.5 – 41.4         9                     9
41.5 – 51.4         8                    17
51.5 – 61.4         4                    21   ($cf_{<mc}$)
61.5 – 71.4        32   ($f_{mc}$)       53   (median class)
71.5 – 81.4        14                    67
81.5 – 91.4         5                    72
91.5 – 101.4        3                    75
Total              75
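The grouped-median formula can be applied to this table with a minimal Python sketch. Two values here are assumptions not stated in the text: the lower boundary of the median class is taken as 61.45 (midway between the adjacent class limits 61.4 and 61.5), and the class width is taken as 10. A different boundary convention would shift the result slightly.

```python
def grouped_median(lower_boundary, width, n, cf_before, f_median):
    """Grouped median: l_mc + w * ((n/2 - cf_<mc) / f_mc)."""
    return lower_boundary + width * ((n / 2 - cf_before) / f_median)

median_weight = grouped_median(
    lower_boundary=61.45,  # assumed boundary of the class 61.5 – 71.4
    width=10,              # assumed class width
    n=75,                  # total frequency
    cf_before=21,          # cumulative frequency before the median class
    f_median=32,           # frequency of the median class
)

print(round(median_weight, 2))
```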
If $p_k = \frac{k(n+1)}{100}$ is not an integer, then the weighted average estimate uses simple
interpolation between the two observed values, with the formula below:
$$p_k = (1 - m)X_i + mX_{(i+1)}$$
Where;
m – the fractional part of $\frac{k(n+1)}{100}$
i – the integer part of $\frac{k(n+1)}{100}$
k – the desired percentile
Example 1: The following data set gives the number of years of operation of 20 mining
companies: 4, 6, 7, 5, 6, 30, 23, 25, 20, 21, 17, 18, 17, 19, 11, 10, 10, 8, 20, 16. Determine the
95th percentile.
Solution: Arrange the data set in order.
4, 5, 6, 6, 7, 8, 10, 10, 11, 16, 17, 17, 18, 19, 20, 20, 21, 23, 25, 30
Compute the location $\frac{95(20+1)}{100} = 19.95$. Since the location is not an integer, $p_{95}$ is computed by
interpolating between the 19th and 20th ordered values, with i = 19 and m = 0.95:
$$p_{95} = (1 - 0.95)X_{19} + 0.95X_{20} = 0.05(25) + 0.95(30) = 29.75$$
Therefore, we can say that 95 percent of the 20 mining companies have been operating for less
than 29.75 years.
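The weighted average estimate can be sketched in Python as below. The function name and the `parts` parameter (100 for percentiles, 10 for deciles, 4 for quartiles) are illustrative. Returning the smallest or largest value when the location falls outside the data is an assumption, since the text does not cover that case.

```python
import math

def weighted_average_quantile(data, k, parts=100):
    """Locate k(n+1)/parts in the ordered data and interpolate linearly."""
    xs = sorted(data)
    n = len(xs)
    location = k * (n + 1) / parts
    i = math.floor(location)   # integer part of the location
    m = location - i           # fractional part of the location
    if i < 1:                  # assumption: location below the first value
        return xs[0]
    if i >= n:                 # assumption: location at or beyond the last value
        return xs[-1]
    # X_i is xs[i - 1] because Python lists are 0-indexed
    return (1 - m) * xs[i - 1] + m * xs[i]

years = [4, 6, 7, 5, 6, 30, 23, 25, 20, 21,
         17, 18, 17, 19, 11, 10, 10, 8, 20, 16]

p95 = weighted_average_quantile(years, 95)
print(round(p95, 2))
```

Calling the same function with `parts=10` or `parts=4` gives the corresponding decile or quartile under the same interpolation rule.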
Percentile from Grouped Data
To approximate the kth percentile, $P_k$, we use the formula given below:
$$P_k = l_{p_k} + w\left[\frac{\frac{nk}{100} - cf_{<p_k}}{f_{p_k}}\right]$$
Where;
n – the total frequency
$cf_{<p_k}$ – cumulative frequency of the class preceding the $P_k$ class
If $D_k = \frac{k(n+1)}{10}$ is not an integer, then the weighted average estimate uses simple
interpolation between the two observed values, with the formula below:
$$D_k = (1 - m)X_i + mX_{(i+1)}$$
Where;
m – the fractional part of $\frac{k(n+1)}{10}$
i – the integer part of $\frac{k(n+1)}{10}$
k – the desired decile
Decile from Grouped Data
To approximate a decile from a frequency distribution table, the formula is:
$$D_k = l_{D_k} + w\left[\frac{\frac{nk}{10} - cf_{<D_k}}{f_{D_k}}\right]$$
Where;
n – the total frequency
$cf_{<D_k}$ – cumulative frequency of the class preceding the $D_k$ class
Quartile
The quartiles divide the ordered observations into four (4) equal parts. The formula to
compute a quartile is:
$$Q_k = \frac{k(n+1)}{4}$$ – Weighted average estimate method
If $Q_k = \frac{k(n+1)}{4}$ is not an integer, then the weighted average estimate uses simple
interpolation between the two observed values, with the formula below:
$$Q_k = (1 - m)X_i + mX_{(i+1)}$$
Where;
m – the fractional part of $\frac{k(n+1)}{4}$
i – the integer part of $\frac{k(n+1)}{4}$
k – the desired quartile
Quartile from Grouped Data
To approximate a quartile from a frequency distribution table, the formula is:
$$Q_k = l_{Q_k} + w\left[\frac{\frac{nk}{4} - cf_{<Q_k}}{f_{Q_k}}\right]$$
Where;
n – the total frequency
$cf_{<Q_k}$ – cumulative frequency of the class preceding the $Q_k$ class
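The grouped formulas for percentiles, deciles, and quartiles have the same form and differ only in the divisor (100, 10, or 4). A single Python sketch can therefore cover all three. It is applied here to the luggage weight table under three assumptions that are not stated in the text: lower class boundaries are 0.05 below the lower class limits, the class width is 10, and the quantile class is the first class whose cumulative frequency reaches nk/parts.

```python
def grouped_quantile(classes, k, parts, width):
    """classes: (lower_boundary, frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    target = n * k / parts          # nk/100, nk/10, or nk/4
    cf_before = 0                   # cumulative frequency of preceding classes
    for lower, f in classes:
        if cf_before + f >= target:
            return lower + width * (target - cf_before) / f
        cf_before += f
    raise ValueError("k must not exceed parts")

# Luggage weights: assumed lower boundaries with the stated frequencies
luggage = [(31.45, 9), (41.45, 8), (51.45, 4), (61.45, 32),
           (71.45, 14), (81.45, 5), (91.45, 3)]

q1 = grouped_quantile(luggage, 1, 4, 10)
q2 = grouped_quantile(luggage, 2, 4, 10)
d5 = grouped_quantile(luggage, 5, 10, 10)
p50 = grouped_quantile(luggage, 50, 100, 10)
print(round(q1, 3), round(q2, 3), q2 == d5 == p50)
```

The second quartile, fifth decile, and 50th percentile coincide, as expected, because each places half of the total frequency below it.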
Sample standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
and the population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Variance
The variance is the square of the standard deviation.
Example 3: The final scores of 5 students were recorded as follows: 80, 88, 92, 90 and 85.
Determine the variance and standard deviation.
Solution: Consider the scores as x's and compute the average.
Score (x)    x – mean          (x – mean)^2
80           80 – 87 = -7      49
88           88 – 87 = 1        1
92           92 – 87 = 5       25
90           90 – 87 = 3        9
85           85 – 87 = -2       4
                               $\sum (x - \bar{x})^2 = 88$
Steps:
1. Calculate the average score: $\frac{80 + 88 + 92 + 90 + 85}{5} = 87$.
2. Calculate the deviation from the mean (x – mean).
3. Take the square of each deviation from the mean.
4. Take the sum of the squared deviations from the mean.
5. Substitute the values into the formula.
$$s^2 = \frac{88}{5 - 1} = 22, \qquad s = \sqrt{\frac{88}{5 - 1}} = 4.69 \text{ units.}$$
Therefore, the variance is 22, and on average the individual observed values deviate about
4.69 units from the mean.
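The five steps of Example 3 can be followed in a minimal Python sketch. The variable names are illustrative, and the standard-library `statistics.stdev` (which also divides by n - 1) is used only as a cross-check.

```python
import math
import statistics

scores = [80, 88, 92, 90, 85]
n = len(scores)

# Step 1: the average score
mean = sum(scores) / n

# Steps 2-4: deviations from the mean, squared, then summed
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

# Step 5: sample variance (divide by n - 1) and sample standard deviation
variance = sum_sq_dev / (n - 1)
sd = math.sqrt(variance)

print(mean, sum_sq_dev, variance, round(sd, 2))
```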
Standard Deviation from Grouped Data
The procedure is similar to that of finding the mean for grouped data, and it uses the
midpoints of each class. The formula is
$$s = \sqrt{\frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}}$$
Example 4: For 108 randomly selected high school students, the following IQ frequency
distribution table was obtained.
IQ interval    Frequency
90 – 98         6
99 – 107       22
108 – 116      43
117 – 125      28
126 – 134       9
Find the variance and standard deviation.
Solution: Make a table. Find the class mark (class midpoint) of each class. Multiply the midpoint
by the frequency for each class. Take the square of the midpoint for each class. Multiply the
frequency by the square of the midpoint for each class. Then find the sum of the frequencies, the
sum of the products of the frequencies and the class midpoints, and the sum of the products
of the frequencies and the squared class marks. Finally, substitute the values into the formula.
IQ interval    Frequency (f)    Class mark (x)    f.x       x^2       f.x^2
90 – 98         6                94                 564      8836       53,016
99 – 107       22               103                2266     10609      233,398
108 – 116      43               112                4816     12544      539,392
117 – 125      28               121                3388     14641      409,948
126 – 134       9               130                1170     16900      152,100
Total          n = 108                            $\sum fx = 12,204$   $\sum fx^2 = 1,387,854$
$$s = \sqrt{\frac{108(1{,}387{,}854) - (12{,}204)^2}{(108)(107)}} = \sqrt{82.26} = 9.07$$
The variance is approximately 82.26 and the standard deviation is approximately 9.07.
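The table and the substitution in Example 4 can be verified with a short Python sketch. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for illustration, and each class mark is computed as the midpoint of the class limits.

```python
import math

# (lower limit, upper limit, frequency) for each IQ class
classes = [(90, 98, 6), (99, 107, 22), (108, 116, 43),
           (117, 125, 28), (126, 134, 9)]

n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

# Computational formula: [n * sum(f x^2) - (sum(f x))^2] / [n (n - 1)]
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
sd = math.sqrt(variance)

print(n, sum_fx, sum_fx2, round(variance, 2), round(sd, 2))
```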
$$MAD = \frac{18}{6} = 3$$
Therefore, on average each score deviates around 3 units away from the mean.
Mean Absolute Deviation from Grouped Data
When the data are presented in a frequency distribution table, the mean absolute
deviation is computed in the following manner.
$$\text{mean absolute deviation} = \frac{\sum f\,|x - \text{mean}|}{n}$$
Where;
f – the frequency of each class
x – the class mark of each class
n – the total frequency
An example of computing the mean absolute deviation from grouped data is left as an exercise.
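As a starting point for the exercise, the grouped formula can be sketched in Python. The function name is illustrative, and the luggage weight data (class marks and frequencies from the grouped mean example) are reused here only as sample input. The mean in the formula is taken to be the grouped mean, which is an assumption.

```python
def grouped_mad(class_marks, freqs):
    """Mean absolute deviation from class marks and frequencies."""
    n = sum(freqs)
    # Grouped mean: sum(f * x) / n
    mean = sum(f * x for x, f in zip(class_marks, freqs)) / n
    # Sum of f * |x - mean|, divided by the total frequency
    return sum(f * abs(x - mean) for x, f in zip(class_marks, freqs)) / n

marks = [36.45, 46.45, 56.45, 66.45, 76.45, 86.45, 96.45]
freqs = [9, 8, 4, 32, 14, 5, 3]

mad = grouped_mad(marks, freqs)
print(round(mad, 2))
```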
Coefficient of Variation
Measures of variability such as the standard deviation are measures of absolute variability,
not relative dispersion. They can only compare two sample data sets that have the same unit of
measure. However, two samples often differ in their units of measurement. The coefficient of
variation overcomes this problem. The formula is
$$cv = \frac{s}{\text{mean}} \times 100\%$$
Where;
s – the sample standard deviation
cv – the coefficient of variation
Example 1: The average score of the students in an Algebra class is 110, with a standard deviation
of 5, while the average score of students in a Biology class is 106, with a standard deviation of 4.
Which class is more variable in terms of score?
Solution: Calculate the coefficient of variation of each class.
Algebra class:
$$cv = \frac{5}{110} \times 100\% = 4.55\%$$
Biology class:
$$cv = \frac{4}{106} \times 100\% = 3.77\%$$
Since the coefficient of variation for the Algebra class is larger than that of the Biology class,
the scores of students in the Algebra class are more variable than the scores in the Biology class.
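The comparison of the two classes can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    return sd / mean * 100

cv_algebra = coefficient_of_variation(5, 110)
cv_biology = coefficient_of_variation(4, 106)

print(round(cv_algebra, 2), round(cv_biology, 2))

# The class with the larger coefficient of variation is relatively more variable
more_variable = "Algebra" if cv_algebra > cv_biology else "Biology"
print(more_variable)
```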
Measures of Skewness
A measure of skewness is a single value that indicates the degree and direction of
asymmetry. The computed value of skewness may be interpreted as follows:
sk = 0 – symmetric distribution
sk > 0 – positively skewed distribution
sk < 0 – negatively skewed distribution