1. Measures of Center
Overview
A measure of central tendency is a single value that describes a set of data
by identifying the center, or middle, of the data. Think of a measure of central
tendency as the "representative" who is elected to represent the data
because it tends to demonstrate the tendency of the set to cluster at that
location. There are three common measures of center -- mean, median, and
mode. Each measure of center has its advantages and disadvantages.
Therefore, it is important to be able to identify the appropriate measure of
center for a given data set.
Upon completion of this unit, you will be able to:
- Distinguish between a population and a sample
- Calculate measures of center (mean, median, mode) using technology when provided with a data set
- Identify appropriate statistical measurements to summarize a performance indicator from a given data set
- Interpret statistical metrics using the definitions found in the WTCS Continuous Improvement Indicator Library with correct units of measure for a given performance indicator
Mean
The mean, commonly referred to as the average, is the most well-known
measure of central tendency. The mean is calculated by dividing the sum of
all the values in the data set by the number of values in the data set.
In statistics, there is a difference between a sample and a population. The
difference is very important. If the data set you are given consists of all
possible elements being studied, then you have a population. If the data set
is a subset of the elements being studied, then you have a sample. The
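mean is calculated the same way whether you have a population or a sample. To
acknowledge that you are calculating the population mean, use the Greek
lower case letter "mu", denoted as $\mu$:
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
To acknowledge that you are calculating the sample mean, use "x-bar",
denoted as $\bar{x}$:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$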
Example 1 Calculate the mean for five students who completed a 100-point
exam with the following outcomes:
50, 75, 80, 90, 100
Solution
Calculate the mean by adding up the five scores and dividing the
sum by five.
$$\frac{50 + 75 + 80 + 90 + 100}{5} = \frac{395}{5} = 79$$
If these are the only five students who will take the exam, this is population
data. You would report that the population mean is 79, $\mu = 79$. If these are
the scores of five of your students selected from a class of twenty, then you
would report that the sample mean is 79, $\bar{x} = 79$.
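The calculation in Example 1 can be sketched in Python. This is a minimal illustration, not part of the original workshop materials; the variable names are chosen here for readability.

```python
from statistics import mean

scores = [50, 75, 80, 90, 100]  # exam scores from Example 1

# Sum of the values divided by the number of values.
manual_mean = sum(scores) / len(scores)

# The standard library computes the same quantity.
library_mean = mean(scores)

print(manual_mean, library_mean)
```

The arithmetic is identical for a population and a sample; only the symbol used to report the result changes.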
Example 2 Calculate the mean for these test scores:
0, 100, 100, 100, 100
Solution
Most students mastered the material and earned a perfect score of 100, so it
would make sense to measure the center of this data set as 100. However,
the mean is affected by the one student who didn't take the exam and
received a grade of 0. By definition, the mean includes that score.
$$\frac{0 + 4(100)}{5} = \frac{400}{5} = 80$$
The mean score is 80, which equates to a C. The mean doesn't function well
as a measure of center here because of the outlier of 0. In this situation, the
median would be a better measure of central tendency.
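To see how a single outlier moves the mean but not the median, here is a small Python sketch. It assumes the data set is one score of 0 and four scores of 100, as implied by the calculation 0 + 4(100) in the solution.

```python
from statistics import mean, median

# Assumed data: one student scored 0, four students scored 100.
scores = [0, 100, 100, 100, 100]

outlier_mean = mean(scores)      # pulled down by the single 0
outlier_median = median(scores)  # middle value of the sorted scores

print(outlier_mean, outlier_median)
```

The mean drops to 80 while the median stays at 100, which matches what most students actually scored.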
Median
The median is the middle score for a set of data. To find the median,
arrange the data in order from smallest to largest. Then, pick the value that
sits in the middle. This is the median.
Example 3 Calculate the median for these five scores:
50, 75, 80, 90, 100
Solution
Notice that the scores are already in ascending order. Pick the middle score.
50, 75, 80, 90, 100
80 is the median. Half of the data is smaller than 80; half of the data is larger
than 80.
Note: the method of picking the middle value works fine when you have
an odd number of scores, but what happens when you have an even
number of scores?
Example 4 Calculate the median for these six scores:
50, 68, 75, 80, 90, 100
Solution
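Notice that the scores are already in ascending order; if they weren't already
in order, you would first need to place them in order. With an even number of
scores there is no single middle value, so the median is the mean of the two
middle scores, 75 and 80:
$$\frac{75 + 80}{2} = 77.5$$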
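A short Python sketch of the median procedure, covering both the odd and the even case. The function name `median_of` is hypothetical; for an even number of scores it uses the usual convention of averaging the two middle values.

```python
def median_of(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_of([50, 75, 80, 90, 100])        # Example 3
even_result = median_of([50, 68, 75, 80, 90, 100])   # Example 4
print(odd_result, even_result)
```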
Mode
The mode is the most common or most frequently occurring value in a series
of data. It is typically used when the data is categorical rather than
numerical. For this reason, the mode is rarely used when the data is quantitative.
Example 5 Find the mode for these six scores:
50, 68, 75, 80, 90, 100
Solution
No value occurs any more than any other value. We would say that there is
no mode.
Solution
There are two values that occur twice in this data set: 50 and 80. Therefore,
there are two modes.
Example 7 Find the mode for these scores:
50, 50, 90, 92, 95, 98, 100
Solution
The mode in this data set is 50. However, notice that 50 doesn't seem like a
good representative of the group. Most of the values in the data set are
hovering in the 90s. A good measure of center should reflect that.
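The mode examples can be checked with a small Python helper. This is a sketch under one assumption taken from Example 5: when no value repeats, the data set is reported as having no mode (an empty list here). The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                      # no value occurs more than once
        return []
    return sorted(v for v, c in counts.items() if c == top)

no_mode = modes([50, 68, 75, 80, 90, 100])          # Example 5
one_mode = modes([50, 50, 90, 92, 95, 98, 100])     # Example 7
print(no_mode, one_mode)
```

Because the helper returns a list, it also handles data sets with two or more modes.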
Advantages of the Mode
The main advantage of the mode is that it is easy to find: just pick the value
that shows up most often. It is the only measure of center for qualitative or
categorical data.
Disadvantages of the Mode
The mode is only useful for categorical data. There can also be
more than one mode if multiple values in a data set repeat the
same number of times. If the mode is used in a quantitative data set, it is
very possible that the mode doesn't exist. If it does exist due to repetition of
one value, it most likely doesn't represent the data set well.
2. Measures of Variation
Overview
A measure of variation, sometimes called a measure of spread or scatter,
is used to describe the variability in a sample or population. It is usually
reported along with a measure of central tendency like the mean or median to
provide an overall description of a set of data. Whereas measures of central
tendency summarize similarities in a data set, measures of variation
summarize differences, or fluctuations, in a data set. There are two
measures of variation that commonly occur in statistics: range and standard
deviation.
Upon completion of this unit, you will be able to:
- Distinguish between a population and a sample
- Calculate measures of variation (range, standard deviation) using technology when provided with a data set
- Identify appropriate statistical measurements to summarize a performance indicator from a given data set
- Interpret statistical metrics (standard deviation) with correct units of measure for a given performance indicator
Range
The range is the easiest measure of variation to calculate. It is the difference
between the highest and lowest scores in a data set. By formula, range is
defined as:
Range = maximum value - minimum value
Example 1 Find the range of the following data set:
10, 20, 30, 40, 50, 90
Solution
The maximum value is 90 and the minimum value is 10. This results in a
range of 80.
Example 2 Find the range of the following data set:
10, 20, 30, 40, 50, 290
Solution
The maximum value is 290 and the minimum value is 10. This results in a
range of 280. By changing one value in the data set, the range is changed
markedly.
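Both range examples can be reproduced with a few lines of Python. This is an illustrative sketch; the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

range_1 = data_range([10, 20, 30, 40, 50, 90])    # Example 1
range_2 = data_range([10, 20, 30, 40, 50, 290])   # Example 2: one value changed
print(range_1, range_2)
```

Only the largest and smallest values enter the calculation, so changing 90 to 290 raises the range from 80 to 280 even though the other five values are identical.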
Advantage of the Range
The range is easy to calculate for any data set.
Disadvantages of the Range
Because it only uses the largest and smallest values in the data set, the range
is affected by outliers: values which are unusually large or unusually small
relative to the other values in the data set. The measure is not resistant to
changes made to the data set.
respondents, far fewer than the number of students enrolled at WITC. The
current enrollment at WITC would be the population of study; the 550
respondents to the CCSSE survey would be a sample.
When we calculate standard deviation, our goal is to find the variation in the
population. However, as we are often presented with data from a sample
only, we are forced to estimate the population standard deviation from a
sample standard deviation. These two standard deviations - sample and
population standard deviations - are calculated differently. For this workshop
we will concentrate on finding sample standard deviation as an estimate of
population standard deviation.
Standard Deviation
Standard deviation measures variation by measuring the average distance
that each data point is from the mean of the data set. The key to
understanding standard deviation is first finding a way to measure each data
point's distance from the mean. Then, average those distances. There are
six separate calculations implied in the definition:
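Step 1: Calculate the average (mean) of the data.
Step 2: Subtract the average from each data value.
Step 3: Square each of those differences.
Step 4: Add up the squared differences.
Step 5: Divide that sum by one less than the number of data values (n - 1), since we are working with a sample.
Step 6: Take the square root of the result.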
$$\frac{5 + 8 + 8 + 9 + 10}{5} = \frac{40}{5} = 8$$
It's easiest to complete the steps within a table when calculating manually. In
most situations, use technology like Microsoft Excel to calculate the standard
deviation easily.
Step 2:
Data    Data - Average
5       5 - 8 = -3
8       8 - 8 = 0
8       8 - 8 = 0
9       9 - 8 = 1
10      10 - 8 = 2
Step 3:
(Data - Average)^2
(-3)^2 = 9
(0)^2 = 0
(0)^2 = 0
(1)^2 = 1
(2)^2 = 4
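The full calculation for the data 5, 8, 8, 9, 10 can be sketched in Python. The first three steps follow the tables above; the last three (sum the squares, divide by n - 1, take the square root) are an assumption based on the standard sample standard deviation formula, since this workshop concentrates on the sample version. The result is cross-checked against `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
from statistics import stdev

data = [5, 8, 8, 9, 10]

average = sum(data) / len(data)                 # Step 1: mean of the data
deviations = [x - average for x in data]        # Step 2: data - average
squares = [d ** 2 for d in deviations]          # Step 3: square each deviation
sum_of_squares = sum(squares)                   # Step 4 (assumed): add the squares
variance = sum_of_squares / (len(data) - 1)     # Step 5 (assumed): divide by n - 1
sample_sd = math.sqrt(variance)                 # Step 6 (assumed): square root

print(round(sample_sd, 2))
print(math.isclose(sample_sd, stdev(data)))
```

For this data the squared deviations sum to 14, the variance is 3.5, and the sample standard deviation is about 1.87.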
and
80 + 2(5.0) = 90
15
20
28
30
32
35
36
40
40
Solution
First, make sure that all of the scores are in ascending order. Then,
locate the score you are interested in. In this case, that is 30.
Count the scores lower than 30.
10, 15, 20, 28, 30, 32, 35, 36, 40, 40
There are four values lower than 30 out of the ten scores provided. That is
4/10 = 40%. A score of 30 is at the 40th percentile.
Example 2 Using the data from Example 1, calculate the percentile for a
score of 36.
Solution
Make sure the data is in ascending order.
10, 15, 20, 28, 30, 32, 35, 36, 40, 40
There are seven scores lower than 36 out of the ten scores provided. That is
7/10 = 70%. A score of 36 is at the 70th percentile.
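The counting method used in Examples 1 and 2 can be sketched in Python. This follows the definition used in these examples (the percent of scores strictly lower than the score of interest); other percentile conventions exist and can give slightly different answers. The function name `percentile_below` is hypothetical.

```python
def percentile_below(scores, value):
    """Percent of scores that are strictly lower than the given value."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

scores = [10, 15, 20, 28, 30, 32, 35, 36, 40, 40]

p30 = percentile_below(scores, 30)   # Example 1
p36 = percentile_below(scores, 36)   # Example 2
print(p30, p36)
```

Because the function only counts values, the scores do not need to be sorted first; sorting simply makes the manual count easier.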
Example 3 In the most current Noel Levitz SSI, WITC scored a 5.1 for
student satisfaction and engagement under the category "College
experience met expectations." According to the National Community
College Benchmark Project (NCCBP), WITC was at the 90th percentile.
Interpret the percentile.
Solution
In 2015, relative to student satisfaction with their college experience meeting
their expectations, WITC scored higher than 90% of the other colleges
participating in the NCCBP.
4. Relative Frequencies
Overview
A frequency table is a way to summarize data. Typically, the data is
organized by classes or categories, and the number of occurrences for each
class is counted and recorded. When you browse WITC's survey results on
the Connection, you will see the results of surveys expressed as frequency
tables. An example from the 2014-2015 Community College Survey of
Student Engagement (CCSSE) shows 550 respondents broken out by
campus.
Campus          Number of Respondents
Ashland         32
New Richmond    235
Rice Lake       194
Superior        89
Total           550
In this table, the number of respondents could also be called the frequency.
In a frequency table, the frequency may be expressed as a percentage of the
total. When the number of outcomes is expressed as a percentage it is
called a relative frequency.
Upon completion of this unit, you will be able to:
Calculate relative frequencies using technology when provided with a
summarized data set
Interpret statistical metrics (relative frequencies) with correct units of
measure for a given performance indicator
A frequency count is the number of values from a data set that fall within a
particular classification. A relative frequency is calculated by dividing a
frequency count by the total sample and converting to a percent. For
example, in the CCSSE data above, there were 235 students from the New
Richmond campus who participated in the survey. To compute the relative
frequency for New Richmond, divide 235 by the total sample size of 550.
$$\frac{235}{550} \times 100\% = 42.7\%$$
The relative frequency of responses for the New Richmond campus is 42.7%.
Example 1 Calculate the relative frequency for the CCSSE student
responses for the remaining campuses.
Solution
Campus          Number of Respondents    Relative Frequency
Ashland         32                       32/550 x 100% = 5.8%
New Richmond    235                      235/550 x 100% = 42.7%
Rice Lake       194                      194/550 x 100% = 35.3%
Superior        89                       89/550 x 100% = 16.2%
Total           550                      100%
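The same table can be produced with a short Python sketch. The function name `relative_frequencies` is hypothetical; it divides each frequency count by the total and converts to a percent rounded to one decimal place.

```python
def relative_frequencies(counts):
    """Map each category to its share of the total, as a percent (1 decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# CCSSE respondents by campus, from the frequency table above.
respondents = {
    "Ashland": 32,
    "New Richmond": 235,
    "Rice Lake": 194,
    "Superior": 89,
}

campus_pct = relative_frequencies(respondents)
for campus, pct in campus_pct.items():
    print(f"{campus}: {pct}%")
```

The same function applies to any summarized data set, such as a grade distribution keyed by letter grade.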
Example 2 Find the relative frequency (to the nearest tenth percent) for each
letter grade represented in the grade distribution below.
Letter Grade    Number of Students
A               1
A-              2
B+              3
B               4
B-              1
C+              3
C               0
C-              4
D               5
F               2
Solution
Letter Grade    Number of Students    Relative Frequency
A               1                     4.0%
A-              2                     8.0%
B+              3                     12.0%
B               4                     16.0%
B-              1                     4.0%
C+              3                     12.0%
C               0                     0.0%
C-              4                     16.0%
D               5                     20.0%
F               2                     8.0%