
Learning Activity 1: Jigsaw

TLO 1: Summarize a Data Set by Calculating Appropriate Summary Statistics


Learn (20 mins)
Form groups of four.
[Diagram: four original groups, each with members numbered 1, 2, 3, and 4.]

Each group of four is given the Summary Statistics handouts:


1. Measures of Central Tendency
2. Measures of Variation
3. Measures of Position: Percentiles
4. Relative Frequencies
In each group, the handouts are distributed one per person. Each person
completes their assigned reading.
Then, all learners reorganize into a topic-specific group where the assigned
reading is discussed. Topic-specific group members are identified as
specialists on their topic.
[Diagram: four topic-specific groups; one group of all the 1s, one of all the 2s, one of all the 3s, and one of all the 4s.]

Teach-back (30 mins)


The specialists move back to their original group to report-out on their
findings in teach-back mode.

Practice (10 mins)


The original groups of four complete an assessment as evidence of learning.

1. Measures of Center
Overview
A measure of central tendency is a single value that describes a set of data
by identifying the center, or middle, of the data. Think of a measure of central
tendency as the "representative" who is elected to represent the data
because it tends to demonstrate the tendency of the set to cluster at that
location. There are three common measures of center -- mean, median, and
mode. Each measure of center has its advantages and disadvantages.
Therefore, it is important to be able to identify the appropriate measure of
center for a given data set.
Upon completion of this unit, you will be able to:
- Distinguish between a population and a sample
- Calculate measures of center (mean, median, mode) using technology
  when provided with a data set
- Identify appropriate statistical measurements to summarize a
  performance indicator from a given data set
- Interpret statistical metrics using the definitions found in the WTCS
  Continuous Improvement Indicator Library with correct units of
  measure for a given performance indicator

Mean
The mean, commonly referred to as the average, is the most well-known
measure of central tendency. The mean is calculated by dividing the sum of
all the values in the data set by the number of values in the data set.
In statistics, there is a difference between a sample and a population. The
difference is very important. If the data set you are given consists of all
possible elements being studied, then you have a population. If the data set
is a subset of the elements being studied, then you have a sample. The
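mean is calculated the same way whether you have a population or a sample. To
acknowledge that you are calculating the population mean, use the Greek
lowercase letter "mu", denoted as $\mu$, where N is the number of values in
the population:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

If you are calculating a sample mean, use "x-bar", denoted as $\bar{x}$, where
n is the number of values in the sample:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
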
Example 1 Calculate the mean for five students who completed a 100-point
exam with the following outcomes:
50, 75, 80, 90, 100.
Solution
Calculate the mean by adding up the five scores and dividing the
sum by five.

$$\frac{50 + 75 + 80 + 90 + 100}{5} = \frac{395}{5} = 79$$

If these are the only five students who will take the exam, this is population
data. You would report that the population mean is 79, $\mu = 79$. If these are
the scores of five of your students selected from a class of twenty, then you
would report that the sample mean is 79, $\bar{x} = 79$.
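
As a quick check of Example 1, the calculation can be sketched in Python. This is only an illustrative sketch: the variable names are assumptions made for this example, and `mean` comes from Python's standard `statistics` module.

```python
from statistics import mean

# Exam scores from Example 1
scores = [50, 75, 80, 90, 100]

# Mean by definition: the sum of the values divided by how many there are
manual_mean = sum(scores) / len(scores)

# The standard library gives the same result
library_mean = mean(scores)

print(manual_mean, library_mean)
```

The arithmetic is identical whether the five scores are treated as a population or as a sample; only the symbol used to report the result changes.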
Example 2 Calculate the mean for these test scores:
0, 100, 100, 100, 100

Solution
Most students mastered the material and earned a perfect score of 100 so it
would make sense to measure the center of this data set as 100. However,
the mean is affected by that one student who didn't take the exam and
received a grade of 0. By definition, the mean includes that score.

$$\frac{0 + 4(100)}{5} = \frac{400}{5} = 80$$

The mean score is 80, which equates to a C. The mean doesn't function well
as a measure of center due to the outlier of 0. In this situation, the median
would be a better measure of central tendency.
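
The effect of the single 0 in Example 2 can be checked with a short Python sketch that computes both the mean and the median of the same scores. The variable names are assumptions for illustration; `mean` and `median` are functions from Python's standard `statistics` module.

```python
from statistics import mean, median

# Test scores from Example 2: one outlier of 0 and four perfect scores
scores = [0, 100, 100, 100, 100]

score_mean = mean(scores)      # pulled down by the outlier
score_median = median(scores)  # middle value, unaffected by the outlier

print(f"mean = {score_mean}, median = {score_median}")
```

The mean is pulled down to 80 while the median stays at 100, which is where most of the scores actually sit.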

Advantages of the Mean


One benefit of the mean is that it includes every value in your data set as part
of the calculation. Therefore, in a well-behaved data set that doesn't have
any unusually small or unusually large values relative to the rest of the data
set, the mean is a good measure of center. It is democratic, allowing each
member of the data set an equal vote.
Disadvantages of the Mean
The mean has one main disadvantage: it is particularly susceptible to the
influence of outliers. Outliers are values that are unusually large or small
compared to the rest of the data set.

Median
The median is the middle score for a set of data. To find the median,
arrange the data in order from smallest to largest. Then, pick the value that
sits in the middle. This is the median.
Example 3 Calculate the median for these five scores:
50, 75, 80, 90, 100
Solution
Notice that the scores are already in ascending order. Pick the middle score.
50, 75, 80, 90, 100
80 is the median. Half of the data is smaller than 80; half of the data is larger
than 80.
Note: the method of picking the middle value works fine when you have
an odd number of scores, but what happens when you have an even
number of scores?
Example 4 Calculate the median for these six scores:
50, 68, 75, 80, 90, 100
Solution
Notice that the scores are already in ascending order; if they weren't already
in order, you would first need to place them in order.

Now there are two middle scores.


50, 68, 75, 80, 90, 100
The median is calculated by adding these two middles together and dividing
by two.

$$\frac{75 + 80}{2} = \frac{155}{2} = 77.5$$

The median is 77.5; half of the values in the data set are less than 77.5 and
half are greater than 77.5.
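
The two cases in Examples 3 and 4 (an odd and an even number of scores) can be sketched as one small Python function. The function name `median_of` is hypothetical and chosen for this illustration; in practice the standard library's `statistics.median` does the same job.

```python
def median_of(values):
    """Return the median: the middle value of the sorted data, or the
    average of the two middle values when the count is even."""
    ordered = sorted(values)          # always sort first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([50, 75, 80, 90, 100]))       # Example 3
print(median_of([100, 50, 80, 68, 90, 75]))   # Example 4, given out of order
```

Sorting inside the function means the caller does not have to remember to put the scores in ascending order first.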
Advantages of the Median
The main advantage of the median is that it is not affected by outliers in the
data set.
Disadvantages of the Median
The median has one main disadvantage: only the one middle value in the
data set has influence on what number is selected. Most of the available
data is ignored by the median. Aside from the middle value (or two middle
values), none of the other values in the data set have a say in what the
median's value should be.

Mode
The mode is the most common or most frequently occurring value in a series
of data. It is used when the data is categorical rather than numerical. For
this reason, the mode is rarely used if the data is quantitative.
Example 5 Find the mode for these six scores:
50, 68, 75, 80, 90, 100
Solution
No value occurs any more than any other value. We would say that there is
no mode.

Example 6 Find the mode for these scores:


50, 50, 68, 75, 80, 80, 100

Solution
There are two values that occur twice in this data set: 50 and 80. Therefore,
there are two modes.
Example 7 Find the mode for these scores:
50, 50, 90, 92, 95, 98, 100
Solution
The mode in this data set is 50. However, notice that 50 doesn't seem like a
good representative of the group. Most of the values in the data set are
hovering in the 90s. A good measure of center should reflect that.
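
Examples 5 through 7 can be reproduced with a short Python sketch. The helper name `modes` is hypothetical. It follows this handout's convention that a data set in which no value repeats has no mode, which is an assumption of the sketch (some tools instead report every value as a mode in that case).

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values.

    Returns an empty list when the data is empty or no value repeats,
    matching the handout's "no mode" convention.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                  # nothing repeats: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([50, 68, 75, 80, 90, 100]))        # Example 5
print(modes([50, 50, 68, 75, 80, 80, 100]))    # Example 6
print(modes([50, 50, 90, 92, 95, 98, 100]))    # Example 7
```

The same function works for categorical data, for example a list of letter grades or campus names.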
Advantages of the Mode
The main advantage of the mode is that it is easy to find: just pick the value
that shows up most often. It is the only measure of center for qualitative or
categorical data.
Disadvantages of the Mode
The mode is only useful for data which is categorical. There can also be
more than one mode if there are multiple values in a data set that repeat the
same number of times. If the mode is used in a quantitative data set, it is
very possible that the mode doesn't exist. If it does exist due to repetition of
one value, it most likely doesn't represent the data set well.

Mean, Median or Mode?


What measure of center best describes the central tendency of a data set?
The answer lies within the kind of data you have.
- If your data set is quantitative data that results from some form of
  measurement and there are no outliers, then the mean is the best
  choice.
- If the data set has outliers, then the median should be used to minimize
  the impact of the outliers on the measure of center.
- The mode should only be used when you have categorical data.

2. Measures of Variation
Overview
A measure of variation, sometimes called a measure of spread or scatter,
is used to describe the variability in a sample or population. It is usually
reported along with a measure of central tendency like the mean or median to
provide an overall description of a set of data. Whereas measures of central
tendency summarize similarities in a data set, measures of variation
summarize differences, or fluctuations, in a data set. There are two
measures of variation that commonly occur in statistics: range and standard
deviation.
Upon completion of this unit, you will be able to:
- Distinguish between a population and a sample
- Calculate measures of variation (range, standard deviation) using
  technology when provided with a data set
- Identify appropriate statistical measurements to summarize a
  performance indicator from a given data set
- Interpret statistical metrics (standard deviation) with correct units of
  measure for a given performance indicator

Why is it important to measure the variation in data?


There is always variation in data, whether the data measures something as
simple as class attendance each day or something as complex as the viability
score for a program at WITC. Variation in data shows up in two forms:
random variation that comes from the sampling techniques used and
variation that is caused by some outside influence. Understanding the nature
of the variation in data is essential in decision-making.
Because variation is normal and constant, data must be plotted over time to
be useful. It is only by plotting data over enough time, both before and
after a planned change is implemented, that you can judge whether the
variation noted in the data is random or forms a pattern that indicates that a
meaningful change has occurred.

Range
The range is the easiest measure of variation to calculate. It is the difference
between the highest and lowest scores in a data set. By formula, range is
defined as:
Range = maximum value - minimum value
Example 1 Find the range of the following data set:
10, 20, 30, 40, 50, 90

Solution
The maximum value is 90 and the minimum value is 10. This results in a
range of 80.
Example 2 Find the range of the following data set:
10, 20, 30, 40, 50, 290

Solution
The maximum value is 290 and the minimum value is 10. This results in a
range of 280. By changing one value in the data set, the range is changed
markedly.
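
The two range examples can be checked with a minimal Python sketch. The function name `value_range` is hypothetical; it simply applies the formula Range = maximum value - minimum value using the built-in `max` and `min`.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


example_1 = [10, 20, 30, 40, 50, 90]
example_2 = [10, 20, 30, 40, 50, 290]   # same data with one value changed

print(value_range(example_1))
print(value_range(example_2))
```

Changing the single largest value from 90 to 290 moves the range from 80 to 280, even though the other five values are untouched.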
Advantage of the Range
The range is easy to calculate for any data set.
Disadvantages of the Range
Because it only uses the largest and smallest value in the data set, the range
is affected by outliers, values which are unusually large or unusually small
relative to the other values in the data set. The measure is not resistant to
changes made to the data set.

Sample or Population Variation?


The standard deviation is another measure of the variation within a set of
data. Before discussing the details of standard deviation it is important to
distinguish between population and sample data. A population is the
collection of all things being studied; for example, all students who graduate
from WITC. However, studying this large, overarching group is difficult. An
attempt was made during the 2014-2015 Community College Survey of
Student Engagement (CCSSE) survey; the survey was completed by 550
respondents, far fewer than the number of students enrolled at WITC. The
current enrollment at WITC would be the population of study; the 550
respondents to the CCSSE survey would be a sample.
When we calculate standard deviation, our goal is to find the variation in the
population. However, as we are often presented with data from a sample
only, we are forced to estimate the population standard deviation from a
sample standard deviation. These two standard deviations - sample and
population standard deviations - are calculated differently. For this workshop
we will concentrate on finding sample standard deviation as an estimate of
population standard deviation.

Standard Deviation
Standard deviation measures variation by measuring the average distance
that each data point is from the mean of the data set. The key to
understanding standard deviation is first finding a way to measure each data
point's distance from the mean. Then, average those distances. There are
six separate calculations implied in the definition:
Step 1: Calculate the average.
Step 2: Find the deviation for each data point from the mean.
Step 3: Square the deviations.
Step 4: Find the sum of the squared deviations.
Step 5: Divide by n - 1 (one less than the sample size).
Step 6: Take the square root of the result from Step 5.
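
Taken together, the six steps amount to the usual formula for the sample standard deviation. It is written out here as a summary; the symbols $s$ (sample standard deviation), $x_i$ (the individual data values), $\bar{x}$ (the sample mean), and $n$ (the sample size) are notation added for this note.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```

Reading from the inside out: $x_i - \bar{x}$ is the deviation (Step 2), the exponent squares it (Step 3), the summation adds the squares (Step 4), the division by $n - 1$ is Step 5, and the square root is Step 6.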

Example 3 Calculate the standard deviation for 5, 8, 8, 9, 10.


Solution Adding up all five values and dividing by five,

$$\frac{5 + 8 + 8 + 9 + 10}{5} = \frac{40}{5} = 8.$$


It's easiest to complete the steps within a table when calculating manually. In
most situations, use technology like Microsoft Excel to calculate the standard
deviation easily.
Data    Step 2: Data - Average    Step 3: (Data - Average)^2
5       5 - 8 = -3                (-3)^2 = 9
8       8 - 8 = 0                 (0)^2 = 0
8       8 - 8 = 0                 (0)^2 = 0
9       9 - 8 = 1                 (1)^2 = 1
10      10 - 8 = 2                (2)^2 = 4


In Step 4, the sum of the squared deviations is 9 + 0 + 0 + 1 + 4 = 14.

Dividing by 5 - 1 = 4 in Step 5 gives 14/4 = 3.5. This is called the sample
variance. Finally, in Step 6, take the square root to arrive at the standard
deviation, $\sqrt{3.5} \approx 1.87$.
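
The six steps can also be traced in Python for the data 5, 8, 8, 9, 10. This is an illustrative sketch with assumed variable names; the last line compares the hand-built result with `stdev` from Python's standard `statistics` module, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

data = [5, 8, 8, 9, 10]

average = sum(data) / len(data)                      # Step 1: the average
deviations = [x - average for x in data]             # Step 2: data - average
squared = [d ** 2 for d in deviations]               # Step 3: square each deviation
sum_of_squares = sum(squared)                        # Step 4: add the squares
sample_variance = sum_of_squares / (len(data) - 1)   # Step 5: divide by n - 1
sample_sd = sqrt(sample_variance)                    # Step 6: square root

print(sum_of_squares, sample_variance, round(sample_sd, 2))
print(round(stdev(data), 2))                         # library check
```

With a sum of squared deviations of 14 and n - 1 = 4, the sample variance is 3.5 and the sample standard deviation rounds to 1.87.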

When will I use standard deviation?


The standard deviation is an important statistic, but it doesn't get the attention
it deserves. It is common to hear about the average of a data set, that is, a
measure of the tendency of a data set to cluster about a particular value. But
how often do you hear about standard deviation? Standard deviation is a
measure of diversity; so, when you only hear about the average you are
getting only part of the story. In fact, you could be missing the most
interesting part of the story.
Standard deviation gives an idea of whether the data are close to the
average or whether the data are spread out over a wide range. A low
standard deviation means that most of the numbers are very close to the
average and, so, predictable. A high standard deviation means that the
numbers are spread out and less predictable.
When will you use standard deviation as a faculty member at WITC? A common
place that you will see standard deviation is in the column statistics in
Blackboard's grade book. An exam with an average of 88.27 and a standard
deviation of 24.24 is very different from an exam with an average of 88.27 and
a standard deviation of 2.424. In the first case,
you would expect to see wide variation in scores; the average is not a reliable
performance indicator. In the second case, you would expect to see most
exam scores huddled around 88.27; the average is a reliable indicator of
performance.
In fact, standard deviation can be used as a measure of confidence in the
data set. There is a rule of thumb that states that, for roughly bell-shaped
data, about 95% of all values in a data set should be within two standard
deviations of the average.
Example 4 Exam results show an average score of 80 and a standard
deviation of 5.0. What range of scores would you expect?
Solution About 95% of the exam scores would be between
80 - 2(5.0) = 70 and 80 + 2(5.0) = 90.
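
The two-standard-deviation rule of thumb can be sketched as a small Python helper. The function name `two_sd_interval` is hypothetical, and the sketch only does the arithmetic; whether about 95% of values really fall in the interval depends on how the data is distributed.

```python
def two_sd_interval(average, sd):
    """Return (low, high): the average minus and plus two standard deviations."""
    return (average - 2 * sd, average + 2 * sd)


# Example 4: average score of 80 with a standard deviation of 5.0
low, high = two_sd_interval(80, 5.0)
print(low, high)
```

For an average of 80 and a standard deviation of 5.0, the interval runs from 70 to 90.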

3. Measures of Position: Percentiles


Overview
Percentiles are a way to measure the position of a particular value within a
data set. If you're most interested in where a particular value is with respect
to the other values in the set, then that's what a percentile gives you.
Upon completion of this unit, you will be able to:
- Calculate measures of relative position (percentiles) using technology
  when provided with a data set
- Interpret statistical metrics (percentiles) using the definitions found in
  the WTCS Continuous Improvement Indicator Library with correct units
  of measure for a given performance indicator
Suppose you took an exam and wanted to know how you did relative to
everyone else who took the exam. You may ask about the exam's average.
However, knowing an exam's average only gives you an idea of which half of
the class you are in: the lower half or the upper half. To pinpoint your
location within the class, you would want to know your percentile.
You discover that your score on a recent assessment places you at the 75th
percentile. That's good news! You did better on the assessment than 75%
of the others that took the exam. For example, in a class of 24 other
students, you scored better than 24 x 75% = 18 students but not as well as
the other 6 students. The percentile gives you an idea of your standing in the
group.
Caution: A percentile is not the percent correct; a percentile is a value that
marks where a given number falls within a data set.
Example 1 Calculate the percentile of the student who scored 30 points out
of 40 possible using the class scores below.
10, 15, 20, 28, 30, 32, 35, 36, 40, 40


Solution First, make sure that all of the scores are in ascending order. Then,
locate the score you are interested in. In this case, that is 30.
Count the scores lower than 30.
10, 15, 20, 28, 30, 32, 35, 36, 40, 40


There are four values lower than 30 out of the ten scores provided. That is
4/10 = 40%. A score of 30 is at the 40th percentile.

Example 2 Using the data from Example 1, calculate the percentile for a
score of 36.
Solution Make sure the data is in ascending order.
10, 15, 20, 28, 30, 32, 35, 36, 40, 40


There are seven scores lower than 36. That is higher than 7/10 = 70% of the
other scores. A score of 36 is at the 70th percentile.
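
The counting method used in Examples 1 and 2 can be sketched in Python. The function name `percentile_rank` is hypothetical, and the definition used here (percent of scores strictly lower than the given score) is the one from this handout; statistical software may use slightly different percentile definitions.

```python
def percentile_rank(scores, score):
    """Percent of the scores that are strictly lower than the given score."""
    below = sum(1 for s in scores if s < score)   # count the lower scores
    return 100 * below / len(scores)


class_scores = [10, 15, 20, 28, 30, 32, 35, 36, 40, 40]

print(percentile_rank(class_scores, 30))   # Example 1
print(percentile_rank(class_scores, 36))   # Example 2
```

Because the function counts rather than indexes, the scores do not need to be sorted first; sorting is only a convenience when counting by hand.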

Example 3 In the most current Noel Levitz SSI, WITC scored a 5.1 for
student satisfaction and engagement under the category "College
experience met expectations." According to the National Community
College Benchmark Project (NCCBP), WITC was at the 90th percentile.
Interpret the percentile.
Solution
In 2015, relative to student satisfaction with their college experience meeting
their expectations, WITC scored higher than 90% of the other colleges
participating in the NCCBP.

4. Relative Frequencies
Overview
A frequency table is a way to summarize data. Typically, the data is
organized by classes or categories and the number of occurrences for each
class are counted and recorded. When you browse WITC's survey results on
the Connection, you will see the results of surveys expressed as frequency
tables. An example from the 2014-2015 Community College Survey of
Student Engagement (CCSSE) survey shows 550 respondents broken out by
campus.

Campus          Number of Respondents
Ashland         32
New Richmond    235
Rice Lake       194
Superior        89
Total           550

In this table, the number of respondents could also be called the frequency.
In a frequency table, the frequency may be expressed as a percentage of the
total. When the number of outcomes is expressed as a percentage it is
called a relative frequency.
Upon completion of this unit, you will be able to:
Calculate relative frequencies using technology when provided with a
summarized data set
Interpret statistical metrics (relative frequencies) with correct units of
measure for a given performance indicator
A frequency count is the number of values from a data set that fall within a
particular classification. A relative frequency is calculated by dividing a
frequency count by the total sample and converting to a percent. For
example, in the CCSSE data above, there were 235 students from the New
Richmond campus who participated in the survey. To compute the relative
frequency for New Richmond, divide 235 by the total sample size of 550.

$$\frac{235}{550} \times 100\% \approx 42.7\%$$

The relative frequency of responses for the New Richmond campus is 42.7%.
Example 1 Calculate the relative frequency for the CCSSE student
responses for the remaining campuses.
Solution
Campus          Number of Respondents    Relative Frequency
Ashland         32                       32/550 x 100% = 5.8%
New Richmond    235                      235/550 x 100% = 42.7%
Rice Lake       194                      194/550 x 100% = 35.3%
Superior        89                       89/550 x 100% = 16.2%
Total           550                      100%
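
The relative frequency calculation for the CCSSE campus counts can be sketched in Python. The function name `relative_frequencies` and the dictionary layout are assumptions made for this illustration; the counts are the ones given in the frequency table.

```python
def relative_frequencies(counts, digits=1):
    """Convert frequency counts to percentages of the total, rounded."""
    total = sum(counts.values())
    return {label: round(100 * count / total, digits)
            for label, count in counts.items()}


# CCSSE respondents by campus (550 in total)
respondents = {
    "Ashland": 32,
    "New Richmond": 235,
    "Rice Lake": 194,
    "Superior": 89,
}

campus_percentages = relative_frequencies(respondents)
for campus, percent in campus_percentages.items():
    print(f"{campus}: {percent}%")
```

The same function can be applied to any frequency table, such as a grade distribution keyed by letter grade.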

Example 2 Find the relative frequency (to the nearest tenth percent) for each
letter grade represented in the grade distribution below.
Letter Grade    Number of Students
A               1
A-              2
B+              3
B               4
B-              1
C+              3
C               0
C-              4
D               5
F               2
Solution
Letter Grade    Number of Students    Relative Frequency
A               1                     4.0%
A-              2                     8.0%
B+              3                     12.0%
B               4                     16.0%
B-              1                     4.0%
C+              3                     12.0%
C               0                     0.0%
C-              4                     16.0%
D               5                     20.0%
F               2                     8.0%
