Chapter two Examples
Example one

Example 2

  • The Hb levels of 20 pre-school children were tested. Here are their results; construct an ungrouped frequency distribution.

11, 15, 10, 7, 9, 8, 16, 6, 15, 12, 9, 15, 12, 14, 13, 16, 14, 13, 7, 10
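A minimal sketch of the ungrouped frequency distribution for the 20 Hb values in this example, using Python's `collections.Counter`. The variable names are illustrative, and the order in which the values are listed does not affect the result.

```python
from collections import Counter

# Hb levels of the 20 pre-school children, as listed in the example
hb_levels = [11, 15, 10, 7, 9, 8, 16, 6, 15, 12,
             9, 15, 12, 14, 13, 16, 14, 13, 7, 10]

# Count how many times each distinct value occurs
frequency = Counter(hb_levels)

# One row per distinct value, in ascending order
print("Hb level  Frequency")
for value in sorted(frequency):
    print(f"{value:>8}  {frequency[value]:>9}")

# The frequencies must add up to the number of children
total = sum(frequency.values())
print("Total:", total)
```

Checking that the frequencies sum to 20 is a quick way to catch a value that was missed or counted twice.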

Example 3

The blood glucose level, in milligrams per deciliter, for 30 patients is shown below. Construct a frequency distribution for the data set, using six classes.

55 115 111
63 97 90
84 81 82

Time series graph Examples
US labor force
Date    1960  1970  1980  1990  2000  2008
Women     11    10     8     9    11    15
Men       34    28    18    15    17    22
Chapter 3 Examples
The mean
Mean of grouped data
The median
The mode in grouped data
Mode = mean - 3(mean - median)
Chapter 4 Examples
Range
The range for a set of data items is the difference between the largest and smallest values.
Although the range is the easiest of the numerical measures
of variability to compute, it is not widely used because it is
based on only two of the items in the data set and thus is
influenced too much by extreme data values.
Range = max - min
Interquartile Range  A form of the range that avoids the dependence on extreme values in
Interquartile Range
A form of the range that avoids the dependence on extreme
values in the data set is the interquartile range (IQR), or Q-
spread.
This descriptive measure of variability is simply the
difference between the third quartile , or 75%-tile data item,
and the first quartile , or 25%-tile data item.
In effect, it is showing the range for the middle 50% of the
data and, as such, is not affected by the extreme values in the
data set
  • IQR = Q3-Q1

  • Q1= ¼ N

  • Q3= ¾ N

Example one

I. The following are 25 final averages in a math class:

46 72 64 79 89
49 74 66 79 91
53 75 66 80 94
60 76 67 83 95
61 79 71 88 98

What is the range? What is the interquartile range?
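A minimal sketch of both answers for the 25 final averages. The quartile convention is an assumption: `statistics.quantiles` with its default "exclusive" method interpolates at position p(N + 1), so a strict ¼ N and ¾ N position rule may give slightly different quartiles.

```python
import statistics

# The 25 final averages from the example
final_averages = [46, 72, 64, 79, 89,
                  49, 74, 66, 79, 91,
                  53, 75, 66, 80, 94,
                  60, 76, 67, 83, 95,
                  61, 79, 71, 88, 98]

# Range = max - min
data_range = max(final_averages) - min(final_averages)

# n=4 returns the three cut points [Q1, Q2, Q3]; the data are sorted internally
q1, q2, q3 = statistics.quantiles(final_averages, n=4)

# IQR = Q3 - Q1
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3)
print("IQR:", iqr)
```

Under this convention the range is 52 and the interquartile range is 20.5, which shows how much of the total spread comes from the lowest and highest quarters of the class.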

Average Absolute Deviation from the Mean

  • Obviously, there are limitations in using range or interquartile range as measures of variability.

  • It would seem reasonable that any useful measure of variability should measure the spread around the mean, since the mean is the “balance point” of a distribution.

  • If you find the difference between each data item and the mean, you will get negative values for items that are less than the mean and positive values for items greater than the mean.

  • If you then sum up all of these differences, you will get zero; this illustrates a special property of the mean.

  • However, by taking the absolute value of each difference, you will get the distance of each item from the mean, and the sum of these distances would measure the total spread around the mean.

If you were to include more data items, equally spread around the mean, you would increase the total of the distances even though the new distribution might be less variable.
Therefore, it is important to divide the total absolute deviation by the number of data items; this will give an average absolute deviation from the mean.
$$\text{Average Absolute Deviation} = \frac{\sum |X - \bar{X}|}{N}$$
This average absolute deviation gives the average distance of any data item from the mean and thus is a good measure of spread.

Example 2

Given the following data: 5, 7, 11, 12, 13, 18. What is the average absolute deviation from the mean?
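A minimal sketch of the calculation for this data set, following the definition above step by step (the variable names are illustrative):

```python
data = [5, 7, 11, 12, 13, 18]

# Step 1: the mean, 66 / 6 = 11
mean = sum(data) / len(data)

# Step 2: the distance of each item from the mean: 6, 4, 0, 1, 2, 7
abs_deviations = [abs(x - mean) for x in data]

# Step 3: total distance divided by the number of items, 20 / 6
average_absolute_deviation = sum(abs_deviations) / len(data)

print("Mean:", mean)
print("Absolute deviations:", abs_deviations)
print("Average absolute deviation:", round(average_absolute_deviation, 2))
```

The signed differences sum to zero, which is why the absolute values are needed before averaging.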

Standard Deviation  If you were to calculate the average absolute deviation of a distribution using
Standard Deviation
If you were to calculate the average absolute deviation of a distribution
using a value other than the mean, you could possibly get a smaller
average absolute deviation.
This result is one of the reasons that the average absolute deviation is
not the best measure of variability.
Instead, calculate the average of the squared differences from the
mean; this is the variance of a distribution.
If you were to calculate the average of the squared differences of a
distribution by using a value other than the mean, you would always
get a larger value.
The mean is the one number that minimizes the average of the squared
differences in a distribution.
Variance =

Example 3

Given the following data: 5, 7, 11, 12, 13, 18. What is the variance?
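A minimal sketch of the variance as the average of the squared differences from the mean, dividing by N as the text describes. That divisor is the assumption here: a sample variance divides by N - 1 instead and would give a larger value.

```python
import statistics

data = [5, 7, 11, 12, 13, 18]

# Mean of the data, 66 / 6 = 11
mean = sum(data) / len(data)

# Squared differences from the mean: 36, 16, 0, 1, 4, 49
squared_diffs = [(x - mean) ** 2 for x in data]

# Average of the squared differences, 106 / 6
variance = sum(squared_diffs) / len(data)

# statistics.pvariance uses the same divide-by-N definition
assert abs(variance - statistics.pvariance(data)) < 1e-9

print("Variance:", round(variance, 2))
```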

There are still two slight inconveniences in using variance as our measure of variability.
First, variance does not give an estimate of the distance of a typical data item from the mean; it is too big.
Second, if the data items have a unit of measurement associated with them, then the variance would not have the same unit of measurement; it would have square units.
By taking the square root of variance, we get standard deviation, which is the measure of variability that we want.
$$\text{Standard Deviation} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
The two commonly used indicators of variability are the variance and the standard deviation.
• Higher values for both of these indicators indicate a larger amount of variability than do lower numbers.
• Zero stands for no variability at all (e.g., for the data 3, 3, 3, 3, 3, 3, the variance and standard deviation will equal zero).
• When you have no variability, the numbers are a constant (i.e., the same number).
• The variance tells you (exactly) the average squared deviation from the mean, in "squared units."
• The standard deviation is just the square root of the variance (i.e., it brings the "squared units" back to regular units).
• The standard deviation tells you (approximately) how far the numbers tend to vary from the mean. (If the standard deviation is 7, then the numbers tend to be about 7 units from the mean. If the standard deviation is 1500, then the numbers tend to be about 1500 units from the mean.)
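A minimal sketch of these points, reusing the data 5, 7, 11, 12, 13, 18 from the earlier examples and the constant data 3, 3, 3, 3, 3, 3. It assumes the divide-by-N definition, which is what `statistics.pvariance` and `statistics.pstdev` implement.

```python
import math
import statistics

data = [5, 7, 11, 12, 13, 18]
constant = [3, 3, 3, 3, 3, 3]

# Variance in squared units, then its square root in the original units
variance = statistics.pvariance(data)
std_dev = math.sqrt(variance)

# pstdev computes the same square root directly
assert abs(std_dev - statistics.pstdev(data)) < 1e-9

print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))

# No variability at all: both indicators are zero
print("Constant data:", statistics.pvariance(constant), statistics.pstdev(constant))
```

The standard deviation of about 4.2 is close to the typical distance of these items from their mean of 11, while the variance of about 17.7 is not on the same scale as the data.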

If data are normally distributed, then an easy rule to apply to the data is what we call “the 68, 95, 99.7 percent rule.” That is:

Approximately 68% of the cases will fall within one standard deviation of the mean.

Approximately 95% of the cases will fall within two standard deviations of the mean.

Approximately 99.7% of the cases will fall within three standard deviations of the mean.
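The three percentages can be checked numerically. This sketch relies on a standard identity that is not stated in the text: for a normal distribution, the proportion within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} SD: {proportion_within(k) * 100:.1f}%")
```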

Measures of Relative Deviation

  • When the deviation of observations within a series is to be measured, the standard deviation is the best measure.

  • But the size of the standard deviation depends upon the size of the mean as well as the unit of measurement of observation.

  • Hence to compare the variations of two or more variables which are in different units as well as with marked differences in the size of the means, comparison with standard deviation is not suitable.

  • As an example, the variation of the haemoglobin level of a group of students and variation of their body weights will have different means and they are measured in different units.

  • Haemoglobin level is expressed as gm % while body weight is in kilograms. Further, the mean haemoglobin level will be a small number while the mean body weight will be a large one, and the sizes of the standard deviations will also differ.

  • In order to compare the deviations of such variables, the standard deviation is expressed as a percentage of the mean value, and this quantity is known as the Coefficient of Variation.

  • This has no unit but it is expressed as a percentage.

  • Coefficient of Variation = (Standard Deviation / Mean) x 100

Example 5

  • The mean and standard deviation of the haemoglobin level of a group is 12.6 gm % and 1.5 gm% respectively while the mean and standard deviation of the body weight of the same group is 50 kg and 2.2 kg respectively.

  • To compare the deviations of these two sets of observations, the coefficient of variation is calculated for each of the data.

  • Haemoglobin level: Coefficient of Variation = (1.5 / 12.6) x 100 = 11.9%

  • Body weight: Coefficient of Variation = (2.2 / 50) x 100 = 4.4%

From these values it can be seen that the variation is greater for haemoglobin level than for body weight of the group, although the absolute value of standard deviation was higher for body weight.
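A minimal sketch of the comparison in Example 5, applying Coefficient of Variation = (Standard Deviation / Mean) x 100 to both variables. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

# Haemoglobin level: mean 12.6 gm %, SD 1.5 gm %
cv_haemoglobin = coefficient_of_variation(1.5, 12.6)

# Body weight: mean 50 kg, SD 2.2 kg
cv_body_weight = coefficient_of_variation(2.2, 50)

print(f"Haemoglobin CV: {cv_haemoglobin:.1f}%")
print(f"Body weight CV: {cv_body_weight:.1f}%")
```

Because the units cancel in the division, the two percentages can be compared directly even though one variable is in gm % and the other in kilograms.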
Identifying outliers

Example

  • Check the following data set for outliers.

  • 5, 6, 12, 13, 15, 18, 22, 50
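The procedure for this slide is not reproduced in the text, so the sketch below assumes the common 1.5 x IQR rule: a value is flagged as an outlier if it lies below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). Taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data is also an assumption.

```python
import statistics

data = sorted([5, 6, 12, 13, 15, 18, 22, 50])

# Split the ordered data into lower and upper halves
half = len(data) // 2
q1 = statistics.median(data[:half])    # median of 5, 6, 12, 13
q3 = statistics.median(data[-half:])   # median of 15, 18, 22, 50
iqr = q3 - q1

# Fences: anything outside them is flagged as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

Under these assumptions the fences are -7.5 and 36.5, so 50 is the only value flagged.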

Chapter 7 examples
Normal distribution

Example

  • Find the area under the standard normal distribution for each of the following, as a percentage.

  • Between Z = 0 and Z = 1.5

  • Between Z = 0 and Z = -2

  • To the right of Z = 2

  • To the left of Z = 2

  • Between Z = 1.5 and Z = 2.5

  • Between Z = -1.5 and Z = -2.5

  • Between Z = -1.5 and Z = 1.5
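A minimal sketch that computes these areas instead of reading them from a printed table. It covers the first six items and builds the standard normal cumulative distribution from `math.erf`; the helper name `phi` is illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

areas = {
    "Between Z = 0 and Z = 1.5": phi(1.5) - phi(0),
    "Between Z = 0 and Z = -2": phi(0) - phi(-2),
    "To the right of Z = 2": 1 - phi(2),
    "To the left of Z = 2": phi(2),
    "Between Z = 1.5 and Z = 2.5": phi(2.5) - phi(1.5),
    "Between Z = -1.5 and Z = -2.5": phi(-1.5) - phi(-2.5),
}

for label, area in areas.items():
    print(f"{label}: {area * 100:.2f}%")
```

Each area is a difference of two left-tail areas, which is the same subtraction carried out by hand with a table.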

Z score

  • Using the data presented in the table, find the percentage of students whose scores range from the mean (70.07) to 85. The SD is 10.27.

  • (1) Convert 85 to a Z score: Z = (85-70.07)/10.27 = 1.45

  • (2) Look up the Z score (1.45) in Column A, finding the proportion (.4265).

  • (3) Convert the proportion (.4265) to a percentage (42.65%); this is the percentage of students scoring between the mean and 85 in the course.
Finding the Area Between the Mean and a Negative Z Score  Using the data presented
Finding the Area Between the Mean and a
Negative Z Score
Using the data presented in Table 10.1, find the percentage of
students scoring between 65 and the mean (70.07)
(1) Convert 65 to a Z score:
Z = (65-70.07)/10.27=-.49
(2) Since the curve is symmetrical and negative area does
not exist, use .49 to find the area in the standard normal
table:
 3) Convert the proportion (.1879) to a percentage (18.79%); this is the percentage of students
3) Convert the proportion (.1879) to a percentage (18.79%); this is the
percentage of students scoring between 65 and the mean (70.07)
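A minimal sketch of the three steps for both scores (85 and 65), with the table lookup replaced by a computed area. Rounding the Z score to two decimals mimics a printed table and is an assumption about how the lookup was done; the helper names are illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

MEAN, SD = 70.07, 10.27

def area_between_mean_and(score):
    # Step 1: convert the raw score to a Z score, rounded as in a table
    z = round((score - MEAN) / SD, 2)
    # Step 2: area between the mean (Z = 0) and Z; the sign is dropped
    # because the curve is symmetrical
    area = abs(phi(z) - 0.5)
    return z, area

z85, area85 = area_between_mean_and(85)
z65, area65 = area_between_mean_and(65)

# Step 3: express each proportion as a percentage
print(f"Z = {z85}: {area85 * 100:.2f}% score between the mean and 85")
print(f"Z = {z65}: {area65 * 100:.2f}% score between 65 and the mean")
```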

Example 1

  • One student has their exam result in mathematics and a second student has their exam result in English.

  • The second student has a higher mark than the first student; however, given that the exam marks for English and mathematics have different distributions, it is not possible to say that the second student has gained a higher achievement.

  • In order to make a judgment as to whether the second student has done better than the first, we need to judge their mark according to the mean and standard deviation of each set of marks.

  • For each value, in this case a student’s exam mark, a Z score expresses how far each exam mark is from the mean exam mark in units of standard deviation.

The formula for calculating Z scores is:
$$z_i = \frac{x_i - \bar{x}}{s}$$
Where
$z_i$ = individual Z score
$x_i$ = individual observed value, for example, exam mark
$\bar{x}$ = mean for the set of data
$s$ = standard deviation.
A positive Z score means that the observed data is above the mean. A negative Z score means that the observed data is below the mean.
  • Student One:

  • Mathematics exam mark of 60%. Mean 50%. Standard deviation = 5.6

  • Student Two:

  • English exam mark of 70%. Mean 66%. Standard deviation = 10.5

Student One’s mathematics exam mark converted into a Z score:
$$z = \frac{60 - 50}{5.6} \approx 1.79$$
Student Two’s English exam mark converted into a Z score:
$$z = \frac{70 - 66}{10.5} \approx 0.38$$
  • Both students have a positive Z score, which means that they both did above average in their respective exams. Student One has a higher Z score than Student Two. Although Student Two gained the higher exam mark, Student One actually did better in relation to the other students sitting the exam in mathematics.
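A minimal sketch of the comparison, computing each student's Z score as (mark - mean) / standard deviation from the figures given above. The function name is illustrative.

```python
def z_score(value, mean, std_dev):
    """Distance of a value from the mean, in units of standard deviation."""
    return (value - mean) / std_dev

# Student One: mathematics mark 60, mean 50, SD 5.6
z_student_one = z_score(60, 50, 5.6)

# Student Two: English mark 70, mean 66, SD 10.5
z_student_two = z_score(70, 66, 10.5)

print(f"Student One (mathematics): Z = {z_student_one:.2f}")
print(f"Student Two (English):     Z = {z_student_two:.2f}")
print("Student One did better relative to the class:", z_student_one > z_student_two)
```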

Example 2  The average pregnancy lasts 266 days, w/ a standard deviation of 16 days
Example 2
The average pregnancy lasts 266 days, w/ a standard
deviation of 16 days
Laura gave birth after 273 days
Let’s convert this to a Z-score:
z 
i
i
x  x
s
X = 273
mean = 266
SD = 16
273  266
7
= +0.4375
z 
16
16
Laura’s pregnancy was longer than average, which resulted in
a POSTIVE Z-score

=

Example

  • Converting a Z-score to a “raw” score:

  • The length of Ellen’s pregnancy results in a Z-score of -1.25

  • How many days was she pregnant?

  • X = mean + (Z)(SD)

  • Z = -1.25, mean = 266, SD = 16

  • X = 266 + (-1.25)(16)

  • X = 246 days

  • Ellen’s pregnancy was shorter than average. This was expected, as her Z-score was NEGATIVE.
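A minimal sketch of the conversion in both directions, using the pregnancy figures from these examples (mean 266 days, SD 16 days). The function names are illustrative.

```python
MEAN_DAYS, SD_DAYS = 266, 16

def z_score(x, mean, std_dev):
    """Raw score to Z-score."""
    return (x - mean) / std_dev

def raw_score(z, mean, std_dev):
    """Z-score back to a raw score: X = mean + Z * SD."""
    return mean + z * std_dev

# Laura: 273 days gives a positive Z-score
laura_z = z_score(273, MEAN_DAYS, SD_DAYS)

# Ellen: a Z-score of -1.25 gives a pregnancy shorter than average
ellen_days = raw_score(-1.25, MEAN_DAYS, SD_DAYS)

print("Laura's Z-score:", laura_z)
print("Ellen's pregnancy length in days:", ellen_days)
```

The two functions are inverses, so converting Ellen's 246 days back with `z_score` returns -1.25.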

Example 3

  • The average time it takes for a certain pain reliever to begin to reduce symptoms is 30 minutes, with a standard deviation of 4 minutes. Assume the variable is normally distributed. If 40 patients are randomly selected, approximately how many will have their pain reduced in less than 25 minutes?

Solutions  Let’s convert this to a Z-score: x  x z  i i s
Solutions
Let’s convert this to a Z-score:
x
x
z 
i
i
s
= 25 -30
= -1.25
4
Look the table: the value in the table between Z = 0 and Z =
-1.25 is 0.3944
Therefore the apian relief reduced the apian for 0.3944 X 40
= 15.7 approximately 16 patients
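As a check on the arithmetic, here is a minimal sketch that computes the probability directly. "Less than 25 minutes" corresponds to the area to the left of Z = -1.25, that is 0.5 minus the tabled area 0.3944 between Z = 0 and Z = -1.25. The helper name `phi` is illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_minutes, sd_minutes, patients = 30, 4, 40

# Z-score for 25 minutes
z = (25 - mean_minutes) / sd_minutes

# Probability that relief begins in less than 25 minutes (left tail)
p_less_than_25 = phi(z)

# Expected number of the 40 patients
expected_patients = p_less_than_25 * patients

print("Z =", z)
print(f"P(time < 25) = {p_less_than_25:.4f}")
print(f"Expected patients: {expected_patients:.1f}")
```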
Determining the normality  A normally shaped or bell-shaped distribution is only one of many shapes
Determining the normality
A normally shaped or bell-shaped distribution is only one of many
shapes that a distribution can assume; however, it is very important
since many statistical methods require that the distribution of values
(shown in subsequent chapters) be normally or approximately
normally shaped.
There are several ways statisticians check for normality. The easiest
way is to draw a histogram for the data and check its shape.
If the histogram is not approximately bell shaped, then the data are not
normally distributed.
Skewness can be checked by using the Pearson co efficient of skewness
(PC) also called Pearson’s index of skewness. The formula is
  • If the index is greater than or equal to 1 or less than or equal to -1, it can be concluded that the data are significantly skewed.

  • In addition, the data should be checked for outliers by using the formula for detecting the outliers.

  • A survey of 18 high-technology firms showed the number of days’ inventory they had on hand. Determine if the data are approximately normally distributed.

  • 5 29 34 44 45 63 68 74 74 81 88 91 97 98 113 118 151 158

Since the histogram is approximately bell-shaped, we can say that the distribution is approximately normal.
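The numerical checks for this example can be sketched as follows. The sketch assumes the usual form of Pearson's index, PC = 3(mean - median) / s, with s taken as the sample standard deviation (dividing by N - 1), and it assumes the 1.5 x IQR rule for outliers with quartiles taken as the medians of the lower and upper halves.

```python
import statistics

days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
        81, 88, 91, 97, 98, 113, 118, 151, 158]

mean = statistics.mean(days)
median = statistics.median(days)
s = statistics.stdev(days)  # sample standard deviation (assumption)

# Pearson's index of skewness; |PC| >= 1 would indicate significant skew
pc = 3 * (mean - median) / s

# Outlier check with the 1.5 x IQR rule (assumption)
ordered = sorted(days)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1
outliers = [x for x in ordered if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("Mean:", mean, "Median:", median, "s:", round(s, 1))
print("PC:", round(pc, 2))
print("Outliers:", outliers)
```

Under these assumptions the index is well inside the interval from -1 to 1 and no value falls outside the fences, which is consistent with the conclusion drawn from the histogram.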
Chapter 8 Examples
Example 1
Example 2
Z test
Steps for solving hypothesis testing using z test
Example one

t Test for a Mean

  • When the population standard deviation is unknown, the z test is not normally used for testing hypotheses involving means. A different test, called the t test, is used.

  • The distribution of the variable should be approximately normal.

  • The t distribution is similar to the standard normal distribution in the following ways.

  • 1. It is bell-shaped.

  • 2. It is symmetric about the mean.

  • 3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.

  • 4. The curve never touches the x axis.

  • The t distribution differs from the standard normal distribution in the following ways.

  • 1. The variance is greater than 1.

  • 2. The t distribution is a family of curves based on the degrees of freedom, which is a number related to sample size.

  • 3. As the sample size increases, the t distribution approaches the normal distribution.
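The first and third differences can be illustrated numerically. The sketch below uses a standard result that is not stated in the text: for more than 2 degrees of freedom, the variance of the t distribution is df / (df - 2), compared with a variance of 1 for the standard normal distribution.

```python
def t_variance(df):
    """Variance of the t distribution; this closed form requires df > 2."""
    if df <= 2:
        raise ValueError("the formula df / (df - 2) requires df > 2")
    return df / (df - 2)

# The variance is always above 1 and shrinks toward 1 as df grows
for df in (5, 10, 30, 100, 1000):
    print(f"df = {df:>4}: variance = {t_variance(df):.4f}")
```

As the degrees of freedom grow with the sample size, the variance falls toward 1, which is one way to see the t distribution approaching the standard normal distribution.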

Example 1
Example 3