Chapter two
Examples
Example one

# patients is shown below. Construct a frequency distribution for the data set, using six classes.

## 82

Time series graph
Examples
US labor force
date     1960  1970  1980  1990  2000  2008
women      11    10     8     9    11    15
men        34    28    18    15    17    22
Chapter 3
Examples
The mean
Mean of grouped data
The median
The mode in a grouped data
Mode ≈ mean - 3(mean - median)
Chapter 4
Examples
Range
The range for a set of data items is the difference between the
largest and smallest
values.
Although the range is the easiest of the numerical measures
of variability to compute, it is not widely used because it is
based on only two of the items in the data set and thus is
influenced too much by extreme data values.
Range = max - min
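As a quick illustration of the formula, here is a minimal Python sketch. The data set is the one used in the exercises later in this chapter, and the function name `value_range` is only an illustrative choice.

```python
# Data taken from the exercises later in this chapter.
data = [5, 7, 11, 12, 13, 18]

def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)

data_range = value_range(data)
print(data_range)  # 18 - 5 = 13
```

Only the two extreme items enter the calculation, which is why a single unusual value can change the range so much.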
Interquartile Range
A form of the range that avoids the dependence on extreme
values in the data set is the interquartile range (IQR), or Q-
This descriptive measure of variability is simply the
difference between the third quartile, or 75th-percentile data item,
and the first quartile, or 25th-percentile data item.
In effect, it shows the range for the middle 50% of the
data and, as such, is not affected by the extreme values in the
data set.
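The sketch below illustrates why the IQR resists extreme values. It assumes the median-of-halves convention for quartiles (other conventions can give slightly different results), and the value 180 is a made-up extreme added only for the comparison.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: median-of-halves convention; the middle item is left
    out of both halves when the number of items is odd.
    """
    if len(values) < 2:
        raise ValueError("need at least two data items")
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])

data = [5, 7, 11, 12, 13, 18]
extreme = [5, 7, 11, 12, 13, 180]   # hypothetical: 18 replaced by 180

q1, q3 = quartiles(data)
iqr = q3 - q1
q1_x, q3_x = quartiles(extreme)
iqr_extreme = q3_x - q1_x

print(max(data) - min(data), iqr)             # range and IQR, original data
print(max(extreme) - min(extreme), iqr_extreme)  # range and IQR, with the extreme value
```

The range jumps from 13 to 175 when the largest item changes, while the IQR stays at 6 in both cases.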

# distances would measure the total spread around the mean

If you were to include more data items, equally spread around
the mean, you would increase the total of the distances even
though the new distribution might be less variable.
Therefore, it is important to divide the total absolute deviation
by the number of data items; this will give an average absolute
deviation from the mean.
Average Absolute Deviation = $\frac{\sum |X - \bar{X}|}{N}$
This average absolute deviation gives the average distance of any
data item from the mean and thus is a good measure of spread.

Given the following data: 5, 7, 11, 12, 13, 18.
• What is the average absolute deviation from the mean?
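One way to work this exercise is sketched below in Python, following the definition of the average absolute deviation (the sum of the distances from the mean, divided by the number of items). The variable names are illustrative.

```python
data = [5, 7, 11, 12, 13, 18]

mean = sum(data) / len(data)               # 66 / 6 = 11.0
abs_devs = [abs(x - mean) for x in data]   # 6, 4, 0, 1, 2, 7
aad = sum(abs_devs) / len(data)            # 20 / 6

print(round(aad, 2))  # 3.33
```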

Standard Deviation
If you were to calculate the average absolute deviation of a distribution
using a value other than the mean, you could possibly get a smaller
average absolute deviation.
This result is one of the reasons that the average absolute deviation is
not the best measure of variability.
Instead, calculate the average of the squared differences from the
mean; this is the variance of a distribution.
If you were to calculate the average of the squared differences of a
distribution by using a value other than the mean, you would always
get a larger value.
The mean is the one number that minimizes the average of the squared
differences in a distribution.
Variance = $\frac{\sum (X - \bar{X})^2}{N}$
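Both claims in this passage can be checked numerically. The sketch below compares deviations measured from the mean with deviations measured from the median, using the data set from the outlier example later in this chapter. The helper names `avg_abs_dev` and `avg_sq_dev` are illustrative, and the median is just one possible "value other than the mean".

```python
from statistics import mean, median

def avg_abs_dev(values, center):
    """Average absolute distance of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

def avg_sq_dev(values, center):
    """Average squared distance of the values from `center`."""
    return sum((x - center) ** 2 for x in values) / len(values)

data = [5, 6, 12, 13, 15, 18, 22, 50]
m, med = mean(data), median(data)   # 17.625 and 14.0

aad_mean = avg_abs_dev(data, m)     # 74.25 / 8
aad_median = avg_abs_dev(data, med) # 69 / 8
sq_mean = avg_sq_dev(data, m)
sq_median = avg_sq_dev(data, med)

print(aad_mean, aad_median)  # the median gives the smaller absolute deviation
print(sq_mean, sq_median)    # the mean gives the smaller squared deviation
```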

Given the following data: 5, 7, 11, 12, 13, 18.
• What is the variance?
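A possible worked solution in Python is sketched below. It assumes the variance is the plain average of the squared differences (dividing by N), as the text describes; a sample variance would divide by N - 1 instead.

```python
data = [5, 7, 11, 12, 13, 18]
n = len(data)

mean = sum(data) / n                        # 11.0
sq_devs = [(x - mean) ** 2 for x in data]   # 36, 16, 0, 1, 4, 49
variance = sum(sq_devs) / n                 # 106 / 6

print(round(variance, 2))  # 17.67
```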

There are still two slight inconveniences in using variance as our
measure of variability.
First, variance does not give an estimate of the distance of a typical
data item from the mean; it is too big.
Second, if the data items have a unit of measurement associated with
them, then the variance would not have the same unit of
measurement; it would have square units.
By taking the square root of variance, we get standard deviation, which
is the measure of variability that we want.
Standard Deviation = $\sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
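To see both inconveniences disappear, the sketch below puts the three measures side by side for the data 5, 7, 11, 12, 13, 18. It assumes the divide-by-N form of the variance used in the text and cross-checks the result against `statistics.pstdev` from the Python standard library.

```python
import math
from statistics import pstdev

data = [5, 7, 11, 12, 13, 18]
n = len(data)
mean = sum(data) / n

aad = sum(abs(x - mean) for x in data) / n        # average absolute deviation
variance = sum((x - mean) ** 2 for x in data) / n # in squared units
std_dev = math.sqrt(variance)                     # back in the original units

print(round(aad, 2), round(variance, 2), round(std_dev, 2))
```

The variance (about 17.67) is far larger than the typical distance from the mean, while the standard deviation (about 4.20) is on the same scale as the average absolute deviation (about 3.33).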
The two commonly used indicators of variability are the
variance and the standard deviation.
• Higher values for both of these indicators indicate a
larger amount of variability than do lower numbers.
• Zero stands for no variability at all (e.g., for the data 3,
3, 3, 3, 3, 3, the variance and standard deviation will
equal zero).
• When you have no variability, the numbers are a
constant (i.e., the same number).
• The variance tells you (exactly) the average squared
deviation from the mean, in "squared units."
• The standard deviation is just the square root of the
variance (i.e., it brings the "squared units" back to
regular units).
• The standard deviation tells you (approximately) how
far the numbers tend to vary from the mean. (If the
standard deviation is 7, then the numbers tend to be
about 7 units from the mean. If the standard deviation
is 1500, then the numbers tend to be about 1500
units from the mean.)

• To compare the variability of these two sets of observations, the coefficient of variation is calculated for each data set.

From these values it can be seen that the variation is greater
for haemoglobin level than for body weight of the group
although the absolute value of standard deviation was higher
for body weight.
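The comparison can be sketched in Python using the usual definition of the coefficient of variation, the standard deviation expressed as a percentage of the mean. The numbers below are hypothetical values invented for illustration; they are not the figures from the original example.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# Hypothetical summary statistics, made up purely for illustration.
weight_mean, weight_sd = 60.0, 5.0   # body weight in kg (assumed)
hb_mean, hb_sd = 12.0, 2.0           # haemoglobin in g/dL (assumed)

cv_weight = coefficient_of_variation(weight_sd, weight_mean)
cv_hb = coefficient_of_variation(hb_sd, hb_mean)

print(round(cv_weight, 1), round(cv_hb, 1))
```

With these assumed values the standard deviation is larger for body weight, yet the coefficient of variation is larger for haemoglobin, which is the same pattern the text describes.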
Identifying outliers

• 5, 6, 12, 13, 15, 18, 22, 50
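One common way to check this data set for outliers is the 1.5 × IQR rule, sketched below in Python. The rule, the median-of-halves quartile convention, and the function name `find_outliers` are assumptions of this sketch.

```python
from statistics import median

def find_outliers(values, k=1.5):
    """Return (outliers, (low_fence, high_fence)) using the k * IQR rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in ordered if x < low or x > high]
    return outliers, (low, high)

data = [5, 6, 12, 13, 15, 18, 22, 50]
outliers, fences = find_outliers(data)

print(fences)    # (-7.5, 36.5)
print(outliers)  # [50]
```

Here Q1 = 9 and Q3 = 20, so the IQR is 11 and any value below -7.5 or above 36.5 is flagged; only 50 falls outside.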

Chapter 7
Examples
Normal distribution

(2) Look up the Z score (1.45) in Column A, finding the proportion (.4265)

(3) Convert the proportion (.4265) to a percentage (42.65%); this is the
percentage of students scoring between the mean and 85 in the course.
Finding the Area Between the Mean and a
Negative Z Score
Using the data presented in Table 10.1, find the percentage of
students scoring between 65 and the mean (70.07)
(1) Convert 65 to a Z score:
Z = (65 - 70.07)/10.27 = -0.49
(2) Since the curve is symmetrical and negative area does
not exist, use 0.49 to find the area in the standard normal
table: the proportion is .1879.
(3) Convert the proportion (.1879) to a percentage (18.79%); this is the
percentage of students scoring between 65 and the mean (70.07).
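The table lookups in both examples can be reproduced without a printed table. The sketch below uses `math.erf` from the Python standard library; the function name `area_mean_to_z` is illustrative, and the Z scores are rounded to two decimals to match the table.

```python
import math

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean and z.

    The curve is symmetrical, so the sign of z is ignored.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z_pos = 1.45                               # score of 85
z_neg = round((65 - 70.07) / 10.27, 2)     # score of 65, gives -0.49

print(round(area_mean_to_z(z_pos), 4))  # 0.4265
print(round(area_mean_to_z(z_neg), 4))  # 0.1879
```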

# units of standard deviation.

The formula for calculating Z scores is:
$z_i = \frac{x_i - \bar{x}}{s}$
Where
$z_i$ = individual Z score
$x_i$ = individual observed value, for example, exam mark
$\bar{x}$ = mean for the set of data
$s$ = standard deviation.
A positive Z score means that the observed data is above the
mean. A negative Z score means that the observed data is
below the mean.
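A minimal Python sketch of the formula follows. It reuses the course mean (70.07) and standard deviation (10.27) from the earlier examples; the function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

course_mean, course_sd = 70.07, 10.27

z_above = z_score(85, course_mean, course_sd)   # mark above the mean
z_below = z_score(65, course_mean, course_sd)   # mark below the mean

print(round(z_above, 2))  # 1.45
print(round(z_below, 2))  # -0.49
```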

# 10.5

Student One’s mathematics exam mark converted into a Z
score:
Student Two’s English exam mark converted into a Z score:
• Both students have a positive Z score, which means that they both did above average in their respective exams. Student One has a higher Z score than Student Two. Although Student Two gained the higher exam mark, Student One actually did better in relation to the other students sitting the exam in mathematics.

Example 2
The average pregnancy lasts 266 days, with a standard
deviation of 16 days
Laura gave birth after 273 days
Let’s convert this to a Z-score:
X = 273
mean = 266
SD = 16
$z_i = \frac{x_i - \bar{x}}{s} = \frac{273 - 266}{16} = \frac{7}{16} = +0.4375$
Laura’s pregnancy was longer than average, which resulted in
a positive Z-score.


• The average time it takes for a certain pain reliever to begin to reduce symptoms is 30 minutes, with a standard deviation of 4 minutes. Assume the variable is normally distributed. If 40 patients are randomly selected, approximately how many will experience reduced pain in less than 25 minutes?

Solutions
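Let’s convert this to a Z-score:
$z_i = \frac{x_i - \bar{x}}{s} = \frac{25 - 30}{4} = -1.25$
Look in the table: the area between Z = 0 and Z = -1.25 is 0.3944.
"Less than 25 minutes" is the area to the left of Z = -1.25, which is
0.5 - 0.3944 = 0.1056.
Therefore the pain reliever reduced the pain in less than 25 minutes for
about 0.1056 × 40 = 4.2, or approximately 4, patients.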
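The calculation can be checked with a short Python sketch. It reads "less than 25 minutes" as the tail area to the left of Z = -1.25, which is 0.5 minus the table area between the mean and Z; the helper `normal_cdf` is an illustrative name built on `math.erf`.

```python
import math

def normal_cdf(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_minutes, sd_minutes, patients = 30, 4, 40

z = (25 - mean_minutes) / sd_minutes   # -1.25
area_mean_to_z = 0.5 - normal_cdf(z)   # table value between Z = 0 and Z = -1.25
area_below = normal_cdf(z)             # proportion relieved in under 25 minutes
expected_patients = patients * area_below

print(z, round(area_mean_to_z, 4), round(area_below, 4))
print(round(expected_patients))  # 4
```

Under this reading, about 10.56% of patients, or roughly 4 of the 40, would be expected to feel relief in less than 25 minutes.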
Determining the normality
A normally shaped or bell-shaped distribution is only one of many
shapes that a distribution can assume; however, it is very important
since many statistical methods (shown in subsequent chapters) require
that the distribution of values be normally or approximately
normally shaped.
There are several ways statisticians check for normality. The easiest
way is to draw a histogram for the data and check its shape.
If the histogram is not approximately bell shaped, then the data are not
normally distributed.
Skewness can be checked by using the Pearson coefficient of skewness
(PC), also called Pearson’s index of skewness. The formula is
PC = $\frac{3(\bar{X} - \text{median})}{s}$

• 5 29 34 44 45 63 68 74 74 81 88 91 97 98 113 118 151 158

Since the histogram is approximately bell-shaped, we can say that the distribution is
approximately normal.
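The histogram check can be complemented by computing Pearson's index of skewness, PC = 3(mean - median)/s, for the same 18 values. The sketch below assumes the sample standard deviation (`statistics.stdev`). As a commonly quoted rule of thumb, stated here as an assumption, values of PC at or beyond +1 or -1 suggest significant skewness.

```python
from statistics import mean, median, stdev

data = [5, 29, 34, 44, 45, 63, 68, 74, 74, 81, 88, 91,
        97, 98, 113, 118, 151, 158]

x_bar = mean(data)    # 1431 / 18 = 79.5
med = median(data)    # (74 + 81) / 2 = 77.5
s = stdev(data)       # sample standard deviation, about 40.5

pc = 3 * (x_bar - med) / s
print(round(pc, 2))  # 0.15
```

A value this close to zero is consistent with the conclusion drawn from the histogram that the distribution is approximately normal.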
Chapter 8
Examples
Example 1
Example 2
Z test
Steps for solving hypothesis testing
using z test
Example one

Example 1
Example 3