
Statistics for Engineers: Chapter 2

Chapter 2

Instructor: Robel Metiku


Frequency Distribution

A frequency distribution is the organization of raw data (data in original form) in table form,
using classes and frequencies.
For example, suppose a researcher wishes to study the number of kilometres that the
employees of a factory travel to work each day. The researcher would first have to collect the
data by asking each employee the approximate distance from his/her home to the factory. When
data are collected in original form, they are called raw data. In this case, the raw data are:

The researcher organizes the data by constructing a frequency distribution; the frequency is
the number of values in a specific class of the distribution. For this set of data, a frequency
distribution is shown as:

Notes
1. This frequency distribution has 6 classes.
2. For the first class, the class lower limit is 1 and the class upper limit is 3.
3. The class width is 3 (the class width for a class in a frequency distribution is found by
subtracting the lower (or upper) class limit of one class from the lower (or upper) class
limit of the next class).
Class boundaries: the class boundaries are used to separate the classes so that there are no
gaps in the frequency distribution. The gaps are due to the limits; for example, there is a gap
between 3 and 4.
The basic rule of the class boundaries is that the class limits should have the same decimal
place value as the original data, but the class boundaries should have one additional place
value and end in a 5. For example, if the values in the data set are whole numbers, such as 34,
32, 36, the limits of the class might be 31–37, and the boundaries are 30.5–37.5.
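The boundary rule can be sketched in Python. This is only an illustration, not part of the original notes: the function name `class_boundaries` and the `unit` parameter (the place value of the recorded data, 1 for whole numbers) are assumptions made for the example.

```python
from fractions import Fraction

def class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary) for one class.

    `unit` is the place value of the data (1 for whole numbers,
    0.1 for one decimal place). Half a unit is subtracted from the
    lower limit and added to the upper limit.
    """
    half = Fraction(str(unit)) / 2
    lo = Fraction(str(lower_limit)) - half
    hi = Fraction(str(upper_limit)) + half
    return float(lo), float(hi)

print(class_boundaries(31, 37))   # (30.5, 37.5)
print(class_boundaries(1, 3))     # (0.5, 3.5)
```

Exact fractions are used internally so that boundaries such as 3.05 are not disturbed by floating-point rounding.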

ATTC, Manufacturing Technology Dept. Page 1


For the above example:

To construct a frequency distribution, follow these rules:


1. There should be between 5 and 20 classes.
2. The class width should be an odd number. This ensures that the class midpoint of each
class has the same place value as the data, where class midpoint = (lower limit + upper limit)/2.
This rule is only a suggestion, and is not rigorously followed.
3. The classes must be mutually exclusive. Mutually exclusive classes have nonoverlapping class limits so that data cannot be placed into two classes.
4. The classes must be continuous. Even if there are no values in a class, the class must
be included in the frequency distribution.
5. There should be enough classes to accommodate all the data.
6. The classes must be equal in width.
Example 1
The data represent the record high temperatures for each of the 50 states. Construct a grouped
frequency distribution for the data using 7 classes.

Step 1: Determine the classes:


The highest value is H = 134, the lowest value is L = 100
The range R = 134 − 100 = 34
Class width = R / number of classes = 34/7 = 4.9
Round up to the nearest whole number, so the class width = 5
So the first class will be 100–104, the second class will be 105–109, and so on
The first class boundaries will be 99.5–104.5, and so on.
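Step 1 can be reproduced with a short Python sketch. The function name `class_limits` is an assumption for illustration, and the sketch only handles whole-number data as in this example.

```python
import math

def class_limits(lowest, highest, n_classes):
    """Whole-number class limits following Step 1 of Example 1."""
    data_range = highest - lowest
    width = math.ceil(data_range / n_classes)   # round up to a whole number
    limits = []
    lower = lowest
    for _ in range(n_classes):
        limits.append((lower, lower + width - 1))
        lower += width
    return width, limits

width, limits = class_limits(100, 134, 7)
print(width)        # 5
print(limits[0])    # (100, 104)
print(limits[-1])   # (130, 134)
```

When the range divides evenly by the number of classes, the last class produced this way can stop short of the highest value, so the coverage should be checked before tallying.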
Step 2: Tally the data.


Step 3: Find the numerical frequencies from the tallies.


Step 4: Find the cumulative frequencies:
A cumulative frequency column can be added to the distribution by adding the frequency
in each class to the total of the frequencies of the classes preceding that class. The
completed frequency distribution is:

Example 2
The average quantitative entrance examination scores for the top 30 graduate schools of
engineering are listed below. Construct a frequency distribution with six classes.

Solution
We follow the same steps as in Example 1. The lowest value is 746 and the highest value is 780, so
the range is R = 780 − 746 = 34, and hence the class width = 34/6 = 5.67, rounded up to
6. The frequency distribution is thus given by:

The histogram is a graph that displays the data using vertical bars of various heights to
represent the frequencies of the classes.
Consider Example 1, which has the following frequency distribution for the record high
temperatures of the 50 states.



Class Limits    Class Boundaries    Frequency    Class midpoint
100–104         99.5–104.5          2            102
105–109         104.5–109.5         8            107
110–114         109.5–114.5         18           112
115–119         114.5–119.5         13           117
120–124         119.5–124.5         7            122
125–129         124.5–129.5         1            127
130–134         129.5–134.5         1            132
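The midpoint column, and the cumulative frequencies described in Step 4 of Example 1, can be checked with a few lines of Python. This is a sketch using the limits and frequencies from the table above; the variable names are chosen for the example.

```python
from itertools import accumulate

limits = [(100, 104), (105, 109), (110, 114), (115, 119),
          (120, 124), (125, 129), (130, 134)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Class midpoint = (lower limit + upper limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in limits]

# Each cumulative frequency adds the class frequency to the running total
cumulative = list(accumulate(frequencies))

for (lo, hi), f, m, c in zip(limits, frequencies, midpoints, cumulative):
    print(f"{lo}-{hi}  f={f:2d}  midpoint={m:.0f}  cumulative={c}")

# The last cumulative frequency must equal the number of observations
assert cumulative[-1] == sum(frequencies) == 50
```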

To construct a histogram, follow these steps:


Step 1: Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is
always the vertical axis.
Step 2: Represent the frequency on the y axis and the class boundaries (or class midpoints) on the x axis.
Step 3: Using the frequencies as heights, draw vertical bars for each class.
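As a rough stand-in for the drawing, the steps can be imitated with a text histogram in Python. This is a sketch only: the function name `text_histogram` is an assumption, and the bars run horizontally here rather than vertically, with the bar length equal to the class frequency.

```python
boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    lines = []
    for (lo, hi), f in zip(boundaries, frequencies):
        lines.append(f"{lo:6.1f}-{hi:6.1f} | {symbol * f} {f}")
    return lines

print("\n".join(text_histogram(boundaries, frequencies)))
```

A plotting library would normally be used for a real figure; the text version is kept here so that the example has no dependencies.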
[Figure: frequency histogram for Example 1]

Example 3
Consider the data below which specifies the "life" of 40 similar car batteries recorded to the
nearest tenth of a year. The batteries are guaranteed to last 3 years.
Car battery life

Summarize the data through the use of frequency distribution where the data are grouped into
different classes or intervals. Dividing each class frequency by the total number of observations,
we obtain the proportion of the set of observations in each of the classes. A table listing relative
frequencies is called a relative frequency distribution. The relative frequency distribution for the
data of the above table, showing the midpoints of each class interval, is given in the table below.

Relative frequency distribution of battery life


The information provided by a relative frequency distribution in tabular form is easier to grasp if
presented graphically. Using the midpoints of each interval and the corresponding relative
frequencies, we construct a relative frequency histogram as shown in figure 1 below.

Fig. 1 Relative frequency histogram


Many continuous frequency distributions can be represented graphically by the characteristic
bell-shaped curve of fig. 2. Graphical tools such as what we see in fig. 1 and fig. 2 aid in the
characterization of the nature of the population.

Fig. 2 Estimating frequency distribution


Fig. 3 Skewness of data


A distribution is said to be symmetric if it can be folded along a vertical axis so that the two
sides coincide. A distribution that lacks symmetry with respect to a vertical axis is said to be
skewed. The distribution illustrated in Figure 3(a) is said to be skewed to the right since it has a
long right tail and a much shorter left tail. In Figure 3(b) we see that the distribution is
symmetric, while in Figure 3(c) it is skewed to the left.
If our primary purpose in looking at the data is to determine the general shape or form of the
distribution, it will seldom be necessary to construct a relative frequency histogram. There are
several other types of graphical tools and plots that are used. These are discussed in Chapter 3.
Thus, we have shown how one can gain information from raw data by organizing them into a
frequency distribution and then presenting the data by using graphs. In chapter 4, we are going
to study the statistical methods that can be used to summarize data. The most familiar of these
methods is the finding of averages.
Discrete and Continuous Data
Frequency Graphs of Discrete Data
Consider the number of defective items in successive samples of six items each. The data are
summarized in the table below.
Number of defectives, xi    Frequency, fi
0                           48
1                           10
2                           2
>2                          0

These data can be shown graphically in a very simple form because they involve discrete data,
as opposed to continuous data, and only a few different values exist. The variable is discrete in
the sense that only certain values are possible: in this case the number of defective items in a
group of six must be an integer rather than a fraction. The number of defective items in each
group of this example is only 0, 1, or 2. The frequencies of these numbers are shown above.
The corresponding frequency graph is shown in Fig. 4 below. The isolated spikes correspond to
the discrete character of the variate.
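For discrete data like these, the frequencies and relative frequencies can be tabulated directly, one spike per value. The following Python sketch uses the frequencies from the table above (which sum to 60); the simple row of `|` characters is an assumed stand-in for the spikes of the frequency graph.

```python
defectives = [0, 1, 2]
frequencies = [48, 10, 2]

total = sum(frequencies)                     # total number of groups of six
relative = [f / total for f in frequencies]  # proportion of groups per value

for x, f, r in zip(defectives, frequencies, relative):
    # One isolated "spike" per possible value of the discrete variate
    print(f"x={x}  f={f:2d}  relative={r:.3f}  {'|' * f}")
```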


Fig. 4 Distribution of Numbers of Defectives in Groups of Six Items


If the number of different values is very large, it may be desirable to use the grouped frequency
approach, as discussed below for continuous data.
Continuous Data: Grouped Frequency
If the variate is continuous, any value at all in an appropriate range is possible. Between any
two possible values, there are an infinite number of other possible values, although measuring
devices are not able to distinguish some of them from one another. Measurements will be
recorded to only a certain number of significant figures. Even to this number of figures, there will
usually be a large number of possible values. If the number of possible values of the variate is
large, too many occur on a table or graph for easy comprehension. We can make the data
easier to comprehend by dividing the variate into intervals or classes and counting the
frequency of occurrence for each class. This is called the grouped frequency approach.
Thus, frequency grouping is used to make the distribution more easily understood. The width of
each class (the difference between its lower boundary and its upper boundary) should be
constant from one class to another. The number of classes should be from five to twenty,
depending chiefly on the size of the population or sample being represented. If the number of
classes is too large, the result is too detailed and it is hard to see an underlying pattern. If the
number of classes is too small, there is appreciable loss of information, and the pattern may be
obscured.


An empirical relation which gives an approximate value of the appropriate number of classes is
Sturges' Rule:
number of class intervals = 1 + 3.3 log₁₀ N        (2.1)
where N is the total number of observations in the sample or population.
The procedure is to start with the range, the difference between the largest and the smallest
items in the set of observations. Then the constant class width is given approximately by
dividing the range by the approximate number of class intervals from equation 2.1. Round off
the class width to a convenient number.
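Equation 2.1 and the class-width estimate that follows from it can be sketched in Python. The function names are assumptions for illustration; the sample inputs (121 observations with a range of 0.36) are taken from Example 4 later in this chapter.

```python
import math

def sturges_classes(n):
    """Approximate number of class intervals from equation 2.1."""
    return 1 + 3.3 * math.log10(n)

def approximate_class_width(data_range, n):
    """Range divided by the Sturges estimate, before rounding to a convenient value."""
    return data_range / sturges_classes(n)

print(round(sturges_classes(121), 2))                 # 7.87
print(round(approximate_class_width(0.36, 121), 4))   # 0.0457
```

The result is only a starting point: the width is then rounded to a convenient number, such as 0.05 in that example.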
The class boundaries must be clear with no gaps and no overlaps. For problems in this course
choose the class boundaries halfway between possible magnitudes. This gives a definite and
fair boundary. For example, if the observations are recorded to one decimal place, the
boundaries should end in five in the second decimal place. If 2.4 and 2.5 are possible
observations, a class boundary might be chosen as 2.45. The smallest class boundary should
be chosen at a convenient value a little smaller than the smallest item in the set of observations.
Each class midpoint is halfway between the corresponding class boundaries.
Then the number of items in each class should be tallied and shown as class frequency in a
table called a grouped frequency table. The relative frequency is the class frequency divided by
the total of all the class frequencies, which should agree with the total number of items in the set
of observations. The cumulative frequency at a class boundary is the total of the frequencies of
all classes lying below that boundary. The class boundary rather than class midpoint must be used for finding
cumulative frequency because we can see from the table how many items are smaller than a
class boundary, but we cannot know how many items are smaller than a class midpoint unless
we go back to the original data. The relative cumulative frequency is the fraction (or percentage)
of the total number of items smaller than the corresponding upper class boundary.
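The whole procedure (tally, class frequency, relative frequency, cumulative frequency) can be sketched as one Python function. This is an illustrative sketch, not the method of any particular library: the function name and the small data set are made up for the example, and the boundaries are assumed to lie halfway between possible observations as recommended above.

```python
from decimal import Decimal

def grouped_frequency_table(data, first_boundary, width):
    """Tally `data` into equal-width classes.

    Returns rows of (lower boundary, upper boundary, midpoint,
    frequency, relative frequency, cumulative frequency).
    Decimal arithmetic keeps boundaries such as 2.45 exact.
    """
    values = [Decimal(str(x)) for x in data]
    lower = Decimal(str(first_boundary))
    width = Decimal(str(width))
    if min(values) <= lower:
        raise ValueError("first boundary must lie below the smallest observation")
    if any((v - lower) % width == 0 for v in values):
        raise ValueError("an observation falls exactly on a class boundary")
    total = len(values)
    rows, cumulative = [], 0
    while cumulative < total:
        upper = lower + width
        freq = sum(1 for v in values if lower < v < upper)
        cumulative += freq
        rows.append((float(lower), float(upper), float((lower + upper) / 2),
                     freq, freq / total, cumulative))
        lower = upper
    return rows

# Hypothetical observations recorded to one decimal place
sample = [2.4, 2.5, 2.5, 2.7, 2.8, 2.8, 2.8, 3.1]
for row in grouped_frequency_table(sample, 2.35, 0.2):
    print(row)
```

The two checks at the top enforce the "no gaps and no overlaps" requirement: every observation must fall strictly inside exactly one class.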
Example 4
The thickness of a particular metal part of an optical instrument was measured on 121
successive items as they came off a production line under what was believed to be normal
conditions. The results are shown in the table below.
Thickness is a continuous variable, since any number at all in the appropriate range is a
possible value. The data in the above table are given to two decimal places, but it would be
possible to measure to greater or lesser precision. The number of possible results is infinite. The
mass of numbers is very difficult to comprehend.


Thickness of metal parts, mm

Now let us apply the grouped frequency approach to the numbers. The largest item in the table
is 3.57, and the smallest is 3.21, so the range is 0.36.
The number of class intervals according to Sturges' Rule should be approximately
1 + 3.3 log₁₀ 121 = 7.87. Then the class width should be approximately 0.36 / 7.87 = 0.0457. Let us
choose a convenient class width of 0.05. The thicknesses are stated to two decimal places, so
the class boundaries should end in five in the third decimal place.
Let us choose the smallest class boundary, then, as 3.195. The resulting grouped frequency
table is shown below.
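The class boundaries implied by these choices can be listed with a short Python sketch. The function name `equal_width_boundaries` is an assumption; `Decimal` is used so that values such as 3.195 are represented exactly. The original table is not reproduced here, so the output is what these choices imply rather than a copy of it.

```python
from decimal import Decimal

def equal_width_boundaries(first_boundary, width, largest):
    """List class boundaries from `first_boundary` until `largest` is covered."""
    boundary = Decimal(first_boundary)
    step = Decimal(width)
    boundaries = [boundary]
    while boundary < Decimal(largest):
        boundary += step
        boundaries.append(boundary)
    return boundaries

# Smallest boundary 3.195, class width 0.05, largest observation 3.57
bounds = equal_width_boundaries("3.195", "0.05", "3.57")
midpoints = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]

print([str(b) for b in bounds])
print([str(m) for m in midpoints])
```

Under these assumptions the sketch gives eight classes, which is close to the Sturges estimate of 7.87.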
Grouped Frequency Table for Thicknesses

In this table the class frequency is obtained by counting the tally marks for each class. This
becomes easier if we divide the tally marks into groups of five as shown in the table. The
relative frequency is simply the class frequency divided by the total number of items in the table,
i.e. the total frequency, which is 121 in this case. The cumulative frequency is obtained by
adding together all the class frequencies for classes with values smaller than the current upper
class boundary. Thus, in the third line of the table, the cumulative frequency of 40 is the sum of
the class frequencies 2, 14 and 24. The corresponding relative cumulative frequency would be
40/121 = 0.331, or 33.1%. The cumulative frequency in the last line must be equal to the total
frequency. From the table the mode is given by the class midpoint of the class with the largest
class frequency, 3.370 mm. The mean, median and mode, 3.369, 3.37 and 3.370 mm, are in
close agreement. This indicates that the distribution is approximately symmetrical.
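The cumulative figures quoted for the third line of the table can be verified with a small Python sketch. Only the three class frequencies given in the text are used; the variable names are chosen for the example.

```python
from itertools import accumulate

first_three = [2, 14, 24]   # class frequencies quoted in the text
total = 121                 # total frequency

cumulative = list(accumulate(first_three))
relative_cumulative = [c / total for c in cumulative]

print(cumulative[-1], round(relative_cumulative[-1], 3))   # 40 0.331
```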
Graphical representations of grouped frequency distributions are usually more readily
understood than the corresponding tables. Some of the main characteristics of the data can be
seen in histograms and cumulative frequency diagrams. A histogram is a bar graph in which the
class frequency or relative class frequency is plotted against values of the quantity being
studied, so the height of the bar indicates the class frequency or relative class frequency. Class
midpoints are plotted along the horizontal axis.
In principle, a histogram for continuous data should have the bars touching one another.
However, the bars are often shown separated, and some computer software does not allow the
bars to touch one another.
The histogram for the data is shown in Figure 5 for a class width of 0.05 mm as already
calculated. Relative class frequency is shown on the right-hand scale.

Fig. 5 Histogram for Class Width of 0.05 mm


Histograms for class widths of 0.03 mm and 0.10 mm are shown in Figures 6 and 7 for
comparison.

Fig. 6: Histogram for Class Width of 0.03 mm

Fig. 7: Histogram for Class Width of 0.10 mm

Of these three, the class width of 0.05 mm in Figure 5 seems most satisfactory (in agreement
with Sturges' Rule).
Cumulative frequencies are shown in the last column of the table. A cumulative frequency
diagram is a plot of cumulative frequency vs. the upper class boundary, with successive points
joined by straight lines. A cumulative frequency diagram for the thicknesses is shown in figure 8.

Figure 8: Cumulative Frequency Diagram for Thickness


The cumulative frequency diagram of Figure 8 could be changed into a relative cumulative
frequency diagram by a change of scale for the ordinate.
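The points of a cumulative frequency diagram, and the change of scale to relative cumulative frequency, can be sketched in Python. Because the thickness table is not reproduced here, the sketch uses the class boundaries and frequencies of Example 1 instead; the function names are assumptions for illustration.

```python
from itertools import accumulate

def cumulative_points(boundaries, frequencies):
    """Points (class boundary, cumulative frequency) for the diagram.

    `boundaries` has one more entry than `frequencies`; the first
    point sits at the lowest boundary with cumulative frequency 0.
    """
    return list(zip(boundaries, [0, *accumulate(frequencies)]))

def to_relative(points):
    """Rescale the ordinate so the last point has relative frequency 1."""
    total = points[-1][1]
    return [(x, c / total) for x, c in points]

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]
points = cumulative_points(boundaries, frequencies)

print(points[:3])               # [(99.5, 0), (104.5, 2), (109.5, 10)]
print(to_relative(points)[-1])  # (134.5, 1.0)
```

Joining successive points with straight lines gives the diagram; only the vertical scale differs between the two versions.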


Example 5
A sample of 120 electrical components was tested by operating each component continuously
until it failed. The time to the nearest hour at which each component failed was recorded. The
results are shown in the table below.
Times to Failure of Electrical Components, hours

Once again, frequency grouping is needed to make sense of this mass of data. When the data
are sorted in order of increasing magnitude, the largest value is found to be 5312 hours and the
smallest is 3 hours. Then the range is 5312 − 3 = 5309 hours. There are 120 data points. Then
applying Sturges' Rule, equation 2.1 indicates that the number of class intervals should be
approximately 1 + 3.3 log₁₀ 120 = 7.86. Then the class width should be approximately
5309/7.86 = 675 hours. A more convenient class width is 600 hours.
Grouped Frequency Table for Failure Times


Figure 9: Histogram of Times to Failure for Electrical Components


Since times to failure are stated to the nearest hour, each class boundary should be a number
ending in 0.5. The smallest class boundary must be somewhat less than the smallest value, 3.
Then a convenient choice of the smallest class boundary is 0.5 hours. The resulting grouped
frequency table is shown in the table below. The corresponding histogram is Figure 9, and the
cumulative frequency diagram (last column of the table vs. upper class boundary) is Figure 10.
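The arithmetic behind these choices can be followed in a short Python sketch. The values are those stated in the text; the number of classes is derived from them here rather than copied from the original table, so it should be read as what the chosen width and starting boundary imply.

```python
import math

smallest, largest = 3, 5312      # hours, from the text
n = 120                          # number of components
width = 600                      # the convenient width chosen in the text
first_boundary = 0.5             # the smallest class boundary chosen in the text

sturges = 1 + 3.3 * math.log10(n)

# Enough classes of width 600 to reach past the largest value
n_classes = math.ceil((largest - first_boundary) / width)
boundaries = [first_boundary + i * width for i in range(n_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

print(round(sturges, 2))                       # 7.86
print(round((largest - smallest) / sturges))   # 675
print(n_classes, boundaries[-1])               # 9 5400.5
print(midpoints[0])                            # 300.5
```

Choosing 600 hours instead of 675 gives one more class than the Sturges estimate suggests, which is within the spirit of an approximate rule.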

Figure 10: Cumulative Frequency Diagram for Time to Failure


Figures 5 and 9 are both histograms for continuous data, but their shapes are quite different.
Figure 5 is approximately symmetrical, whereas Figure 9 is strongly skewed to the right (i.e., the
tail to the right is very long, whereas no tail to the left is evident in Figure 9). Correspondingly,
the cumulative frequency diagram of Figure 8 is S-shaped, with its slope first increasing and
then decreasing, whereas the cumulative frequency diagram of Figure 10 shows the slope
generally decreasing over its full length.
Now the mean, median and mode for the data (corresponding to Figures 9 and 10) will be
calculated and compared. The mean is 140746/120 = 1173 hours. The median is the average
of the two middle items in order of magnitude, 869 and 877, so it is 873 hours. The mode according
to the table is the midpoint of the class with the largest frequency, 300.5 hours, but of course the
value would vary a little if the class width or starting class boundary were changed. Since Fig. 9
shows that the distribution is very asymmetrical or skewed, it is not surprising that the mean,
median and mode are so widely different.
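The three figures can be reproduced from the numbers quoted in the text. This Python sketch does not use the raw data, which are not reproduced here; it only repeats the arithmetic, and the variable names are chosen for the example.

```python
total_hours = 140746          # sum of the 120 failure times, from the text
n = 120
middle_pair = (869, 877)      # the two middle ordered values, from the text

mean = total_hours / n
median = sum(middle_pair) / 2
# Midpoint of the class 0.5 to 600.5, which has the largest frequency
mode_estimate = (0.5 + 600.5) / 2

print(round(mean), median, mode_estimate)   # 1173 873.0 300.5
```

The ordering mean > median > mode seen here is consistent with the long right tail of Figure 9.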
