
CHAPTER 1

Introduction to Statistics
and Data Analysis

PROF. GEVELYN B. ITAO


Master of Engineering (Material Science)

Probability and Statistics

Course Requirements
1. First Long Exam 25%
2. Second Long Exam 25%
3. Third Long Exam 25%
4. Case Study (Oral and Written) 15%
5. Quizzes/Seatwork 10%
6. Assignment 5%
Total 100%
Passing 60%

Statistics
A branch of mathematics that deals with the collection, organization and analysis of numerical data and with such problems as experiment design and decision making.
Important features of Statistics:
1. Data gathering
2. Data analysis
3. Decision making

Definition of terms
1. Raw data
Data collected in their original form
2. Variable
A characteristic or attribute that can assume different values
3. Population
All subjects possessing a common characteristic that is being studied

4. Sample
A subgroup or subset of a population
5. Parameter
A characteristic or measure obtained from a population
6. Qualitative variables
Variables which assume non-numerical values

7. Quantitative variables
Variables which assume numerical values
8. Discrete variables
Variables which assume a finite or countable number of possible values, usually obtained by counting
9. Continuous variables
Variables which can assume an infinite number of possible values within an interval, usually obtained by measurement

Everyone involved in the experiment must have a clear idea of what is to be studied, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed.
Guidelines for designing experiments:
1. Statement of the problem / recognition of the problem
Develop all the ideas about the objectives of the experiment

2. Choice of factors and levels
Choose the factors to be varied in
the experiment
Choose the ranges over which
these factors will be varied
Identify the specific levels at which
runs will be made

3. Selection of the response variable
The experimenter should be certain that this variable really provides useful information about the process under study
4. Choice of experimental design
Involves the consideration of sample size (number of replicates/trials), the selection of a suitable run order for the experimental trials, and the determination of whether or not blocking or other randomization restrictions are involved

5. Performing the experiment
Monitor the process carefully to ensure that everything is being done according to plan
6. Data analysis
Analyze the data collected during the experiment by statistical methods
7. Conclusions
Make decisions based on the statistical results
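Step 4 of the guidelines mentions choosing a suitable run order for the trials. As a minimal sketch, assuming a full factorial design (every combination of factor levels is run, each repeated for the given number of replicates) and a completely randomized run order, such a plan could be generated with the standard library. The function name and the factors and levels in the usage lines are hypothetical placeholders, not taken from this chapter.

```python
import itertools
import random


def randomized_run_order(levels, replicates, seed=None):
    """Hypothetical helper: full factorial plan (every combination of factor
    levels, repeated `replicates` times) returned in a randomized run order."""
    names = list(levels)
    runs = [
        dict(zip(names, combo))
        for _ in range(replicates)  # one pass over all combinations per replicate
        for combo in itertools.product(*(levels[name] for name in names))
    ]
    random.Random(seed).shuffle(runs)  # randomize the order in which trials are performed
    return runs


# Hypothetical factors and levels, for illustration only
factors = {"temperature_C": [150, 175], "time_min": [30, 60]}
for number, run in enumerate(randomized_run_order(factors, replicates=2, seed=1), start=1):
    print(number, run)
```

Fixing the seed makes the randomized plan reproducible, so the same run order can be printed for everyone performing the experiment.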

Methods of Sampling
1. Random sampling
Sampling in which the data are collected using chance methods or random numbers
2. Systematic sampling
Sampling in which the data are collected by selecting every kth object
3. Stratified sampling
Sampling in which the population is divided into groups (strata) according to some characteristic. Each stratum is then sampled either …

4. Cluster sampling
Sampling in which the population is divided into groups (usually geographically). Some of these groups are randomly selected, and then all of the elements in those groups are selected.
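The four sampling methods can be sketched with Python's standard `random` module. This is an illustration under stated assumptions: the systematic sample uses a random start within the first k items, the stratified sample draws a simple random sample of equal size from each stratum (one possible choice), and the population and grouping keys in the usage lines are hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Random sampling: every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k items, then every kth item."""
    start = rng.randrange(k)
    return population[start::k]


def _group(population, key):
    """Split the population into groups according to the characteristic `key`."""
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    return groups


def stratified_sample(population, key, n_per_stratum, rng):
    """Stratified sampling: form strata by `key`, then sample each stratum at random (assumed)."""
    return {
        stratum: rng.sample(items, min(n_per_stratum, len(items)))
        for stratum, items in _group(population, key).items()
    }


def cluster_sample(population, key, n_clusters, rng):
    """Cluster sampling: form clusters by `key`, pick whole clusters at random, keep all elements."""
    clusters = _group(population, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for cluster in chosen for item in clusters[cluster]]


rng = random.Random(42)
population = list(range(1, 101))  # hypothetical population of 100 numbered subjects
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(population, lambda x: x % 4, 3, rng))       # hypothetical strata
print(cluster_sample(population, lambda x: (x - 1) // 10, 2, rng))  # hypothetical clusters of 10
```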

Methods of Summarizing/Characterizing Data
1. Tabular Methods
a. Frequency Distribution
b. Cumulative Frequency
c. Stem and Leaf Table
2. Graphical Methods
a. Frequency Histogram
b. Frequency Polygon
c. Ogive
d. Pie chart

3. Numerical Methods
a. Measures of Central Tendencies
Mean/Average, Median, Mode
b. Measures of Dispersion
Range, Variance, Standard Deviation
c. Measures of Shape
Skewness, Kurtosis
d. Measures of Data Location
Percentiles, Deciles, Quartiles

Tabular Methods
1. Frequency Distribution
The organization of raw data in tabular form with classes and frequencies
Steps in Constructing a Frequency Distribution Table:
1. Determine the number of class intervals, k, needed to summarize the data, based on the number of samples, n

2. Find the range of the observations:
$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

3. Determine the width of the class intervals:
$$\text{Class width} = \frac{\text{Range}}{\text{No. of class intervals}}$$
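Steps 1 to 3 can be checked numerically on the nicotine data of the illustration in this chapter. This sketch assumes Sturges' rule, k = 1 + 3.322 log10(n), one common way to choose k: for n = 40 it gives about 6.3, so 6 classes, matching the illustration (the square-root rule k ≈ √n happens to give the same result here). It also assumes the class width is rounded up to the precision of the data (0.01 mg), which turns 1.83 / 6 = 0.305 into the width of 0.31 used in the illustration.

```python
import math

# Nicotine contents (mg) of the 40 cigarettes in the illustration
data = [1.09, 1.92, 2.31, 1.79, 2.28, 1.74, 1.47, 1.97, 0.85, 1.24,
        1.58, 2.03, 1.70, 2.17, 2.55, 2.11, 1.86, 1.90, 1.68, 1.51,
        1.64, 0.72, 1.69, 1.85, 1.82, 1.79, 2.46, 1.88, 2.08, 1.67,
        1.37, 1.93, 1.40, 1.64, 2.09, 1.75, 1.63, 2.37, 1.75, 1.69]

n = len(data)

# Step 1 (assumption: Sturges' rule), rounded to a whole number of classes
k = round(1 + 3.322 * math.log10(n))

# Work in whole units of 0.01 mg so that the rounding below is exact
units = [round(x * 100) for x in data]

# Step 2: range = maximum value - minimum value
range_units = max(units) - min(units)

# Step 3: class width = range / k, rounded up (assumption) to the data's precision
width_units = -(-range_units // k)  # integer ceiling division

print(f"n = {n}, k = {k}, range = {range_units / 100:.2f} mg, "
      f"class width = {width_units / 100:.2f} mg")
```

With the data above this prints `n = 40, k = 6, range = 1.83 mg, class width = 0.31 mg`.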

4. Form the frequency table, with columns for Class Interval, Class Boundaries, Class Mark (xi), Frequency (fi) and Relative Frequency (%).

Class interval
Defines one class in a grouped frequency distribution by its lower and upper class limits. The limits can actually appear in the raw data, and the first interval begins with the lowest value.

Class boundary
Separates one class in a grouped frequency distribution from the next. It has one more decimal place than the raw data, so it never coincides with a data value.

Class mark (midpoint), xi
The number in the middle of the class: the average of its lower and upper class limits

Frequency, fi
The number of times a certain value or class of values occurs

Relative frequency, %
The frequency divided by the total number of data, expressed as a percent: (fi / total number of samples) * 100. This gives the percent of values falling in that class.

Illustration: The nicotine contents, in milligrams, of 40 cigarettes of a certain brand were recorded as follows:
1.09 1.92 2.31 1.79 2.28
1.74 1.47 1.97 0.85 1.24
1.58 2.03 1.70 2.17 2.55
2.11 1.86 1.90 1.68 1.51
1.64 0.72 1.69 1.85 1.82
1.79 2.46 1.88 2.08 1.67
1.37 1.93 1.40 1.64 2.09
1.75 1.63 2.37 1.75 1.69

Frequency distribution table for the nicotine data:
Class Interval   Class Boundaries   Class Mark, xi   Frequency, fi   Relative Frequency, %
0.72-1.02        0.715-1.025        0.87             2               5.00
1.03-1.33        1.025-1.335        1.18             2               5.00
1.34-1.64        1.335-1.645        1.49             8               20.00
1.65-1.95        1.645-1.955        1.80             17              42.50
1.96-2.26        1.955-2.265        2.11             6               15.00
2.27-2.57        2.265-2.575        2.42             5               12.50
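The frequency table for the nicotine data can be reproduced from the raw data with a short script. This is a sketch, assuming the first lower class limit (0.72, the minimum), the class width (0.31) and the number of classes (6) from steps 1 to 3; class boundaries are taken half a data unit (0.005 mg) beyond the class limits, as described for class boundaries.

```python
# Nicotine contents (mg) from the illustration
data = [1.09, 1.92, 2.31, 1.79, 2.28, 1.74, 1.47, 1.97, 0.85, 1.24,
        1.58, 2.03, 1.70, 2.17, 2.55, 2.11, 1.86, 1.90, 1.68, 1.51,
        1.64, 0.72, 1.69, 1.85, 1.82, 1.79, 2.46, 1.88, 2.08, 1.67,
        1.37, 1.93, 1.40, 1.64, 2.09, 1.75, 1.63, 2.37, 1.75, 1.69]

first_limit, width, k = 0.72, 0.31, 6  # values chosen in steps 1 to 3
unit = 0.01                            # precision of the raw data (mg)

rows = []
for i in range(k):
    low = round(first_limit + i * width, 2)             # lower class limit
    high = round(low + width - unit, 2)                 # upper class limit
    lower_b, upper_b = low - unit / 2, high + unit / 2  # class boundaries
    mark = round((low + high) / 2, 2)                   # class mark x_i
    freq = sum(lower_b < x < upper_b for x in data)     # frequency f_i
    rel = 100 * freq / len(data)                        # relative frequency, %
    rows.append((low, high, lower_b, upper_b, mark, freq, rel))

for low, high, lower_b, upper_b, mark, freq, rel in rows:
    print(f"{low:.2f}-{high:.2f}  {lower_b:.3f}-{upper_b:.3f}  {mark:.2f}  {freq:2d}  {rel:6.2f}")
```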

Cumulative Frequency Distribution Table:

Cumulative Frequency, cfi
Gives the running total of the frequencies: the number of observations in the sample whose values are less than or equal to the upper boundary of the class interval
Relative Cumulative Frequency
(cfi / total number of samples) * 100
The percent of the values which are less than the upper boundary

Class Interval   Class Boundaries   Class Mark, xi   Frequency, fi   Cumulative Frequency, cfi   Relative Cum. Frequency, %
0.72-1.02        0.715-1.025        0.87             2               2                           5.00
1.03-1.33        1.025-1.335        1.18             2               4                           10.00
1.34-1.64        1.335-1.645        1.49             8               12                          30.00
1.65-1.95        1.645-1.955        1.80             17              29                          72.50
1.96-2.26        1.955-2.265        2.11             6               35                          87.50
2.27-2.57        2.265-2.575        2.42             5               40                          100.00
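The cumulative columns follow from the frequency column by a running sum. A minimal sketch using the standard `itertools.accumulate`, with the frequencies and upper class boundaries copied from the nicotine table:

```python
from itertools import accumulate

upper_boundaries = [1.025, 1.335, 1.645, 1.955, 2.265, 2.575]
frequencies = [2, 2, 8, 17, 6, 5]  # f_i from the frequency table
n = sum(frequencies)

cumulative = list(accumulate(frequencies))                 # cf_i: running total
relative_cumulative = [100 * cf / n for cf in cumulative]  # (cf_i / n) * 100

# (upper boundary, relative cumulative frequency) pairs are also the points of the ogive
for ub, cf, rcf in zip(upper_boundaries, cumulative, relative_cumulative):
    print(f"values <= {ub:.3f}: cf = {cf:2d}, relative cf = {rcf:6.2f}%")
```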

Graphical Methods
Frequency Histogram

A graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can show class intervals, class boundaries or class marks.

[Figure: Frequency histogram of the nicotine data; vertical axis: frequency; horizontal axis: class mark]

Frequency Polygon
A line graph of frequency against class mark
[Figure: Frequency polygon of the nicotine data; vertical axis: frequency; horizontal axis: class mark]

Ogive (O-jive)
A frequency polygon of relative cumulative frequency against upper class boundaries
[Figure: Ogive of the nicotine data; vertical axis: relative cumulative frequency (%); horizontal axis: upper class boundary]

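Pie chart
The angle of each slice is proportional to the relative frequency of its class: a class with a relative frequency of p% gets a slice of 3.6p degrees
[Figure: Pie chart of the relative frequencies of the six classes: 5, 5, 20, 42.5, 15 and 12.5%]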
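Since a full circle is 360 degrees and the relative frequencies add up to 100%, each slice angle is 360 × p / 100. A minimal sketch using the relative frequencies of the nicotine table:

```python
relative_frequency = [5.00, 5.00, 20.00, 42.50, 15.00, 12.50]  # %, from the frequency table
angles = [360 * p / 100 for p in relative_frequency]           # slice angle in degrees
print(angles)
```

This prints `[18.0, 18.0, 72.0, 153.0, 54.0, 45.0]`, which sums to 360 degrees.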

Numerical Methods
Measures of Central Tendencies

1. Mean / Average
The sum of the products of each class mark and its corresponding frequency, divided by the total number of samples:
$$\bar{x} = \frac{\sum f_i x_i}{n}$$
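Applied to the nicotine frequency table, the formula gives Σ fi xi = 71.38 and a mean of 71.38 / 40 = 1.7845 mg. Because every value is replaced by its class mark, this grouped mean only approximates the mean of the 40 raw values. A minimal sketch of the computation (the function name is illustrative):

```python
class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]  # x_i
frequencies = [2, 2, 8, 17, 6, 5]                  # f_i


def grouped_mean(marks, freqs):
    """Mean of grouped data: sum(f_i * x_i) / n, where n = sum(f_i)."""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)


print(round(grouped_mean(class_marks, frequencies), 4))
```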

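2. Median
The value that divides the samples into two equal halves when the samples are arranged from lowest to highest. For grouped data:
$$\text{Median} = L + \frac{n/2 - \sum f_b}{f_m} \times c$$
where L is the lower class boundary of the median class, n is the total frequency, Σf_b is the sum of the frequencies of all class intervals before the median class, f_m is the frequency of the median class, and c is the class width.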
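For the nicotine data, n/2 = 20 and the cumulative frequencies are 2, 4, 12, 29, 35, 40, so the median class is 1.645-1.955, with 12 observations before it and f_m = 17: median = 1.645 + (20 - 12) / 17 × 0.31 ≈ 1.7909 mg. A minimal sketch of the same steps (the function name is illustrative):

```python
boundaries = [0.715, 1.025, 1.335, 1.645, 1.955, 2.265, 2.575]  # class boundaries
frequencies = [2, 2, 8, 17, 6, 5]                               # f_i


def grouped_median(bounds, freqs):
    """Median of grouped data: L + (n/2 - sum of f before the class) / f_m * c."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    cf_before = 0  # sum of the frequencies of the classes before the current one
    for i, f in enumerate(freqs):
        if cf_before + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            lower, width = bounds[i], bounds[i + 1] - bounds[i]
            return lower + (n / 2 - cf_before) / f * width
        cf_before += f


print(round(grouped_median(boundaries, frequencies), 4))
```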

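3. Mode
The most frequently occurring value. For grouped data:
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$$
where L is the lower class boundary of the modal class, d1 is the frequency difference of the modal class and the preceding class, d2 is the frequency difference of the modal class and the succeeding class, and c is the class width.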
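For the nicotine data the modal class is 1.645-1.955 (f = 17), with d1 = 17 - 8 = 9 and d2 = 17 - 6 = 11, so mode = 1.645 + 9/20 × 0.31 = 1.7845 mg. In the sketch below, treating a missing neighbour class as frequency 0 and falling back to the class mark when d1 + d2 = 0 are assumptions, not rules from the text.

```python
boundaries = [0.715, 1.025, 1.335, 1.645, 1.955, 2.265, 2.575]  # class boundaries
frequencies = [2, 2, 8, 17, 6, 5]                               # f_i


def grouped_mode(bounds, freqs):
    """Mode of grouped data: L + d1 / (d1 + d2) * c."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # modal class (first one if tied)
    before = freqs[m - 1] if m > 0 else 0               # assumption: 0 if no preceding class
    after = freqs[m + 1] if m + 1 < len(freqs) else 0   # assumption: 0 if no succeeding class
    d1, d2 = freqs[m] - before, freqs[m] - after
    lower, width = bounds[m], bounds[m + 1] - bounds[m]
    if d1 + d2 == 0:  # flat peak: formula undefined, use the class mark (assumption)
        return lower + width / 2
    return lower + d1 / (d1 + d2) * width


print(round(grouped_mode(boundaries, frequencies), 4))
```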

Measures of Variability / Dispersion

1. Range
Measures how widely the samples are spread. It is the difference between the highest and the lowest values of the raw data:
$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

2. Variance
Measures how widely the samples are dispersed about the mean.
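As a sketch, assuming the usual sample variance of grouped data, s² = Σ fi (xi - x̄)² / (n - 1), the nicotine frequency table gives Σ fi (xi - x̄)² = 5.75639 and s² = 5.75639 / 39 ≈ 0.1476 mg². Dividing by n instead of n - 1 would give the population variance. The function name is illustrative.

```python
class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]  # x_i
frequencies = [2, 2, 8, 17, 6, 5]                  # f_i


def grouped_variance(marks, freqs):
    """Sample variance of grouped data (assumed divisor: n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    # Squared deviation of each class mark from the mean, weighted by its frequency
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))
    return ss / (n - 1)


print(round(grouped_variance(class_marks, frequencies), 4))
```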

3. Standard deviation, s
The positive square root of the variance
Coefficient of variation, Cv
$$C_v = \frac{s}{\bar{x}} \times 100\%$$
If Cv < 10%, the data are considered clustered; otherwise the data are dispersed
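Continuing the nicotine example under the same assumption (sample variance with divisor n - 1), s = √0.1476 ≈ 0.384 mg and Cv ≈ 0.384 / 1.7845 × 100 ≈ 21.5%, so by the 10% rule these data count as dispersed. A minimal sketch:

```python
import math

class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]  # x_i
frequencies = [2, 2, 8, 17, 6, 5]                  # f_i

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n
# Sample variance of grouped data (assumed divisor: n - 1)
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks)) / (n - 1)
s = math.sqrt(variance)  # standard deviation: positive square root of the variance
cv = s / mean * 100      # coefficient of variation, in percent

verdict = "clustered" if cv < 10 else "dispersed"  # the 10% rule of thumb from the text
print(f"s = {s:.4f} mg, Cv = {cv:.1f}% -> data are {verdict}")
```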

Measures of Shape

1. Skewness
A measure of the symmetry of the distribution of the sample
If Sk < 0, the distribution is skewed to the left (i.e., the left tail is longer than the right tail)

If Sk = 0, the distribution is symmetric with respect to the mean, i.e., the right and left tails are of equal length (as in a normal, or Gaussian, distribution)

If Sk > 0, the distribution is skewed to the right (i.e., the right tail is longer than the left tail)
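This sketch assumes Pearson's second coefficient of skewness, Sk = 3(x̄ - median) / s, one common textbook formula, which is negative when the mean lies below the median. For the grouped nicotine data (with s computed using divisor n - 1, as assumed above) it gives Sk = 3(1.7845 - 1.7909) / 0.384 ≈ -0.05, a very slight skew to the left.

```python
import math

boundaries = [0.715, 1.025, 1.335, 1.645, 1.955, 2.265, 2.575]
class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]
frequencies = [2, 2, 8, 17, 6, 5]
n = sum(frequencies)

mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n
s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks)) / (n - 1))

# Grouped median: L + (n/2 - sum of f before the median class) / f_m * c
cf_before = 0
for i, f in enumerate(frequencies):
    if cf_before + f >= n / 2:
        median = boundaries[i] + (n / 2 - cf_before) / f * (boundaries[i + 1] - boundaries[i])
        break
    cf_before += f

sk = 3 * (mean - median) / s  # Pearson's second coefficient (assumed formula)
shape = "skewed left" if sk < 0 else "skewed right" if sk > 0 else "symmetric"
print(f"Sk = {sk:.3f} ({shape})")
```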

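2. Kurtosis
A measure of the peakedness (height and tail weight) of the distribution relative to a normal distribution; the cases below refer to excess kurtosis (kurtosis minus 3), which is 0 for a normal distribution
If kurtosis < 0, the distribution has a short height or is almost flat (platykurtic)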

If kurtosis = 0, the distribution has the same height as a normal distribution (mesokurtic)

If kurtosis > 0, the distribution has a high peak (leptokurtic)
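A minimal sketch, assuming the moment-based excess kurtosis of raw data, g2 = m4 / m2² - 3 with m_r = Σ (x - x̄)^r / n, which is 0 for a normal distribution. The two small data sets are made up purely to show the sign convention.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                    # evenly spread, no peak
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -3, 3]  # tall centre with long tails
print(excess_kurtosis(flat))    # negative: flatter than a normal distribution
print(excess_kurtosis(peaked))  # positive: more peaked than a normal distribution
```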

Measures of Data Location

1. Quartiles: Q1, Q2, Q3
The values below which 25%, 50% and 75% of the data fall, respectively
2. Deciles: D1, D2, D3, …, D9
The values below which 10%, 20%, 30%, …, 90% of the data fall, respectively
3. Percentiles: P1, P2, P3, …, P99
The values below which 1%, 2%, 3%, …, 99% of the data fall, respectively
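Python's standard `statistics.quantiles` (Python 3.8+) returns these cut points directly: n=4 gives Q1 to Q3, n=10 gives D1 to D9 and n=100 gives P1 to P99. Textbooks differ in how they interpolate between ranked values; this sketch uses the module's default `method="exclusive"`, which may differ slightly from the hand method used in this course. Q2, D5 and P50 all coincide with the median.

```python
import statistics

# Nicotine contents (mg) from the illustration
data = [1.09, 1.92, 2.31, 1.79, 2.28, 1.74, 1.47, 1.97, 0.85, 1.24,
        1.58, 2.03, 1.70, 2.17, 2.55, 2.11, 1.86, 1.90, 1.68, 1.51,
        1.64, 0.72, 1.69, 1.85, 1.82, 1.79, 2.46, 1.88, 2.08, 1.67,
        1.37, 1.93, 1.40, 1.64, 2.09, 1.75, 1.63, 2.37, 1.75, 1.69]

quartiles = statistics.quantiles(data, n=4)      # [Q1, Q2, Q3]
deciles = statistics.quantiles(data, n=10)       # [D1, ..., D9]
percentiles = statistics.quantiles(data, n=100)  # [P1, ..., P99]

print("Q1, Q2, Q3 =", [round(q, 3) for q in quartiles])
print("D1 =", round(deciles[0], 3), " D9 =", round(deciles[-1], 3))
print("P90 =", round(percentiles[89], 3))
```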
