
STATISTICAL METHODS IN QUALITY MANAGEMENT

Nina Cortez
Jv Gepulgani
Kurt Manalac
Andrey Pagsolingan
Arvin Perez
Lheam Vilog
1
STATISTICS
• Is a science concerned with “the collection,
organization, analysis, interpretation, and
presentation of data”.
• It is essential for quality and for implementing
a continuous improvement philosophy.
• Statistical methods help managers make
sense of data and gain insight about the
nature of variation in the processes they
manage.

2
PROBABILITY DISTRIBUTION
BASIC PROBABILITY CONCEPT
• To apply statistics, you need to have a basic
understanding of probability and probability
distribution.
• In statistical terminology, an experiment is a
process that results in some outcome. An outcome is
the result we observe. The sample space is the
collection of all possible outcomes of an
experiment.

4
PROBABILITY
• The likelihood that an outcome occurs.
• The probability associated with any outcome O_i must be
between 0 and 1, that is, 0 ≤ P(O_i) ≤ 1.
• The sum of the probabilities over all possible outcomes must
be 1.
• An event is a collection of one or more outcomes from a sample
space.

5
• The complement of A, denoted as A^c, consists of all outcomes in the
sample space not in A. For example, if A is the event of finding 2
or fewer defectives in a sample of 10, then A^c is the event of
finding 3 or more defectives.
• Two events are mutually exclusive if they have no outcomes in
common. For example, if A is the event “2 or fewer defects in a
sample” and B is the event “5 or more defects,” then clearly A
and B are mutually exclusive.

6
RULES FOR CALCULATING THE PROBABILITIES OF EVENTS:

• Rule 1: The probability of any event is the sum of the probabilities of
the outcomes that compose that event.
• Rule 2: The probability of the complement of any event A is P(A^c) = 1
- P(A).
• Rule 3: If events A and B are mutually exclusive, then P(A or B) = P(A)
+ P(B).
• Rule 4: If two events A and B are not mutually exclusive, then P(A or
B) = P(A) + P(B) – P(A and B).

7
EXAMPLE USING THE PROBABILITY RULES
In testing a new personal computer after assembly, a company discovered
that among a sample of 100 units, 3 failed to boot up properly because of a
defect in the motherboard, 4 units had a hard drive failure, and 2 units
experienced both failures. Let A be the event “failure to boot” and B be the
event “hard drive failure.” Then P(A) = 3/100 and P(B) = 4/100. However,
these events are not mutually exclusive because both A and B occurred
together; specifically, P(A and B) = 2/100. Therefore, the probability that one
or the other failure occurred is P(A or B) = P(A) + P(B) – P(A and B) = 3/100 +
4/100 - 2/100 = 5/100.
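
The arithmetic in this example can be checked with a minimal Python sketch. The function name `prob_union` is an illustrative assumption; exact fractions are used so that no rounding enters the calculation.

```python
from fractions import Fraction


def prob_union(p_a, p_b, p_a_and_b):
    """Addition rule for events that may overlap:
    P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# Values taken from the personal computer example (sample of 100 units).
p_a = Fraction(3, 100)      # failure to boot
p_b = Fraction(4, 100)      # hard drive failure
p_both = Fraction(2, 100)   # both failures

p_either = prob_union(p_a, p_b, p_both)
p_neither = 1 - p_either    # complement rule: P(A^c) = 1 - P(A)

print(float(p_either), float(p_neither))
```

Subtracting P(A and B) avoids counting the 2 units with both failures twice.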

8
CONDITIONAL PROBABILITY
• Is the probability of occurrence of one event A, given that another
event B is known to be true or to have already occurred: P(A|B) = P(A
and B) / P(B).
• Multiplying both sides of this formula by P(B), we
obtain P(A and B) = P(A|B) P(B). Note that we may switch the roles
of A and B and write P(B and A) = P(B|A) P(A). But P(B and A) is the
same as P(A and B), so we can express P(A and B) in two ways:
P(A and B) = P(A|B) P(B) = P(B|A) P(A).
This is often called the multiplication rule of probability.
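
As an illustration, the conditional probability formula can be applied to the earlier personal computer example (P(A) = 3/100, P(B) = 4/100, P(A and B) = 2/100). This is only a sketch; the helper name `conditional` is an assumption made for the example.

```python
from fractions import Fraction


def conditional(p_joint, p_condition):
    """P(A|B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if p_condition == 0:
        raise ValueError("the conditioning event must have nonzero probability")
    return p_joint / p_condition


p_a = Fraction(3, 100)      # failure to boot
p_b = Fraction(4, 100)      # hard drive failure
p_both = Fraction(2, 100)   # both failures

p_a_given_b = conditional(p_both, p_b)   # boot failure given a hard drive failure
p_b_given_a = conditional(p_both, p_a)   # hard drive failure given a boot failure

# Multiplication rule: both products recover P(A and B).
joint_via_b = p_a_given_b * p_b
joint_via_a = p_b_given_a * p_a
print(p_a_given_b, p_b_given_a, joint_via_b == joint_via_a == p_both)
```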

9
PROBABILITY DISTRIBUTION
• A random variable is a numerical description of the outcome of an experiment.
For example, an experiment consists of sampling 10 parts and
counting the number of defectives. We might define the random
variable X to be the number of defective parts in the sample. We
might define a random variable Y to be 1 if the outcome is pass,
and 0 if the outcome is fail. A random variable can be either
discrete or continuous, depending on the specific numerical
values it may assume.

10
A probability distribution is a characterization of the possible values that a random variable
may assume along with the probability of assuming these values.
For a random variable X, the probability distribution of X is
denoted by a mathematical function f(x). The symbol x_i represents
the i-th value of the random variable X, and f(x_i) is its probability.
The cumulative distribution function, F(x), specifies the probability
that the random variable X will assume a value less than or equal
to a specified value, x. This is also denoted as P(X ≤ x), and read as
“the probability that the random variable X is less than or equal to
x.”

11
BINOMIAL DISTRIBUTION
Describes the probability of obtaining exactly x "successes" in a
sequence of n identical experiments, called trials. A success can be any
one of two possible outcomes of each experiment. In some situations, it
might represent a defective item, in others, a good item. The
probability of success in each trial is a constant value p.

$$f(x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

12
USING THE BINOMIAL DISTRIBUTION
If the probability that a process produces a defective part is 0.2, then
the probability distribution that x parts out of a sample of 10 will be
defective is described by the binomial formula with n = 10 and p = 0.2. Thus,
to find the probability that 3 parts among a sample of 10 will be
defective, we compute

$$f(3) = \binom{10}{3}(0.2)^{3}(0.8)^{10-3} = \frac{10!}{3!\,7!}(0.008)(0.2097152) = 120(0.008)(0.2097152) = 0.201327$$
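
The same calculation can be sketched in Python with the standard library. The function name `binomial_pmf` is an assumption for illustration; `math.comb` supplies the binomial coefficient. The product 120 × 0.008 × 0.2097152 evaluates to about 0.2013.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Defective-parts example: n = 10 parts sampled, p = 0.2 defect rate.
prob_three_defective = binomial_pmf(3, 10, 0.2)
print(round(prob_three_defective, 6))

# The probabilities over x = 0..10 should sum to 1.
total = sum(binomial_pmf(x, 10, 0.2) for x in range(11))
```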

13
POISSON DISTRIBUTION
• Named after French mathematician Siméon Denis Poisson.
• Is a discrete distribution that helps predict the probability of a certain
number of events occurring when you know how often the event occurs
on average. It gives us the probability of a given number of events
happening in a fixed interval of time.

14
Calculating the Poisson Distribution
The Poisson distribution pmf is:

$$P(x; \mu) = \frac{e^{-\mu}\mu^{x}}{x!}$$

Where:
• The symbol “!” denotes a factorial.
• e: A constant equal to approximately 2.71828 (Euler's number).
• μ: The mean number of successes that occur in a specified region.
• x: The actual number of successes that occur in a specified region.
• P(x;μ): The Poisson probability that exactly x successes occur in a Poisson
experiment, when the mean number of successes is μ.

15
Poisson Distribution Examples
1. The average number of homes sold by the Acme Realty company is
2 homes per day. What is the probability that exactly 3 homes will be
sold tomorrow?

Solution:
Given:
•μ = 2; since 2 homes are sold per day, on average.
•x = 3; since we want to find the likelihood that 3 homes will be sold
tomorrow.
•e = 2.71828; since e is a constant equal to approximately 2.71828.

16
We plug these values into the Poisson formula as
follows:
$$P(x; \mu) = \frac{e^{-\mu}\mu^{x}}{x!}$$
$$P(3; 2) = \frac{(2.71828^{-2})(2^{3})}{3!}$$
$$P(3; 2) = \frac{(0.13534)(8)}{6}$$
$$P(3; 2) = 0.180$$
Thus, the probability of selling 3 homes tomorrow is
0.180.

17
2. If three persons, on average, come to ABC company for a job
interview each day, find the probability that fewer than three people
come for an interview on a given day.

Solution:
Given:
•μ = 3
•x < 3, so P(x<3;3) = P(0;3) + P(1;3) + P(2;3)
•e = 2.71828

18
$$P(0; 3) = \frac{e^{-3}\,3^{0}}{0!} = 0.04978706837$$
$$P(1; 3) = \frac{e^{-3}\,3^{1}}{1!} = 0.1493612051$$
$$P(2; 3) = \frac{e^{-3}\,3^{2}}{2!} = 0.22404180766$$
Hence,
P(x<3;3) = P(0;3)+P(1;3)+P(2;3)
= 0.04978706837+0.1493612051+0.22404180766
= 0.42319008113
The probability of less than three persons coming for
interview on a certain day is 0.42319008113.
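
Both Poisson examples can be reproduced with a short Python sketch. The function names `poisson_pmf` and `poisson_prob_below` are assumptions made for illustration, not part of any library.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)


def poisson_prob_below(k, mu):
    """P(X < k): sum the pmf over x = 0, 1, ..., k - 1."""
    return sum(poisson_pmf(x, mu) for x in range(k))


# Example 1: mean of 2 homes sold per day, exactly 3 sold tomorrow.
p_three_homes = poisson_pmf(3, 2)

# Example 2: mean of 3 interviewees per day, fewer than 3 on a given day.
p_fewer_than_three = poisson_prob_below(3, 3)

print(round(p_three_homes, 3), round(p_fewer_than_three, 5))
```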

19
CONTINUOUS PROBABILITY DISTRIBUTION
- A probability distribution in which the random variable X can take on any
value (is continuous). Because there are infinitely many values that X could assume,
the probability of X taking on any one specific value is zero. Therefore we
often speak in ranges of values (for example, P(X > 0) = 0.50).
- The cumulative distribution function (CDF), F(x), gives the
probability that a random variable is less than or equal to x. It
represents the area under the density function
to the left of x, P(X ≤ x).
- The probability that X is between a and b is equal to the difference of the CDF
evaluated at these two points, that is:
$$P(a \le X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$$

20
4 Types of Continuous
Probability Distribution

21
NORMAL DISTRIBUTION
- is a probability function that describes how the values of a variable are
distributed; the probability density function is represented graphically
by the familiar bell-shaped curve.

Properties of a normal distribution


•The mean, mode and median are all equal.
•The curve is symmetric at the center (i.e. around the mean, μ).
•Exactly half of the values are to the left of center and exactly half the values
are to the right.
•The total area under the curve is 1.

22
The general formula for the probability density function of the
normal distribution is

$$f(x) = \frac{e^{-(x-\mu)^{2}/(2\sigma^{2})}}{\sigma\sqrt{2\pi}}$$
Where:
● μ is the mean or average
● σ is the standard deviation

23
Standard Normal Distribution
The case where μ = 0 and σ = 1 is called the standard normal
distribution. The equation for the standard normal
distribution is
$$f(z) = \frac{e^{-z^{2}/2}}{\sqrt{2\pi}}$$
The letter z is usually used to represent this particular variable.
Z-score formula:
$$z = \frac{x-\mu}{\sigma}$$

24
Standard Normal Distribution

25
Normal Distribution Examples
1. A manufacturer of MRI scanners used for medical diagnosis
has data that indicates that the mean number of days (µ)
between malfunctions is 1020 days, with a standard deviation of
20 days. Assuming a normal distribution, what is the probability
that the number of days between malfunctions will be less than
1044 days? More than 980 days? Between 980 days to 1044
days?

26
Solution:
First, convert the value of x to a z-value. For x = 1044 days,
we have:

$$z = \frac{x-\mu}{\sigma} = \frac{1044-1020}{20} = 1.2$$

This means that 1044 days is 1.2 standard deviations above the
mean of 1020 days. Therefore, using Appendix A, P(X ≤
1044) = P(z ≤ 1.2) = 0.885.

27
To find the probability that X exceeds 980 days, first find the
corresponding z-value:

$$z = \frac{x-\mu}{\sigma} = \frac{980-1020}{20} = -2.0$$

Note that 𝑃(𝑋 ≤ 980) = 𝑃(𝑧 ≤ −2.0) = 0.023.


Therefore, 𝑃(𝑋 ≥ 980) = 1 − 0.023 = 0.977
Finally, to find the probability that X is between 1044 and 980 days, we
use formula:
𝑃(980 ≤ 𝑋 ≤ 1044) = 𝑃(𝑋 ≤ 1044) − 𝑃(𝑋 ≤ 980)
= 𝐹(1044) − 𝐹(980) = 0.885 − 0.023
= 0.862
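
The three probabilities in the MRI scanner example can be checked without a table. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# Days between malfunctions: mean 1020, standard deviation 20.
days_between_malfunctions = NormalDist(mu=1020, sigma=20)

# P(X <= 1044)
p_less_1044 = days_between_malfunctions.cdf(1044)

# P(X >= 980) = 1 - P(X <= 980)
p_more_980 = 1 - days_between_malfunctions.cdf(980)

# P(980 <= X <= 1044) = F(1044) - F(980)
p_between = days_between_malfunctions.cdf(1044) - days_between_malfunctions.cdf(980)

print(round(p_less_1044, 3), round(p_more_980, 3), round(p_between, 3))
```

The values agree with the table lookups to three decimal places.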

28
Using the Normal Inverse Function
Suppose that the manufacturer of MRI scanners wishes to determine
the number of days x for which the probability of a malfunction
occurring within x days is 0.80. In this case, we know that P(X ≤ x) = 0.8.
This is equivalent to P(Z ≤ z) = 0.8, where z = (x − 1020)/20.
From Appendix A, we can determine that z is approximately equal to
0.84. Therefore, solving 0.84 = (x − 1020)/20 for x yields x =
1036.8.
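
A minimal sketch of the inverse lookup, assuming Python 3.8 or later for `statistics.NormalDist.inv_cdf`. It uses the mean of 1020 days and standard deviation of 20 days given in the statement of the MRI scanner example, so the z-value 0.84 corresponds to x = 1020 + 0.84(20), about 1036.8 days.

```python
from statistics import NormalDist

mu, sigma = 1020, 20    # from the MRI scanner example statement
target_probability = 0.80

# z such that P(Z <= z) = 0.80 for the standard normal distribution.
z = NormalDist().inv_cdf(target_probability)

# Convert the z-value back to days: x = mu + z * sigma.
x = mu + z * sigma

print(round(z, 2), round(x, 1))
```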

29
EXPONENTIAL DISTRIBUTION
• Models the time between randomly occurring events, such as the time to
or between failures of mechanical or electrical components.
• Closely related to the Poisson distribution: if the distribution of the
time between events is exponential, then the number of events
occurring during an interval of time is Poisson.

30
$$f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0$$
Where
1/λ = mean of the exponential distribution (note that
λ is the mean of the corresponding Poisson
distribution)
x = time or distance over which the variable extends
e = 2.71828… (the base of natural logarithms)

The exponential distribution has the properties that it is bounded below
by 0, it has its greatest density at 0, and the density declines as x increases.

31
Using the Exponential Distribution
A company that makes electronic components for tablet devices
tested a large number of these components. They found that the
average time to failure is 1/λ = 4000 hours.
What is the probability that a component will fail within 500 hours?
After 4000 hours?
The mean rate of failure is λ = 1/4000 = 0.00025 failures/hour.
Using the cumulative distribution function F(x) = 1 − e^(−λx), the
probability of failure within 500 hours is

$$F(500) = 1 - e^{-0.00025(500)} = 0.1175$$

The probability that a component fails after 4000 hours is

$$1 - F(4000) = e^{-0.00025(4000)} = e^{-1} = 0.3679$$
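
The exponential calculation can be sketched as follows. The function name `exponential_cdf` is an assumption for illustration; it implements the standard exponential CDF, F(x) = 1 − e^(−λx). The second quantity answers the "after 4000 hours" part of the question.

```python
from math import exp


def exponential_cdf(x, rate):
    """F(x) = 1 - e^(-rate * x) for x >= 0, and 0 for negative x."""
    if x < 0:
        return 0.0
    return 1 - exp(-rate * x)


mean_time_to_failure = 4000          # hours, from the example
rate = 1 / mean_time_to_failure      # 0.00025 failures per hour

p_fail_within_500 = exponential_cdf(500, rate)
p_fail_after_4000 = 1 - exponential_cdf(4000, rate)

print(round(p_fail_within_500, 4), round(p_fail_after_4000, 4))
```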

32
STATISTICAL METHODOLOGY
• Descriptive statistics are methods of presenting data
visually and numerically. They include charts (such as Excel
column, line, and pie charts), frequency distributions, and
histograms to organize and present data, as well as measures of
central tendency (means, medians, proportions) and
measures of dispersion (range, standard deviation,
variance).

34
Statistical inference is the process of drawing conclusions
about unknown characteristics of a population from which
data were taken.
• Techniques used in this process include confidence
intervals, hypothesis testing, and experimental design.
• Experimental design is important for helping to understand
the effects of process factors on output quality and for
optimizing systems.

35
Predictive statistics - used to develop predictions of future values
based on historical data.
• Correlation analysis and regression analysis are two useful
techniques. They can clarify the
characteristics of a process as well as predict future results.

36
SAMPLING
Population - refers to the group of things that we
want information about.
Sample - refers to part of the population that we take
out to examine and draw conclusions from.
Sampling - forms the basis for statistical
applications.

37
BIASED SAMPLES
occur when one or more parts of the population are favored
over others.
• Convenience sample - only includes people who are easy to
reach.
• Voluntary response sample - consists of people who have
chosen to include themselves.

38
UNBIASED SAMPLE
• Simple random sampling - the most basic and common sampling method.
Every item in the population has an equal probability of being
selected.
• Stratified sampling - the population is partitioned into groups, or
strata, and a sample is selected from each group.
Strata are groups of similar people; within each stratum
we take a simple random sample (SRS). This ensures that every kind
of group is represented in the sample.
• Multistage sampling - we combine two or more simple random samples,
drawn in successive stages, so that we know where each part of the
sample comes from.

39
• Systematic sampling - every nth (4th, 5th, etc.) item is
selected.
• Cluster sampling - the population is partitioned into groups
(clusters) and a random sample of clusters is selected.
• Judgment sampling (or purposive sampling) - expert
opinion is used to determine the sample.
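
The random sampling schemes above can be sketched with Python's standard `random` module. Everything in this example is hypothetical: the population of 100 numbered items, the two shift strata, the ten lots used as clusters, and the function names.

```python
import random


def simple_random_sample(population, k, rng):
    """Every item has an equal chance of selection."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick a random start, then take every step-th item."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, k_per_stratum, rng):
    """Take a simple random sample inside each stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep all of their items."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


rng = random.Random(42)               # fixed seed so the run is repeatable
population = list(range(1, 101))      # hypothetical items numbered 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 5, rng)
strata = {"day_shift": population[:60], "night_shift": population[60:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"lot_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}
clustered = cluster_sample(clusters, 2, rng)
```

Judgment sampling is not shown because it depends on expert choice rather than a random mechanism.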

40
Sampling error occurs naturally and results from the fact that
a sample may not always be representative of the population,
no matter how carefully it is selected. The way to reduce sampling
error is to take a larger sample from the population.
Systematic errors usually result from poor sample design and
can be reduced or eliminated by careful planning of the
sampling study.

41
DESCRIPTIVE STATISTICS
Summarizes the numerical characteristics of populations or
samples.

POPULATION - is a complete set or collection of objects of
interest.
SAMPLE - is a subset of objects taken from the population.

42
The most important types of descriptive
statistics and formulas are:
1. Measures of Location
MEAN
- The mean of a population is denoted by the Greek letter µ:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
- The mean of a sample is denoted by x̄:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

We may calculate the mean in Excel using the function =AVERAGE(data range).
43
MEDIAN – specifies the middle value (or 50th percentile) when the data are
arranged from smallest to largest. For an odd number of observations, the
median is the middle of the sorted numbers. For an even number of
observations, the median is the mean of the two middle numbers.
• We may find the median using the Excel function =MEDIAN(data range).

MODE – is the observation that occurs most frequently.
- It is most useful for data sets that consist of a relatively small number of
unique values.
• In Excel you can use =MODE.SNGL(data range) or =MODE.MULT(data range) to
identify a single mode or multiple modes in the data, or simply
=MODE(data range).

44
2. Measures of Dispersion
Range is the simplest measure of dispersion and is computed as the
difference between the maximum value and the minimum value in the
data set.
• It is computed in Excel by the formula
=MAX(data range)-MIN(data range)

Variance is a measure of dispersion that depends on all the data. The
larger the variance, the more the data are “spread out”.

45
FORMULA for the Variance of the Population:
$$\sigma^{2} = \frac{\sum_{i=1}^{N}(x_i - \mu)^{2}}{N}$$
Where x_i is the value of the ith item, N is the number of items in the
population, and µ is the population mean.
• In Excel, =VAR.P(data range) is used to compute the variance of a
population, σ².

FORMULA for the Variance of the Sample:
$$s^{2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n-1}$$
Where n is the number of items in the sample, and x̄ is the sample mean.
• In Excel, =VAR.S(data range) may be used to compute the sample
variance, s².

46
Standard deviation is the square root of the variance.
For a population, the standard deviation is computed as:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^{2}}{N}}$$
and for a sample, it is:
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n-1}}$$

• =STDEV.P(data range) calculates the standard deviation for
a population
• =STDEV.S(data range) calculates the standard deviation for
a sample
47
Proportion is the fraction of data that have a certain
characteristic.
It is usually denoted as p.
• We may use the Excel =COUNTIF(range, criteria) function to
find the number of cells within a range that meet a specified
criterion and then compute the proportion as the ratio of the
count to the total number of observations.
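
For readers working outside Excel, the measures of location, dispersion, and proportion described above have close counterparts in Python's standard `statistics` module. This is a sketch on a small made-up data set; the data values and the "greater than 4" criterion are assumptions chosen only for illustration.

```python
import statistics

data = [2, 4, 4, 5, 7, 8]    # hypothetical observations

# Measures of location
mean = statistics.mean(data)            # like =AVERAGE
median = statistics.median(data)        # like =MEDIAN
mode = statistics.mode(data)            # like =MODE.SNGL
modes = statistics.multimode(data)      # like =MODE.MULT

# Measures of dispersion
data_range = max(data) - min(data)          # like =MAX - MIN
pop_variance = statistics.pvariance(data)   # like =VAR.P (divides by N)
sample_variance = statistics.variance(data) # like =VAR.S (divides by n - 1)
pop_std = statistics.pstdev(data)           # like =STDEV.P
sample_std = statistics.stdev(data)         # like =STDEV.S

# Proportion of observations meeting a criterion (like =COUNTIF / count)
proportion_above_4 = sum(1 for value in data if value > 4) / len(data)
```

The population and sample versions differ only in the divisor, N versus n − 1.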

48
3. Measures of Shape
Skewness describes the lack of symmetry of data.
Coefficient of Skewness (CS) measures the degree of asymmetry
of observations around the mean.
- If CS is positive, the distribution of values is positively skewed;
- if negative, it is negatively skewed.
The closer CS is to zero, the less the degree of skewness.
• Greater than 1 or less than -1 suggests a high degree of skewness
• Between 0.5 and 1 or between -1 and -0.5 represents moderate skewness
• Between -0.5 and 0.5 indicates relative symmetry

49
Kurtosis refers to the peakedness or flatness of a histogram.
Coefficient of Kurtosis (CK) measures the degree of kurtosis
of a population.
• Less than 3 – more flat with a wide degree of dispersion
• Greater than 3 – more peaked with less dispersion
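
The slides do not give formulas for CS and CK. The sketch below assumes the common moment-based population definitions, CS = m3 / m2^(3/2) and CK = m4 / m2^2, where m_k is the k-th central moment; under this definition a normal distribution has CK = 3, which matches the threshold used above. The function name and data sets are hypothetical.

```python
def shape_coefficients(data):
    """Return (CS, CK) using population central moments.

    Assumed definitions: CS = m3 / m2**1.5 and CK = m4 / m2**2.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# A symmetric data set: skewness should be zero.
cs_symmetric, ck_symmetric = shape_coefficients([1, 2, 3, 4, 5])

# A data set with a long right tail: skewness should be positive.
cs_right_tail, ck_right_tail = shape_coefficients([1, 1, 1, 2, 10])
```

Spreadsheet functions may apply sample-size corrections or report excess kurtosis, so their values can differ from this sketch.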

50
STATISTICAL ANALYSIS
WITH MICROSOFT EXCEL
The two most useful Excel tools for statistical analysis are:
1. The Descriptive Statistics tool
2. The Histogram tool

THE EXCEL DESCRIPTIVE STATISTICS TOOL
- is a convenient way of obtaining basic summary measures for
sample data.

52
THE EXCEL HISTOGRAM TOOL
Frequency distribution is a table that shows the number of
observations in each of several non-overlapping groups.
A graphical depiction of a frequency distribution for numerical
data in the form of a column chart is called a histogram.

Frequency distributions and histograms can be created by using
the Data Analysis ToolPak in Excel.
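
To make the idea of non-overlapping groups concrete, here is a minimal Python sketch of a frequency distribution. The function name, the data, and the bin edges are all hypothetical; each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both.

```python
def frequency_distribution(data, edges):
    """Count observations in consecutive, non-overlapping bins.

    Bin i covers edges[i] <= value < edges[i + 1]; the last bin also
    includes its upper edge. Values outside all bins are ignored.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for value in data:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            if lower <= value < upper or (i == last and value == upper):
                counts[i] += 1
                break
    return counts


data = [1, 2, 2, 3, 5, 7, 8, 9]
edges = [0, 3, 6, 9]
counts = frequency_distribution(data, edges)

# A crude text histogram: one '#' per observation in each bin.
for i, count in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: {'#' * count}")
```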

53
STATISTICAL INFERENCE
SAMPLING DISTRIBUTIONS
A Sampling Distribution is the distribution of a statistic for all
possible samples of a fixed size.

STANDARD DEVIATION OF THE MEAN

For infinite populations:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

For finite populations:
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\;\frac{\sigma}{\sqrt{n}}$$

55
DISTRIBUTION OF A RANDOM VARIABLE
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
The mean length of shafts produced on a lathe has historically
been 50 inches, with a standard deviation of 0.12 inch. If a sample
of 36 shafts is taken, what is the probability that the sample mean
would be greater than 50.04 inches?

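56
Given: μ = 50, x̄ = 50.04, σ = 0.12, n = 36

$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{50.04-50}{0.12/\sqrt{36}} = \frac{0.04}{0.02} = 2$$

The sample mean is 2 standard errors above the historical mean, so
P(x̄ > 50.04) = P(z > 2) = 1 − 0.977 = 0.023.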
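
The shaft example can be sketched in Python as follows, assuming Python 3.8 or later for `statistics.NormalDist`. The function name `standard_error` and its optional `population_size` parameter are assumptions for illustration; the parameter applies the finite population correction factor sqrt((N − n)/(N − 1)).

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n, population_size=None):
    """Standard deviation of the sample mean.

    With population_size=None the population is treated as infinite;
    otherwise the finite population correction is applied.
    """
    se = sigma / sqrt(n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


mu, sigma, n = 50, 0.12, 36     # historical mean, std. deviation, sample size
x_bar = 50.04                   # sample mean of interest

se = standard_error(sigma, n)
z = (x_bar - mu) / se
p_greater = 1 - NormalDist().cdf(z)   # P(sample mean > 50.04)

print(round(z, 2), round(p_greater, 4))
```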
57
CONFIDENCE INTERVALS
A confidence interval (CI) is an interval estimate of a population
parameter that also specifies the likelihood that the interval
contains the true population parameter.

COMMONLY USED CONFIDENCE LEVELS ARE:


• 90%
• 95%
• 99%

58
COMPUTING A CONFIDENCE INTERVAL WITH A KNOWN
POPULATION STANDARD DEVIATION
A laboratory in a hospital is required to ensure that the
temperature in its sterilizer stays at an average of at least
100 °C. Over an extended period of time, the population
standard deviation has been shown to be stable at σ = 0.5. Find
the 95% confidence interval for the population mean if a
sample of 36 readings was taken and the sample mean was
found to be x̄ = 100.3.
$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$
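59
Given: x̄ = 100.3, z = 1.96, σ = 0.5, n = 36

$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}} = 100.3 \pm 1.96\left(\frac{0.5}{\sqrt{36}}\right) = 100.3 \pm 0.1633$$

The 95% confidence interval is 100.1367 to 100.4633.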

60
CONFIDENCE LEVEL    Z-VALUE

90%                 1.645
95%                 1.96
99%                 2.58
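
The sterilizer interval and the z-values in the table can be reproduced with a short sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The function name `z_confidence_interval` is an assumption for illustration, and the approach applies only when the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence):
    """Two-sided interval x_bar +/- z * sigma / sqrt(n) for a known sigma."""
    # For a 95% level, the upper tail area is 0.025, so z = inv_cdf(0.975).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Sterilizer example: 36 readings, sample mean 100.3, sigma = 0.5.
lower, upper = z_confidence_interval(100.3, 0.5, 36, 0.95)
print(round(lower, 4), round(upper, 4))

# z-values for the commonly used confidence levels.
z_values = {level: NormalDist().inv_cdf(1 - (1 - level) / 2)
            for level in (0.90, 0.95, 0.99)}
```

Because the whole interval lies above 100, the data are consistent with the requirement of an average of at least 100 °C.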

61
HYPOTHESIS TESTING
• Hypothesis testing involves drawing inferences about two
contrasting propositions relating to the value of a population
parameter.
• One proposition is assumed to be true in the absence of
contradictory data (the null hypothesis); the other
must be true if the null hypothesis is rejected (the alternative
hypothesis).

62
STEPS IN A HYPOTHESIS TEST:

1. Formulate the hypotheses to test.


2. Select a level of significance.
3. Determine a decision rule on which to base a conclusion.
4. Collect data and calculate a test statistic.
5. Apply the decision rule to the test statistic and draw a
conclusion.

63
LEVEL OF SIGNIFICANCE
The Level of Significance defines the risk that we are willing to
take in making the incorrect conclusion that the alternative
hypothesis is true when in fact the null hypothesis is true.

COMMONLY USED LEVELS OF SIGNIFICANCE:


• 0.10
• 0.05
• 0.01

64
P-VALUE OR OBSERVED
SIGNIFICANCE LEVEL
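An alternative approach to comparing a test statistic to a
critical value in hypothesis testing is to find the probability of
obtaining a test statistic value equal to or more extreme than
that obtained from the sample data when the null hypothesis is
true. This probability is called the p-value; the null hypothesis is
rejected when the p-value is smaller than the level of significance.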
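
The five steps of a hypothesis test, together with the p-value approach, can be sketched with a one-sided z-test. The hypotheses here are an assumption made for illustration, reusing the numbers from the earlier shaft example: the null hypothesis is that the mean length is 50 inches, the alternative is that it is greater than 50 inches, and the level of significance is 0.05. The function name is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(x_bar, mu_0, sigma, n, alpha):
    """Upper-tail z-test of H0: mu = mu_0 against H1: mu > mu_0.

    Returns the test statistic, the critical value, the p-value,
    and whether H0 is rejected at significance level alpha.
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))            # test statistic
    critical_value = NormalDist().inv_cdf(1 - alpha)  # decision rule
    p_value = 1 - NormalDist().cdf(z)                 # observed significance
    reject_null = z > critical_value
    return z, critical_value, p_value, reject_null


z, critical_value, p_value, reject_null = one_sided_z_test(
    x_bar=50.04, mu_0=50, sigma=0.12, n=36, alpha=0.05
)
print(round(z, 2), round(critical_value, 3), round(p_value, 4), reject_null)
```

Comparing z with the critical value and comparing the p-value with the level of significance lead to the same conclusion.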

65
Analysis of Variance (ANOVA)
• Is an analysis tool used in statistics that splits the observed aggregate
variability found inside a data set into two parts: systematic factors
and random factors.
• Analysts use the ANOVA test to determine the influence that
independent variables have on the dependent variable in a regression
study.
Two parts of Analysis of Variance
• Systematic factors - have a statistical influence on the given data set.
• Random factors - do not.
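
One way to see the split into systematic and random parts is a one-way ANOVA, sketched below in plain Python. The function name and the three groups of measurements are hypothetical. The between-group sum of squares plays the role of the systematic part, the within-group sum of squares plays the role of the random part, and the F statistic is the ratio of their mean squares.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Systematic part: variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Random part: variation of observations around their own group mean.
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2
        for group in groups
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# Hypothetical measurements from three machine settings.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
ss_between, ss_within, f_statistic = one_way_anova(groups)
print(ss_between, ss_within, f_statistic)
```

The two sums of squares add up to the total variation around the grand mean, which is the sense in which ANOVA splits the aggregate variability.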

66
Regression and Correlation Analysis

Regression Analysis
• Is used in statistics to find trends in data.
• Provides an equation for a graph so that you can make
predictions about your data.
Example
• How much you eat and how much weight you have put on over the last year.

Correlation Analysis
• Is a statistical technique that can show whether and how strongly
pairs of variables are related.
• Some correlations are fairly obvious, but your data may also
contain unsuspected correlations.
Example
• Height and weight are related; taller people tend to be heavier
than shorter people.
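
Both techniques can be sketched in plain Python using the height and weight example. The data values are made up for illustration, and the function names are assumptions. `pearson_r` measures how strongly the pair of variables is related, and `least_squares_line` gives the equation used for prediction.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the simple linear regression of y on x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical heights (cm) and weights (kg) for five people.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 67, 74, 83]

r = pearson_r(heights, weights)
slope, intercept = least_squares_line(heights, weights)

# Predicted weight for a person 175 cm tall, using the fitted equation.
predicted_weight = intercept + slope * 175
```

A value of r close to 1 indicates a strong positive linear relationship, in line with taller people tending to be heavier.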

67
DESIGN OF EXPERIMENT
Design of experiments (DOE) is defined as a branch of applied statistics that deals with
planning, conducting, analyzing, and interpreting controlled
tests to evaluate the factors that control the value of a
parameter or group of parameters. DOE is a powerful data
collection and analysis tool that can be used in a variety of
experimental situations or series of tests.

Example
Natural bread dough that would meet the same quality.

68
THANKS!
Any questions?
