Data Science Training

Six Sigma Green Belt Training
Quality
The totality of features and characteristics of a product or service that bear on its ability to
satisfy stated or implied needs.
Two Aspects of Quality
1. The External Aspect

⇓
Meaning fitness for use.
2. The Internal Aspect

⇓
Meaning compliance with specifications.
“Quality then was to satisfy customer needs; now it is in fact to delight customers.”
External Aspects
(Customer’s Voice)
⇓
QFD, FMEA, DOE & TAGUCHI METHODS
DFSS, Benchmarking, Tolerance Design
⇓
Internal Aspects ⇒ Specifications
⇓
Compliance with Specifications
Quality Guru – Deming, Juran and Shewhart
We are in Business to Earn Profit

Today
Tomorrow
All Time to come
In an ethical and socially useful way
Equation Then:
Cost + Profit = Price
Equation Now:
Profit = Price – Cost
Reduction in cost is essential for survival
Bill Smith, Father Of Six Sigma
Smith introduced his statistical approach aimed at increasing profitability by reducing
defects.
His approach was, “If you want to improve something, involve the people who are doing
the job.” He always wanted to make it simple so people would use it.
The origin of Six Sigma can be traced to the 1970s, when Motorola, faced with serious
quality-related problems, embarked on an ambitious journey to achieve “zero defects” in its
products. This project was named “Six Sigma” by Mikel Harry, then a senior staff
engineer with Motorola’s Government Electronics Group.
Six Sigma is a highly disciplined approach used to reduce process variation to the
extent that the level of defects is drastically reduced to less than 3.4 per million process,
product or service opportunities.
This is termed 3.4 defects per million opportunities (3.4 DPMO, i.e. 3.4 × 10^-6 defects per opportunity).
Sigma (σ) is the Greek letter used in statistics to describe the variability of a process; it
denotes the “standard deviation”. Most of us are familiar with the normal distribution and its
properties:
• 99.73% of the area lies within µ ± 3σ
• 95.45% of the area lies within µ ± 2σ
• 68.27% of the area lies within µ ± σ
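The areas under the normal curve can be checked numerically. The sketch below is only an illustration: the function name coverage_within is an assumption, and it relies on the standard identity that the area within µ ± kσ equals erf(k/√2). The two-sigma band comes out near 95.45%.

```python
import math

def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

# Print the coverage for the one-, two- and three-sigma bands.
for k in (1, 2, 3):
    print(f"within ±{k} sigma: {coverage_within(k) * 100:.2f}%")
```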
PPM (Parts Per Million):
How many out of a million (1,000,000 = 10^6)
Percentage (%):
How many out of 100
0.01% = (0.01 / 100) × 1,000,000 = 100 PPM
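As a small illustration of the conversion, the sketch below turns a percentage into PPM and back. The helper names percent_to_ppm and ppm_to_percent are assumptions made for this example; one percent corresponds to 10,000 parts per million.

```python
def percent_to_ppm(percent: float) -> float:
    """Convert 'how many out of 100' into 'how many out of 1,000,000'."""
    return percent * 10_000

def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million back into a percentage."""
    return ppm / 10_000

ppm_value = percent_to_ppm(0.01)      # the 0.01% example from the text
percent_value = ppm_to_percent(3.4)   # 3.4 PPM expressed as a percentage
```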
SIX SIGMA PROCESS CAPABILITY
Sigma level    Defects per million opportunities
6 Sigma        3.4 (World Class)
5 Sigma        230
4 Sigma        6,200 (Average)
3 Sigma        67,000 (Non-competitive)
2 Sigma        310,000
1 Sigma        700,000
Sigma Quality Level:
The sigma quality level can be approximately determined using the Schmidt and
Launsby (1997) equation:
Sigma quality level = 0.8406 + √(29.37 – 2.221 × ln(ppm))
This is called the Sigma Scale.
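The Sigma Scale can be evaluated directly. The sketch below uses the square-root form of the Schmidt and Launsby approximation, 0.8406 + √(29.37 – 2.221 × ln(ppm)); the function name sigma_quality_level is an assumption for illustration. It reproduces the capability table reasonably well, returning about 6.0 for 3.4 ppm and about 4.0 for 6,200 ppm.

```python
import math

def sigma_quality_level(ppm: float) -> float:
    """Approximate sigma quality level from a defect rate in parts per million.

    Valid only while the term under the square root stays non-negative,
    which holds for defect rates up to roughly 550,000 ppm.
    """
    return 0.8406 + math.sqrt(29.37 - 2.221 * math.log(ppm))

world_class = sigma_quality_level(3.4)
average = sigma_quality_level(6_200)
```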
Six Sigma
• A top-driven, disciplined, step-by-step approach (DMAIC) for continual
improvement of quality for the benefit of all concerned.
• A system of practices to improve processes by eliminating defects.
• A disciplined, data-driven approach and methodology for eliminating defects in any
process.
What is Six Sigma
Six Sigma means several things.
It is a statistical measurement. It tells us how good our products, services and processes really
are. The Six Sigma method allows us to draw comparisons with other similar or dissimilar
products, services and processes, and helps us in benchmarking and planning for improvement. A
Six Sigma process is
best-in-class. On the other hand, a four-sigma process is average. In this sense, the sigma
scale of measure provides us with a “goodness micrometer” for gauging the adequacy of
our products, services and processes.
Six Sigma: Problem-by-Problem Approach.

Critical Business Issue
To
Critical Process
To
Critical To Quality Characteristics
To
Defining The Problem
Terminologies in Six Sigma

Customer: Anybody who is the recipient of a product or service is called a customer. The customer may
be external or internal.
Voice of Customer: An organization going in for Six Sigma must listen to the customer.
Customer requirements may be expressed verbally (lingual) or as specifications. Hence
customer requirements have to be translated into criteria to be incorporated in the
development of a process leading to a product or service.
Critical to Satisfaction (CTS):
The aspects critical to the customer's satisfaction, which give the customer sufficient confidence
in the supplier.
For example:
Critical bugs will be fixed within a stipulated time.
Medical productivity in terms of number of transactions per unit time is at least 0.90.
Call quality rating is at least 0.85.
The other measures are cost (CTC) and delivery (CTD).
CTQ Tree is a tool that aids in translating customer language into quantified
requirements for products or services.
It helps in translating broad customer requirements into specifics and ensures that all aspects of
customer needs are identified.
Critical to Quality (CTQ): It is a parametric representation of the voice of the customer.
Usually the external customer specifies the product / service CTQ.
For example, in a call center application the maximum waiting time for a response is 60
seconds.
What is Critical To Quality Characteristics (CTQ):
• The requirements on the output of the process and the measures of critical process issues
are called CTQs.
• CTQs have to be derived from customer requirements, risks, economics,
regulations and process / product FMEAs.
Quality: It is the totality of features and characteristics of a product or service that satisfy
the customer's stated and implied needs (ISO definition).
Quality in Six Sigma: A state in which value entitlement is realized for the customer as
well as for the provider in every aspect of the business relationship covering the entire
supply chain. It is a win-win approach for all.
Cost of poor Quality: The cost of poor quality is defined as those costs associated with the
non-achievement of product or service quality as defined by the requirements established
by the organization and its contracts with customers and society.
Cost of poor Quality categories and Elements: There are four categories – prevention,
appraisal, internal failure and external failure. Each category contains elements and sub
elements.
Prevention: Prevention is defined as the experience gained from the identification and
elimination of specific causes of failure cost, used to prevent the recurrence of the same or
similar failures in other products and services.
Examples of prevention costs: planning and training.
Appraisal cost: Appraisal cost is the cost of assuring that the product or service is
acceptable as delivered to customers.
Examples of appraisal costs: inspection and testing.
Internal failure costs: Internal failure costs include basically all costs
required to evaluate, dispose of, and either correct or replace nonconforming products or
services prior to delivery to the customer, and also to correct or replace incorrect or
incomplete product or service descriptions.
Examples of internal failures: re-design of modules, reworking of effort
estimates, loss of productivity, etc.
External failure costs: External failure costs include all costs incurred due to
nonconforming or suspected nonconforming product or service after delivery to the
customer.
Examples of external failures: delayed submission of developed modules, customer
dissatisfaction, etc.
All these costs are components of the cost of poor quality (COPQ): the hidden cost of failing to
meet customer requirements.
Process: A process is the sequence of activities which results in a product or service.
Key process input variable (KPIV): An input variable that influences the output of
a process.
e.g. Time and temperature are key input variables for a heat treatment process.
Key process output variable (KPOV): An output variable that influences the
performance of a Critical to Quality (CTQ) characteristic.
Defects: A feature in a product / service that causes dissatisfaction to a customer is called a
Defect.
ANYTHING THAT DISSATISFIES YOUR CUSTOMER
Process capability: Process capability is defined as the ability of your process to satisfy
customer requirements.
A process is said to be not capable if it fails to meet customer requirements.
Note:
I. Lower DPU increases customer satisfaction and decreases warranty cost.
II. Lower DPU reduces COPQ and decreases manufacturing cost per unit.
III. Higher process capability indices increase the Six Sigma rating and reduce DPU.
Unit: It may be a product or process, a line of software, a transaction etc.
A “ Unit” may be as diverse as a:
• Piece of equipment
• Line of software
• Order
• Technical Manual
• Medical claim
• Wire transfer
• Hour of labour
• Billable dollar
• Customer contact.
Opportunity: A unit may have more than one type of defect. Each is an opportunity.
A watchcase may have pits, burrs, etc. In a letter of credit (L.C.), the name,
address, shipping instructions, currency, etc. are different opportunities for a
defect.
Metric: Metric is a representative indicator of performance of a process, product or
services.
I. If we do not measure, we do not know our status, so we cannot improve.
II. Defects per unit (DPU): Total number of defects in a sample divided by the total number
of units in the sample.
III. Defects per opportunity (DPO):
DPO = DPU / Number of opportunities per unit
IV. Defects per million opportunities (DPMO):
DPMO = (DPU × 10^6) / Number of opportunities per unit
V. Throughput yield: Output divided by input.
VI. Rolled throughput yield: Rolled throughput yield is the product of the yields of
all sub-processes. 0.95 ⇒ 0.95 ⇒ 0.95 ⇒ 0.95
If there are four processes and each process has a 95% yield,
the rolled throughput yield (RTY) = (0.95)^4 = 0.81.
Other examples:
i. Let us assume that a part goes through ten operations. At each stage 99% of parts
are good and 1% are rejected, so we get 90.44% good parts at the end of the tenth stage.
ii. If we start with a batch of 1000 parts, we get 904 good parts and scrap or rework
96 parts; the RTY of the process is 90.44%.
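The rolled throughput yield examples above can be reproduced with a few lines. This is a minimal sketch; the function name rolled_throughput_yield is an assumption for illustration, and the steps are assumed to be independent so that the yields simply multiply.

```python
import math

def rolled_throughput_yield(step_yields):
    """Multiply the first-pass yields of all steps, assumed independent."""
    return math.prod(step_yields)

four_step = rolled_throughput_yield([0.95] * 4)    # four processes at 95% each
ten_step = rolled_throughput_yield([0.99] * 10)    # ten operations at 99% each
good_parts = round(1000 * ten_step)                # good parts from a batch of 1000
```

Even a high per-step yield erodes quickly as the number of steps grows, which is why RTY is tracked across the whole process rather than step by step.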
Calculation of DPU, DPO, DPMO, Yield & Sigma level.
Defects (D) = 34, Units (U) = 750,
Opportunities per unit (O) = 10
1. DPU = D / U = 34 / 750 = 0.045
2. DPO = D / (U × O) = 34 / (750 × 10) = 0.0045
3. Yield = e^(–DPU) = 2.7183^(–0.045) = 0.956 = 95.6%
4. DPMO = DPO × 10^6 = 4,500
5. Sigma Level = 2.611 (the normal Z value for a defect rate of 0.0045, without the conventional 1.5σ shift used in the capability table)
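The worked calculation can be traced in code. The sketch below is illustrative only: the variable names are assumptions, the yield uses the Poisson form e^(–DPU), and the sigma level is taken as the standard normal Z value that leaves a tail equal to DPO (no 1.5σ shift), which is the convention that gives a value near 2.61. Unrounded intermediate values are used, so DPMO comes out as 4,533 rather than the rounded 4,500.

```python
import math
from statistics import NormalDist

defects = 34
units = 750
opportunities_per_unit = 10

dpu = defects / units                               # defects per unit
dpo = defects / (units * opportunities_per_unit)    # defects per opportunity
dpmo = dpo * 1_000_000                              # defects per million opportunities
first_pass_yield = math.exp(-dpu)                   # Poisson estimate of yield

# Z value such that the upper tail of the standard normal equals DPO.
sigma_level = NormalDist().inv_cdf(1 - dpo)
```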
Technical terminology of Six Sigma Management
CTQ: A CTQ is a measure or proxy of what is important to a customer.
I. Examples of CTQs are the mean and range of the waiting times in a physician's office
for four patients selected each day at 10:00 am, 2:00 pm and 4:00 pm.
II. The percentage of errors in ATM transactions for a bank’s customers per month.
III. The number of car accidents per month on a particular stretch of highway.
Six Sigma projects are designed to improve CTQs.
Unit: A unit is the item (e.g. product or component, service or service step, or time period)
to be studied with a Six Sigma project.
Defective: A non-conforming unit is a defective unit.
Defect: A defect is a non-conformance on one of many possible quality characteristics of a
unit that causes customer dissatisfaction.
Defect Opportunity: A defect opportunity is each circumstance in which a CTQ can fail
to be met. There may be many opportunities for defects within a defined unit. For example, a
service has four component parts. If each component part contains three opportunities for a
defect, then the service has 12 defect opportunities in which a CTQ can fail to be met.
Defects per unit (DPU): Defects per unit refers to the average of all the defects for a given
number of units, that is, the total number of defects for n units divided by n, the number of
units.
If you are producing a 50-page document, the unit is a page. If there are 150 spelling
errors, DPU is 150/50 = 3.
Defects per Opportunity (DPO): Defects per opportunity refers to the total number of
defects for a given number of units divided by
the total number of opportunities.
DPO = Total number of defects / Total number of opportunities = DPU / Number of opportunities per unit.
Defects Per Million Opportunities (DPMO): DPMO equals DPO multiplied by one
million.
Yield: Yield is the number of units within specification divided by the total number of
units. If 25 units are served to customers and 20 are good, then the yield is 20/25 = 0.80.
Rolled Throughput Yield (RTY): Rolled throughput yield is the product of the yields
from each step in a process. RTY is the probability of a unit passing through each of k
independent steps of a process the first time without incurring one or more defects at each
of the k steps. RTY = Y1 × Y2 × … × Yk, where k = number of steps in a process or
the number of component parts or steps in a product or service. The yield Y for each step
or component must be calculated to compute the RTY.
For those steps in which the number of opportunities is equal to the number of units,
Y = 1 – DPU. More generally, when defects are assumed to follow a Poisson distribution, Y = e^(–DPU).
For example, if a process has three independent steps and the yield from the first step (Y1)
is 99.7%, the yield from the second step (Y2) is 99.5% and the yield from the third step
(Y3) is 89.7%, then the RTY is 88.98% (0.997 × 0.995 × 0.897).
KANO MODEL: Kano surveys embrace a set of market research tools used for
three purposes:
• To improve existing products, services or processes, or to create less expensive
versions of existing products, services or processes; these are called Level A surveys.
• To create major new features for existing products, services or processes; these are called
Level B surveys.
• To invent and innovate entirely new products, services or processes; these are called
Level C surveys.
KANO CATEGORIES: There are six Kano category classifications for cognitive
images.
• One-Dimensional (O): User satisfaction is proportional to the performance of
the feature: the less performance, the less user satisfaction, and the more
performance, the more user satisfaction.
• Must-Be (M): User satisfaction is not proportional to the performance of the
feature: the less performance, the less user satisfaction with the feature, but high
performance creates feelings of indifference to the feature.
• Attractive (A): Again, user satisfaction is not proportional to the performance of
the feature. However, in this case, low levels of performance create feelings of
indifference to the feature, but high levels of performance create feelings of
delight with the feature.
• Reverse (R): The researcher’s a priori judgment about the user’s view of the
feature is the opposite of the user’s view.
• Indifferent (I): The user is indifferent to the presence or absence of the feature.
• Questionable (Q): There is a contradiction in the user’s response to the feature.
[Figure: Kano feature categories of quality. Vertical axis: from "Customer dissatisfied" (dissatisfaction region) to "Customer satisfied" (satisfaction region). Horizontal axis: from "Product dysfunctional" to "Product fully functional". Curves: Attractive (exciting quality), One-Dimensional (expected quality), Must-Be (basic quality). Annotation: competitive pressure.]
The Six Sigma Methodology: The Six Sigma methodology also uses a modified form of the
Shewhart cycle PDCA (Plan-Do-Check-Act), or Deming’s PDSA (Plan-Do-Study-Act),
which is called DMAIC (Define-Measure-Analyze-Improve-Control).
The variation is reduced as the project passes through the funnel of the Six Sigma methodology.
This is sometimes called the breakthrough strategy.
[Figure: the DMAIC funnel, narrowing from all possible Xs to the few Xs. Labels in the figure: Define (project); Measure (process map, C&E, MSA, Cpk); Analyze (FMEA, multi-vari); Improve (Design of ...); Control (SPC, fail-safing, control plan).]
Six Sigma Approach:

A five-phase approach called DMAIC is followed:
D: Define the project’s purpose and scope, and get background on the process and customer.
M: Measure; focus the improvement effort by gathering information on the current situation.
A: Analyze; identify the root causes and confirm them with data.
I: Improve; develop, try out and implement solutions that address the root causes.
C: Control; evaluate the solutions and maintain the gains by setting up controls,
standardizing and documenting work methods and processes, and anticipating future
improvements.
Define phase:
A. Identify project CTQs.
B. Develop team charter.
C. Define process Map.
1. Choose Critical Business and process Issue.
2. Understand the voice of the customers.
3. Define the process and CTQs.
4. Define the team and training needs.
5. Define scope and opportunities of the project.
6. Develop the charter.
7. Map the process.
Measure Phase:
A. Select CTQs (Customer, Product, Process)
B. Establish and validate measurement system.
C. Establish process capabilities.
1. Select the key product.
2. Create product tree.
3. Define performance variables and measurement process.
4. Determine data type and create check sheets.
5. Create detailed process map.
6. Select and measure performance variables; carry out MSA.
Analysis Phase:
A. Bench marking & Goal setting.
B. Gap analysis & Root cause analysis
C. Identify sources of variations.
1. Establish performance capabilities.
2. Benchmark performance metrics.
3. Discover Best in class performance.
4. Conduct Gap Analysis.
5. Identify success factors.
6. Define performance goal.
Improve Phase:
A. Select & diagnose the performance variable.
B. Establish the optimum solution.
C. Establish the tolerance on X’s.
1. Create possible solutions for root cause.
2. Select solution – Reduction of process variations.
3. Propose and confirm causal variables.
4. Create and implement plans.
5. Verify performance improvement and evaluate benefits.
Control Phase:
A. Select the variable for establishing controls.
B. Establish control system.
C. Evaluate the control system.
1. Summarize and communicate results.
2. Define – validate – Implement- Monitor control system.
3. Fix ownership.
4. Recommend future plan.
5. Train teams.
6. Monitor performance metrics.
Statistical methods in Six Sigma:
• Planning and collection of Data.
• Presenting data.
• Summarization of data.
• Analysis of data and
• Drawing valid inference from data, which are usually subject to variation.
What is statistical thinking?

Statistical thinking is a philosophy of learning and action based on the following
fundamental principles:
• All work occurs in a system of interconnected processes.
• Variation exists in all processes, and
• Understanding and reducing variation are keys to success.
Deming Once Said

“If I had to reduce my message for management to just a few words, I’d say it all had to
do with reducing variation.”
Relationship between statistical thinking and statistical methods:
Process → Variation → Data → Statistical Tools
Statistical Thinking Statistical Methods

Benefits of statistical thinking:
• Provides a theory and methodology for improvement.
• Helps identify where improvement is needed.
• Provides a general approach to take.
• Suggests tools to use.
A complete improvement approach includes all elements of statistical thinking.
Process ⇒ Variations ⇒ Data
Expanding world of statistics.
[Figure: the way we think. Organizational impact plotted against time, rising from problem solving, to product and process improvement, to organizational improvement.]
Use of statistical thinking
Depends on level of activity and job responsibility
• Strategic (where we’re headed): Executives
• Managerial (managerial processes to guide us): Managers
• Operational (where the work gets done): Workers
Examples of operational processes
• Manufacturing
• Order Entry
• Delivery
• Distribution
• Billing
• Collection
• Service
Examples of statistical thinking at the operational level
• Work processes are mapped and documented.
• Key measurements are identified.
- Time plots displayed
• Process management and improvement utilize
- Knowledge of variation, and
- Data
• Improvement activities focus on the process, not on blaming employees.
Examples of Managerial process:
• Employee Selection
• Training and Development
• Performance Management
• Recognition and Reward
• Budgeting
• Setting objectives and goals
• Project Management
• Communication
• Management Reporting
• Planning
Examples of statistical thinking at the managerial level
• Managers use meeting management techniques.
• Standardized project management systems are in place.
• Both project process and results are reviewed.
• Process variation is considered when setting goals.
• Measurement is viewed as a process.
• The number of suppliers is reduced.
• A variety of communication media are used.
Examples of Strategic Processes
• Strategic plan development

• Acquisitions
• Corporate Budget development
• Communications – Internal and External
• Succession planning and Deployment
• Organizational Improvement
Examples of Statistical Thinking at the Strategic Level
• Executives use system approach.

• Core processes have been flowcharted.
• Strategic direction is defined and deployed.
• Measurement systems are in place.
• Employee, customer, and benchmarking studies are used to drive
improvement.
• Experimentation is encouraged.
Robustness in Management
• Develop strategies that are insensitive to economic trends and cycles.

• Design a project system that is insensitive to
o Personnel changes
o Changes in project scope
o Variations in business conditions.
• Respond to differing employee needs.
• Adopt flexible work hours.
• Enable personnel to adapt to changing business needs.
• Ensure meeting effectiveness is not dependent on facilities, equipment, or
participants.
Understanding Human Behaviour
• Different people have different methods and styles of working, learning and
thinking.
• Different people take in, process and communicate information in different
ways.
• People vary – they are different.
- Day to day
- Person to person
- Group to group
- Organization to organization
Three ways to reduce variation and improve quality:
• Control the process: eliminate special-cause variation.
• Improve the system: reduce common-cause variation.
• Anticipate variation: design robust processes and products.
Process Robustness Analysis

• Identify those uncontrolled factors that affect process performance
o Weather
o Customer use of products
o Employee knowledge, skills, experience, work habits
o Age of equipment
• Design the process to be insensitive to the uncontrollable variation in these factors.
Population: Collection of all elements under consideration and about which we are trying
to draw conclusions.
Population elements may be:
• Objects
• Entities
• Units
• People ……… etc.
Generally each element has one or more characteristics (attributes) of interest. When a particular
characteristic is measured we obtain a value, which varies from case to case; hence each
characteristic is termed a variable. Recording the value of a variable for each case
amounts to collecting data.
Sample: A subset of the elements selected from a population with a view to drawing inferences
about the population characteristics.
• A sample is part of a population.
• The objective of statistics is to draw conclusions about the population using sample
data.
[Figure: a sample shown as a portion or subset of the population.]

Sample data should be
• Relevant
• Representative
• Adequate
• Reliable
Advantages of sample
• Sampling is less costly (cost effectiveness).
• Total enumeration may not also be free from errors (Inspection Fatigue).
• Sampling inspection may have relatively less inspection error and sampling error
can be estimated.
• When inspection is destructive, sampling is the only way.
Types of Sample
Random Sample: Each member of the population has an equal chance of being selected.
Simple Random Sample: All samples of the same size are equally likely.
• Assign a number to each member of the population. Generate random numbers using a random number table, a software program or
a calculator.
• Data from members of the population that correspond to these numbers become
members of the sample.
Simple Random Sample:
• Each population element has an equal chance of being selected.
• Selecting one subject does not affect selecting others.
• May use a random number table or a lottery.
Stratified Random Samples:
Divide the population into groups (strata, or layers) and select a random sample from each
group. Strata could be raw materials, vendors or processes.
Cluster Samples: Divide the population into individual units or groups and randomly
select one or more units. The sample consists of all members from the selected unit(s).
Systematic Samples:
Choose a starting value at random, and then choose sample members at regular intervals.
X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X
We say we choose every Kth member; in this example K = 5, so every 5th member of the
population is selected.
Convenience Sample:
Choose readily available members of the population for your sample.
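The sampling schemes described above can be sketched with Python's standard random module. Everything named here is hypothetical and chosen only for illustration: the population of 100 numbered units, the two vendor strata, the ten batch clusters, and the helper function names. A fixed seed is used so the sketch is repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(2024)             # fixed seed, an arbitrary choice
population = list(range(1, 101))      # hypothetical population of 100 units

# Hypothetical strata (vendors) and clusters (batches) built from the same units.
strata = {"vendor_A": population[:50], "vendor_B": population[50:]}
clusters = {f"batch_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)
systematic = systematic_sample(population, 5, rng)
convenience = population[:10]         # convenience sample: whatever is readily at hand
```

Only the convenience sample involves no random selection, which is why it is the hardest to defend as representative.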
Statistical Methods
• Descriptive statistics
- Collecting and describing data.
• Inferential statistics
- Drawing conclusions and / or making decisions concerning a
population based only on sample data.
Descriptive statistics
• Collect Data
e.g. survey
• Present data
e.g. Tables and graphs
• Characterize data
e.g. sample mean
Inferential statistics (Conclusion)
• Estimation
e.g. Estimate the population mean weight using the sample mean
weight
• Hypothesis testing (Assumption)
e.g. Test the claim that the population mean weight is
Drawing conclusions and / or making decisions concerning a population
based on sample results.
DATA SOURCES
• Primary data (collection): observation, experimentation, survey
• Secondary data (compilation): print or electronic
Statistical Studies:
• Enumerative study
• Analytical study
Enumerative Study
• Involves decision making about a population.
• The frame is a listing of all population units.
Example: names in a telephone book
Example: political poll
Analytical Study
• Involves action on a process.
• Improve future performance.
• No identifiable universe or frame.
e.g. production process
Types of Data
• Categorical (qualitative)
• Numerical (quantitative): discrete or continuous
Data summarization methods:
• Graphical Methods.
• Tabular summarization.
• Numerical Indices.
Graphical Methods:
Graphic displays provide better insight that often is not possible with words or numbers.
Contingency table
• Shows the number of observations jointly in two categorical variables.
Example: male employees
Gender variable and major variable
• May include row, column or total %
• Helps find relationships.
• Used widely in marketing.
1. Residence: C C O O C C O O C O
Gender: M F F M M M F M M F
Where C = on campus, O = off–campus, M = Male, F = Female
Residence      Male   Female   Total
On-campus      4      1        5
(row %)        (80)   (20)     (100)
Off-campus     2      3        5
(row %)        (40)   (60)     (100)
Total          6      4        10
(row %)        (60)   (40)     (100)
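The residence-by-gender table can be rebuilt from the raw observations by counting pairs. The sketch below is illustrative; the variable names and the row_percentages helper are assumptions, and the data strings simply encode the ten observations listed above.

```python
from collections import Counter

residence = list("CCOOCCOOCO")   # C = on campus, O = off campus
gender = list("MFFMMMFMMF")      # M = male, F = female

# Count how often each (residence, gender) pair occurs.
counts = Counter(zip(residence, gender))

def row_percentages(row):
    """Return each gender's share of one residence row, in whole percent."""
    total = counts[(row, "M")] + counts[(row, "F")]
    return {g: round(100 * counts[(row, g)] / total) for g in ("M", "F")}

on_campus = row_percentages("C")
off_campus = row_percentages("O")
```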
2. You are a marketing research analyst for Visa. You want to analyze data on
credit card users' annual income (in thousands of US dollars).
Income: 12 20 32 45 72 46 18 55
Use: Y N N Y Y Y N Y
Income categories: under $25,000; $25,000 & over
Use categories: Y = use credit cards, N = don’t use
Income          No     Yes    Total
Under $25 K     2      1      3
(row %)         (67)   (33)   (100)
$25 K & over    1      4      5
(row %)         (20)   (80)   (100)
Total           3      5      8
(row %)         (38)   (62)   (100)
Graphical Tools
• Bar Chart
• Pie Chart
• Histogram
• Frequency Curve
• Scatter Diagram
• Control Charts
• Box Plots
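Bar Chart:
[Figure: bar chart of frequency (0 to 150) by major (Acct., Econ., Mgmt.), annotated to show that bar length represents frequency, that bars have equal width, and that the frequency axis starts at a zero point.]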
Pie Chart:
• Shows the breakdown of a total quantity into categories.
• Useful for showing relative differences.
• Angle size = 360° × percent, e.g. 360° × 10% = 36°
[Figure: pie chart with slices for Econ., Mgmt. and Acct.]
Example: You are an analyst for IRI. You want to show the market shares held by Windows
program manufacturers in 1992. Construct a bar graph and a pie chart to describe the data.
Mfg.          Mkt. Share (%)
Lotus         15
Microsoft     60
WordPerfect   10
Others        15
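Before drawing the pie chart, each market share has to be turned into a slice angle using angle = 360° × percent. A minimal sketch follows; the dictionary and variable names are assumptions for illustration.

```python
market_share_percent = {
    "Lotus": 15,
    "Microsoft": 60,
    "WordPerfect": 10,
    "Others": 15,
}

# Each slice angle is the category's share of the full 360-degree circle.
slice_angles = {
    name: 360 * percent / 100
    for name, percent in market_share_percent.items()
}
total_angle = sum(slice_angles.values())
```

The angles work out to 54°, 216°, 36° and 54°, which together close the full circle.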
Dot plot:
1. Condenses data by grouping the same values together.
2. Numerical value is located by a dot on horizontal axis.
3. Data: 21,24,24,26,27,27,30,32,38,42.
[Figure: dot plot of the data above a horizontal axis marked 20, 25, 30, 35, 40, 45.]
Stem-and-leaf display:
1. Divide each observation into a stem value and a leaf value.
- Stem value defines class
- Leaf value defines frequency
2. Data: 21,24,24,26,27,27,30,32,38,41
2 144677
3 028
4 1
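The display above can be generated by splitting each value into its tens digit (stem) and units digit (leaf). This is a small sketch for two-digit data only; the function name stem_and_leaf is an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit and list the units digits in order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

# Print one stem per line, followed by its leaves.
for stem, leaves in display.items():
    print(stem, leaves)
```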
Histogram:
It is bar chart of frequency distribution. It highlights the center and amount of variation in
the sample of data. The simplicity of construction and interpretation of the histogram
makes it an effective tool in the elementary analysis of data. Many problems in quality
control have been solved with this one elementary tool alone.
[Figure: a typical histogram of frequency against the measured characteristic, marked with the lower specification limit (LSL), the upper specification limit (USL), the tolerance between them, and the process capability.]
The histogram describes the variation in the process.
It is used to
1. Solve problems.
2. Determine the process capability.
3. Compare with specifications.
4. Suggest the shape of the population, and
5. Indicate discrepancies in data, such as gaps.
A frequency curve uses a smooth curve rather than the rectangular shapes associated with
the histogram. A smooth curve represents a population frequency distribution, whereas the
histogram represents a sample frequency distribution.
A measure of central tendency of a distribution is a numerical value that describes the
central position of the data, or how the data tend to build up in the center. There are three
measures in common use:
1. Mean.
2. Median.
3. Mode.
Mean:
The mean is the sum of the observations divided by the number of observations. It is the
most common measure of central tendency, and it is affected by extreme values (outliers).
Numerical Indices: Data can be summarized using
• Measures of central tendency.
• Measures of dispersion.
Measure of central tendency: A value which is representative of the data set, as most
of the data are centered around that value. An important measure of central tendency is the
mean (arithmetic mean).
Ungrouped data:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Where $\bar{X}$ = average
n = number of observed values.
Grouped data:
X          X1   X2   …   Xk
Frequency  f1   f2   …   fk
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n}$$
Where n = sum of the frequencies.
fi = frequency in a cell or frequency of an observed value.
Xi = cell midpoint or an observed value.
k = number of cells or number of observed values.
0 1 2 3 4 5 6 7 8 9 10
Mean = 5
0 1 2 3 4 5 6 7 8 9 10 12 14
Mean = 6
Temp. °C (X)   No. of days (f)   Xf
25             2                 50
26             3                 78
27             4                 108
28             3                 84
29             1                 29
30             2                 60
Total          15                409
Average temp. (X̄) = 409/15 = 27.27
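The grouped mean can be recomputed directly from the temperature and frequency columns as the sum of X × f divided by the sum of f. The sketch below is illustrative and the variable names are assumptions; from these two columns the products total 409 and the mean is about 27.27 °C.

```python
temperatures = [25, 26, 27, 28, 29, 30]   # X: temperature in degrees C
days = [2, 3, 4, 3, 1, 2]                 # f: number of days at each temperature

# Weighted sum of temperatures, using the day counts as weights.
total_xf = sum(x * f for x, f in zip(temperatures, days))
total_days = sum(days)
mean_temperature = total_xf / total_days
```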
Median (M)
The median is defined as the value which divides a series of ordered observations so that
the number of items above it is equal to the number below it.
• Robust measure of central tendency.
• Not affected by extreme values.
• In an ordered array, the median is the “middle” number.
Ungrouped data:
I. If n or N is odd, the median is the middle number, at position (n+1)/2.
II. If n or N is even, the median is the average of the two middle numbers, at positions n/2 and (n/2)+1.
1. Arrange all values in order of size from smallest to largest.
2. If the number of values (n) is odd, the median is the center value in the ordered list. The
location of the median is obtained by counting (n+1)/2 observations from the bottom of the list.
Consider the data set 490, 400, 450, 420 and 430. To find the median of this data,
we first arrange the data from the smallest to the largest value,
i.e. 400, 420, 430, 450, 490.
The median is in position (n+1)/2 = (5+1)/2 = 3, so the median is 430.
3. If the number of observations is even, the median M is given by the average of the two center
observations in the ordered list.
e.g. 70, 75, 77, 82, 88, 100, 105, 108
The median is the average of the 4th and 5th values,
i.e. (82 + 88)/2 = 85.
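The odd and even cases can be handled by one short routine. This is a minimal sketch with an assumed function name, median_of; it sorts the data, then either picks the middle value or averages the two middle values.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the value at position (n + 1) / 2 in the ordered list.
        return ordered[middle]
    # Even count: average of the values at positions n / 2 and n / 2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_case = median_of([490, 400, 450, 420, 430])
even_case = median_of([70, 75, 77, 82, 88, 100, 105, 108])
```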
The median has several advantages over the mean. The most important is that extreme values
do not affect the median as strongly as they do the mean. That is, the mean is much more
sensitive to outlier values than the median.
Group data:
$$M = L_m + \left(\frac{\frac{n}{2} - Cf_m}{f_m}\right) \times i$$
Where M = Median.
Lm = lower boundary of the median cell.
n = total number of observations.
Cfm = cumulative frequency of all cell below Lm
fm = frequency of median cell.
i = cell interval
The median of grouped data is not used too frequently.
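To show how the grouped-data median formula M = Lm + ((n/2 – Cfm) / fm) × i is applied, the sketch below works through a hypothetical frequency table. The cell boundaries and frequencies are invented for illustration and do not come from the text; the function name grouped_median is also an assumption.

```python
def grouped_median(lower_boundaries, frequencies, cell_interval):
    """Median of grouped data using M = Lm + ((n/2 - Cfm) / fm) * i."""
    n = sum(frequencies)
    cumulative = 0  # Cfm: cumulative frequency of all cells below the current one
    for lower, frequency in zip(lower_boundaries, frequencies):
        if frequency > 0 and cumulative + frequency >= n / 2:
            # This is the median cell: interpolate within it.
            return lower + (n / 2 - cumulative) / frequency * cell_interval
        cumulative += frequency
    raise ValueError("frequencies must contain at least one positive count")

# Hypothetical cells 10-20, 20-30, 30-40, 40-50 with made-up frequencies.
example_median = grouped_median([10, 20, 30, 40], [5, 8, 12, 5], 10)
```

Here n = 30, so n/2 = 15 falls in the 30-40 cell (Lm = 30, Cfm = 13, fm = 12), giving 30 + (2/12) × 10.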
Mode:
The mode of a set of numbers is the value that occurs with the greatest frequency.
• A measure of central tendency.
• Value that occurs most often.
• Not affected by extreme values.
• Used for either numerical or categorical data.
• There may be no mode.
• There may be several modes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mode = 9
0 1 2 3 4 5 6
No Mode
The empirical relationship among the mean, median and mode is
Mean – Mode = 3 (Mean – Median)
Percentile: The pth percentile of the data is the value such that p percent of the observations fall
at or below it.
The median is the 50th percentile, the first quartile is the 25th percentile and the third
quartile is the 75th percentile.
Example: You are a financial analyst for a bank. You have collected the following closing
stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
Describe the stock prices in terms of central tendency.
Mean:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_8}{8} = \frac{17+16+21+18+13+16+12+11}{8} = \frac{124}{8} = 15.5$
Median (M):
Raw data:  17 16 21 18 13 16 12 11
Ordered:   11 12 13 16 16 17 18 21
Position:   1  2  3  4  5  6  7  8
Position points: n/2 = 4 and n/2 + 1 = 5
Median (M) = (16 + 16)/2 = 16
Mode = 16 (it occurs three times, more often than any other value)
Midrange = (X smallest + X largest)/2 = (11 + 21)/2 = 16
Q1 position = 1 × (n+1)/4 = 1 × (8+1)/4 = 2.25
Q1 = 12 + 0.25 × (13 − 12) = 12.25 ≈ 12.3
Q3 position = 3 × (n+1)/4 = 3 × (8+1)/4 = 6.75 ≈ 7
Q3 = 18
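The central tendency measures in this example can be checked with a short script. This is a minimal sketch using Python's standard statistics module on the eight closing prices listed in the example; the variable names are illustrative only.

```python
import statistics

# Closing stock prices from the example.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(prices)        # arithmetic mean
median = statistics.median(prices)    # average of the two middle values when n is even
mode = statistics.mode(prices)        # most frequent value
midrange = (min(prices) + max(prices)) / 2

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("midrange:", midrange)
```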
Dispersion:
Variation is a fact of nature and of industrial life too. No two items produced by the same
process are exactly the same. Tests done on the same samples may vary from chemist to
chemist or from laboratory to laboratory. This is true whether the test equipment involved
is automatic or manually operated.
Variation can arise from a lack of complete homogeneity of the chemicals used in the test,
variation in test environment conditions, or differences in the skill of the chemists doing the
testing. Variation in the test results adds to the uncertainty of decisions, and hence it is
important to measure and control variation.
Measures of variation:
[Figure: tree diagram of the measures of variation: range, interquartile range, variance (population variance and sample variance) and standard deviation (population standard deviation and sample standard deviation).]
In summarizing data, the variability in the values is often an important feature of interest.
The major measures of dispersion are:
Range (R):
The range is the difference between the largest and smallest values in a data set.
That is, range (R) = Largest value – Smallest value
The range is
• A measure of variation.
• The difference between the largest and the smallest observations.
Range = X largest – X smallest
[Figure: two dot plots on a scale from 7 to 12 with differently distributed points; in both, Range = 12 − 7 = 5.]
• It ignores the way in which the data are distributed.
• It is used for small samples.
Standard deviation and Variance:
The most commonly used measure of dispersion is the standard deviation.
The standard deviation is a numerical value, in the units of the observed values, that measures
the spreading tendency of the data. A large standard deviation shows greater variability of
the data than does a small standard deviation.
Standard Deviation
• Most important measure of variation.
• Shows variation about the mean.
• Has the same units as the original data.
• Takes into account all the values in the set of data.
Population standard deviation: It is denoted by the Greek symbol σ and is given by the root
mean squared deviation from the mean µ.
Suppose the test result values are X1, X2, X3, ..., XN. Then
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Where σ = population standard deviation.
Xi = observed value.
N = number of observed values.
µ = population mean.
Sample standard deviation (S):
If the sample result values are X1, X2, X3, ..., Xn, it is given by
Ungrouped data:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Grouped data:
$S = \sqrt{\frac{n \sum_{i=1}^{h} f_i X_i^2 - \left( \sum_{i=1}^{h} f_i X_i \right)^2}{n(n-1)}}$
where h is the number of cells, fi is the frequency of cell i and Xi is its midpoint.
Variance:
Population variance (σ²):
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance (S²):
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Standard deviation of the sample test values:
Xi         Xi − X-bar    (Xi − X-bar)²
15         −5            25
18         −2            4
20          0            0
21          1            1
26          6            36
∑ = 100    ∑ = 0         ∑ = 66
$\bar{X} = 100/5 = 20$, $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
S = √(66/4) = 4.062
Sample standard deviation (S) = 4.062 and sample variance = 66/4 = 16.5.
Some facts about the standard deviation formula
The above table will be used to explain the standard deviation concept.
• The first column (Xi) gives five observed values, and from these values the average
X-bar = 20 is obtained.
• The second column (Xi – X-bar) is the deviation of the individual observed values from
the average. The sum of the deviations is 0, which is always the case, so it cannot
serve as a measure of dispersion.
• However, if the deviations are squared, none of them is negative and their sum will be
greater than zero.
• The average of the squared deviations can be found by dividing by n; however, for
theoretical reasons we divide by n−1, which gives an answer that has the units
squared. This result is not acceptable as a measure of the dispersion but is valuable
as a measure of variability for advanced statistics. It is called the variance and is
given the symbol S².
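The table calculation can be reproduced step by step. The following is a minimal sketch in Python using the five test values from the table; the variable names are illustrative, and the cross-check against statistics.stdev is added only as a sanity check.

```python
import math
import statistics

# The five observed test values from the table.
values = [15, 18, 20, 21, 26]

n = len(values)
mean = sum(values) / n                              # X-bar
squared_devs = [(x - mean) ** 2 for x in values]    # (Xi - X-bar)^2 column
sample_variance = sum(squared_devs) / (n - 1)       # divide by n - 1, not n
sample_sd = math.sqrt(sample_variance)

print("mean:", mean)
print("sum of squared deviations:", sum(squared_devs))
print("sample variance:", sample_variance)
print("sample standard deviation:", round(sample_sd, 3))
print("library cross-check:", round(statistics.stdev(values), 3))
```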
Coefficient of variation:
The standard deviation is an absolute measure of dispersion that expresses variation in the
same units as the original data. It cannot be the sole basis for comparing two distributions,
especially if the data are measured on different scales or if the larger mean has the larger variation.
In such cases, we use the coefficient of variation.
It is a relative measure of variation. It relates the standard deviation to the mean and
expresses the standard deviation as a percentage of the mean.
The formula for the coefficient of variation is
$CV = \frac{\sigma}{\mu} \times 100$
Example: Laboratory 1 can complete on average 40 analyses per day with a standard
deviation of 5, whereas Laboratory 2 can complete 160 analyses per day with a
standard deviation of 15.
Which laboratory shows more consistency?
Lab 1: Coefficient of variation = 5/40 × 100 = 12.5%
Lab 2: Coefficient of variation = 15/160 × 100 = 9.4%
Laboratory 2 has less relative variation.
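The laboratory comparison can be sketched in a few lines of Python. The function name is an illustrative choice, and the inputs are the means (40 and 160) and standard deviations (5 and 15) stated in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

lab1 = coefficient_of_variation(5, 40)     # Laboratory 1
lab2 = coefficient_of_variation(15, 160)   # Laboratory 2

print(f"Lab 1 CV: {lab1:.1f}%")
print(f"Lab 2 CV: {lab2:.1f}%")
print("More consistent:", "Lab 2" if lab2 < lab1 else "Lab 1")
```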
Example: You are a financial analyst for a bank. You have collected the following closing
stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
Describe the volatility of the stock price.
Data: 17, 16, 21, 18, 13, 16, 12, 11
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_8}{8} = 15.5$
$S^2 = \frac{(17-15.5)^2 + (16-15.5)^2 + \dots + (11-15.5)^2}{8-1} = \frac{78}{7} = 11.14$
S = √11.14 = 3.34
Coefficient of variation (CV) = (S / X-bar) × 100 = 3.34/15.5 × 100 = 21.5%
Quartile: Quartiles divide the data into four equal parts. Each part contains 25% of the
values. Q1 is called the first or lower quartile, Q3 is called the third or upper
quartile, and Q2 is the median.
Interquartile Range (IQR): It is the difference between the third and the first quartiles of a
set of values. That is,
IQR = Q3 – Q1
The interquartile range is a simple measure of spread that gives the range covered by the middle
half of the data. It reflects the variability of the middle 50 per cent of the data.
The quartiles and the IQR are unaffected by extreme values.
[Figure: number line from the minimum value to the maximum value marked with Q1, Q2 and Q3 (first, second and third quartiles). One quarter of the values lies in each segment, and the interquartile range spans Q1 to Q3.]
Calculation of quartiles:
• Arrange the data in increasing order and locate the median.
• The first quartile is the median of the observations below the location of the median.
• The third quartile is the median of the observations above the location of the median.
Example: The data below give the daily emission of sulphur oxide of an industrial plant:
15.8 26.4 17.3 11.2 23.9 24.8 16.2 12.8 22.7 28.8 7.2 13.5
18.1 17.9 23.5
Determine the quartiles and the interquartile range.
Arrange the data in increasing order, i.e.
7.2 11.2 12.8 13.5 15.8 16.2 17.3 17.9 18.1 22.7 23.5 23.9
24.8 26.4 28.8
Q2 = Median = 17.9, Q1 = 13.5 and Q3 = 23.9
Interquartile range (IQR) = Q3 – Q1 = 23.9 − 13.5 = 10.4
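The quartile rule used here (Q1 and Q3 as the medians of the observations below and above the median) can be sketched as follows. The function name quartiles_by_halves is an illustrative choice, and the data are the fifteen emission values listed in the example. Other quartile conventions exist and can give slightly different results.

```python
import statistics

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    For an odd number of observations the median itself is left out
    of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

emissions = [15.8, 26.4, 17.3, 11.2, 23.9, 24.8, 16.2, 12.8,
             22.7, 28.8, 7.2, 13.5, 18.1, 17.9, 23.5]

q1, q2, q3 = quartiles_by_halves(emissions)
iqr = q3 - q1
print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", round(iqr, 1))
```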

Box and whisker plot
Graphical display of data using the 5-number summary.
[Figure: box-and-whisker plot on a scale marked 4, 6, 8, 10, 12, labelled X smallest, Q1, Median, Q3 and X largest.]
Relationship among the measures of central tendency.
Differences among the mean, median and mode are shown in the figure below. When the
distribution is symmetrical, the values for the mean, median, and mode are identical; when
the distribution is skewed, the values are different.
The mean is the most commonly used measure of central tendency. It is used when the
distribution is symmetrical.
The median becomes an effective measure of the central tendency when the distribution is
skewed to the right or left. It is used when an exact midpoint of a distribution is desired.
When a distribution has extreme values, the mean will be adversely affected while the
median will remain essentially unchanged.
The mode is used when a quick and approximate measure of the central tendency is
desired.
[Figure: three distribution curves. Symmetrical: mean, median and mode coincide. Right-skewed: mode, then median, then mean from left to right. Left-skewed: mean, then median, then mode from left to right.]
THE NORMAL CURVE:
A population curve or distribution is developed from a frequency histogram. As the sample
size of a histogram gets larger and larger and the cell interval gets very small, the histogram
takes on the appearance of a smooth polygon, or a curve representing the population. One such
curve is the normal curve or Gaussian distribution.
The normal curve is a symmetrical, unimodal, bell-shaped distribution with the mean,
median and mode having the same value.
[Figure: standardized normal curve f(Z) plotted against Z from −3 to 3.]
All normal distributions of continuous variables can be converted to the standardized
normal distribution by using the standardized normal value Z.
$Z = \frac{X_i - \mu}{\sigma}$
The formula for the standardized normal curve is:
$f(Z) = \frac{1}{\sqrt{2\pi}} e^{-Z^2/2} = 0.3989\, e^{-Z^2/2}$
where π = 3.14159, e = 2.71828 and Z = (Xi − µ)/σ.
Properties of Normal distribution
1. Mean, median and mode are identical.
2. It is a bell-shaped curve.
3. It is symmetric about the mean.
4. The curve extends from –∞ to +∞.
5. The curve represents a population of infinite size. It is defined by two
parameters, i.e. mean and standard deviation.
Relationship to the Mean and Standard Deviation: As we have seen from the formula for the
standardized normal curve, there is a definite relationship among the mean, the standard
deviation and the normal curve.
[Figure: three normal curves plotted against X with the same mean and standard deviations σ = 1.5, σ = 3.0 and σ = 4.5.]
The figure shows three normal curves with the same mean but different standard
deviations. The larger the standard deviation, the flatter the curve (data are widely dispersed),
and the smaller the standard deviation, the more peaked the curve (data are narrowly
dispersed). If the standard deviation is zero, all values are identical to the mean and there is
no curve.
A relationship exists between the standard deviation and the area under the normal curve,
as shown in the figure.
Limits      % Area covered
µ ± 1σ      68.27%
µ ± 2σ      95.45%
µ ± 3σ      99.73%
µ ± ∞       100%
[Figure: normal curve with the horizontal axis marked −3σ, −2σ, −1σ, µ, 1σ, 2σ, 3σ and the areas within ±1σ, ±2σ and ±3σ indicated.]
Application:
1. The main application is that 99.73% of the area is covered between the −3σ and +3σ limits.
2. It is the basis for control charts.
3. It is possible to find the percentage of the data that is less than a
particular value, greater than a particular value, or between two specified
limits.
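The area covered between µ − kσ and µ + kσ can be recomputed directly from the error function. This is a small sketch using only Python's math module; the function name area_within is an illustrative choice.

```python
import math

def area_within(k):
    """Fraction of a normal distribution lying within mu ± k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu ± {k} sigma covers {area_within(k) * 100:.2f}% of the area")
```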
Sigma Level:
Calculate the standardized normal value Z, where
$Z = \frac{X_i - \mu}{\sigma}$
The Z value indicates how many sigma (σ) units the X value is from the mean (µ).
For example, if the USL for a process is 16, and the process average and standard deviation
are calculated as 10.0 and 2.0 respectively, then the Z value corresponding to the upper
specification is Z = (16 − 10)/2 = 3.0
Using the normal tables, a Z value of 3 equals a probability of 0.99865, meaning that
99.865% of the process distribution is less than the X value that is three sigma units
above the mean. That implies that
1 − 0.99865 = 0.00135 or 0.135%
of the process exceeds this X value,
i.e. 0.00135 × 10^6 = 1350 DPMO.
Thus the sigma level is 4.5 (the Z value of 3.0 plus the conventional 1.5 sigma shift).
[Figure: normal curve of the measured process with the mean at the center; 99.865% of the area lies below Z = 3 and 0.135% lies above it.]
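The sigma level calculation can be sketched with Python's statistics.NormalDist in place of the printed normal tables. The inputs are the USL, mean and standard deviation from the example. Adding 1.5 to Z is the conventional Six Sigma shift and is assumed here as the step that takes Z = 3.0 to a sigma level of 4.5.

```python
from statistics import NormalDist

usl, mean, sigma = 16.0, 10.0, 2.0

z = (usl - mean) / sigma               # sigma units between the mean and the USL
fraction_below = NormalDist().cdf(z)   # area under the standard normal curve below Z
fraction_above = 1 - fraction_below    # share of the process beyond the USL
dpmo = fraction_above * 1_000_000      # defects per million opportunities
sigma_level = z + 1.5                  # assumption: conventional 1.5 sigma shift

print("Z:", z)
print("fraction below USL:", round(fraction_below, 5))
print("DPMO:", round(dpmo))
print("sigma level:", sigma_level)
```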
Statistical process control:
A collection of strategies, techniques and actions taken by an organization to ensure it is
producing a quality product or providing a quality service.
• A methodology for monitoring a process to identify special causes of variation and
signal the need to take corrective action when appropriate.
• SPC relies on control charts.
• Establish state of statistical control.
• Monitor a process and signal when it goes out of control.
• Determine process capability.
Sources of variation:
There is variation in all parts produced by a manufacturing process.
Chance variation is random in nature and cannot be entirely eliminated.
Assignable variation is not random in nature and can be reduced or eliminated by
investigating the problem and finding the cause.
Types of variation:
Variation               Cause                   Process
Normal or chance        Common or natural       In control
Unusual or abnormal     Special or assignable   Out of control
Causes of variation in quality:
In a manufacturing process the quality of any product will vary from product to product due to
various causes:
1. Chance causes. 2. Assignable causes.
Chance cause: A cause of variation that is small in magnitude and difficult to identify; also
called random or common cause.
Assignable cause: A cause of variation that is large in magnitude and easily identified; also
called special cause.
[Figure: control chart of subgroup averages plotted by subgroup, with UCL, centerline X-bar and LCL. Between the limits: natural variation, chance causes present (management system). Above the UCL or below the LCL: unnatural variation, assignable causes present (operations).]
Quality terminology:
Quality Assurance refers to the entire system of policies, procedures and guidelines
established by an organization to achieve and maintain quality.
The objective of Quality Engineering is to include quality in the design of products and
processes and to identify potential quality problems prior to production.
• Quality control consists of making a series of inspections and measurements to
determine whether quality standards are being met.
• The goal of SPC is to determine whether the process can be continued or whether it
should be adjusted to achieve a desired quality level.
• If the variation in the quality of the production output is due to assignable causes, the
process should be adjusted or corrected as soon as possible.
• If the variation in output is due to common causes, which the manager cannot
control, the process does not need to be adjusted.
• SPC procedures are based on hypothesis-testing methodology.
• The null hypothesis H0 is formulated in terms of the production process being in
control.
• The alternative hypothesis H1 is formulated in terms of the process being out of
control.
• As with other hypothesis-testing procedures, both a Type I error (adjusting an
in-control process) and a Type II error (allowing an out-of-control process to continue) are
possible.
• SPC uses graphical displays known as control charts to monitor a production
process.
• Control charts provide a basis for deciding whether the variation in the output is due
to common causes (in control) or assignable causes (out of control).
SPC applied to services

• Nature of defect is different in service.
• Service defect is a failure to meet customer’s requirements.
• Monitor times, customer satisfaction.
Service quality examples:

Hospitals
Timeliness, responsiveness, accuracy of lab tests
Grocery stores
Checkout time, stocking, cleanliness
Airlines
Luggage handling, waiting times, courtesy
Fast food restaurants
Waiting times, food quality, cleanliness, employee courtesy.
Catalog-order companies
Order accuracy, operator knowledge and courtesy, packaging, delivery time, phone order
waiting time.
Insurance companies
Billing accuracy, timeliness of claims processing, agent availability and response time.
Process charts:
Tools for monitoring process variation.
The figure on the following slide shows a process control chart. It has an upper limit, a
centerline, and a lower limit.
Control chart (Shewhart control chart, 3σ)
[Figure: control chart with the upper control limit (UCL), centerline (CL) and lower control limit (LCL). Each point represents data from a sample; the points are plotted sequentially.]
Variables: A variable is a continuous measurement such as weight, height or volume.
Attribute: An attribute is the result of a binomial process that results in an either-or
situation.
The most common types of variables and attributes charts are covered below.
Central requirements for properly using process charts:
• You must understand the generic process for implementing process charts.
• You must know how to interpret process charts.
• You need to know when different process charts are used.
• You need to know how to compute limits for the different types of process charts.
Understanding process variation:

• Random variation is centered around a mean and occurs with a consistent amount of
dispersion.
• This type of variation cannot be controlled. Hence, we refer to it as “uncontrolled
variation”.
• The statistical tools discussed in this talk are not designed to detect random
variation.
• Non-random or “special cause” variation results from some event. The event may
be a shift in a process mean or some unexpected occurrence.
Process stability:
Process stability means that the variation we observe in the process is random variation. To determine
process stability we use process charts.
Sampling Methods:
To ensure that processes are stable, data are gathered in samples.
Random samples: Randomization is useful because it ensures independence among
observations. To randomize means to sample in such a way that every piece of product has
an equal chance of being selected for inspection.
Systematic samples: Systematic samples have some of the benefits of random samples
without the difficulty of randomizing.
Sampling by rational subgroup: A rational subgroup is a group of data that is logically
homogeneous; variation within the data can provide a yardstick for setting limits on the
standard variation between subgroups.
A generalized procedure for developing process charts
• Identify critical operations in the process where inspection might be needed. These
are operations in which, if the operation is performed improperly, the product will
be negatively affected.
• Identify critical product characteristics. These are the attributes of the product that
will result in either good or poor functioning of the product.
• Determine whether the critical product characteristic is a variable or an attribute.
• Select the appropriate process control chart from among the many types of control
charts. This decision process and types of chart available are discussed later.
• Establish the control limits and use the chart to continually improve.
• Update the limits when changes have been made to the process.
X-bar and R Charts:
The X-bar chart is a process chart used to monitor the average of the characteristic
being measured. To set up an X-bar chart, select samples from the process for the
characteristic being measured. Then form the samples into rational subgroups. Next, find
the average value of each sample by dividing the sum of the measurements by the sample
size, and plot the value on the process control X-bar chart.
The R chart is used to monitor the variability or dispersion of the process. It is used in
conjunction with the X-bar chart when the process characteristic is a variable. To develop an R
chart, collect samples from the process and organize them into subgroups, usually of three
to six items. Next, compute the range, R, by taking the difference of the high value in the
subgroup minus the low value. Then plot the R values on the R chart.
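I. Control charts for variables:
- X-bar charts track process means.
- Range charts track process variation.
X-bar chart control limits
$UCL_X = \bar{\bar{X}} + A_2 \bar{R}$
$LCL_X = \bar{\bar{X}} - A_2 \bar{R}$
where $\bar{\bar{X}} = \frac{\sum_{i=1}^{k} \bar{X}_i}{k}$ and k is the number of subgroups.
R chart control limits
$UCL_R = D_4 \bar{R}$
$LCL_R = D_3 \bar{R}$
where $\bar{R} = \frac{\sum_{i=1}^{k} R_i}{k}$ and k is the number of subgroups.
A2, D3 and D4 are tabulated factors that depend on the subgroup size.
[Figure: an X-bar chart (subgroup mean plotted against sample number) and an R chart (subgroup range plotted against sample number), each with UCL and LCL lines.]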
II. Control charts for Attributes:
- We now shift to charts for attributes. These charts deal with binomial and Poisson
processes that are not measurements.
- We will now be thinking in terms of defects and defectives rather than diameters or
widths.
- A defect is an irregularity or problem with a larger unit.
- A defective is a unit that, as a whole, is not acceptable or does not meet specifications.
p-Charts for proportion Defectives:
- The p-chart is a process chart that is used to graph the proportion of items in a sample
that are defective (nonconforming to specifications).
- p-charts are effectively used to determine when there has been a shift in the proportion
defective for a particular product or service.
- Typical applications of the p-chart include things like late deliveries, incomplete orders,
and clerical errors on written forms.
p-chart:
$UCL_p = \bar{p} + Z \sigma_p$
$LCL_p = \bar{p} - Z \sigma_p$
$\sigma_p = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
$\bar{p}$ = average proportion defective in the samples.
n = sample size.
Z = 3
$p = \frac{d}{n}$, $\bar{p} = \frac{\text{Total defectives}}{\text{Total sample observations}} = \frac{\sum d}{\sum n}$
where d is the number of defectives in a sample.
With Z = 3:
$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
[Figure: p-chart of proportion defective plotted against sample number, with UCL, CL and LCL lines.]
np-Charts:
- The np-chart is a graph of the number of defectives (or nonconforming units) in a subgroup.
The np-chart requires that the sample size of each subgroup be the same each time a
sample is drawn.
- When subgroup sizes are equal, either the p- or np-chart can be used. They are essentially
the same chart.
- Some people find the np-chart easier to use because it reflects integer numbers rather than
proportions. The uses for the np-chart are essentially the same as the uses for the
p-chart.
Centerline (CL) = $n\bar{p}$
$UCL_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})}$
$LCL_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}$
$n\bar{p} = \frac{\sum np}{N}$, $\bar{p} = \frac{n\bar{p}}{n}$, CL = $n\bar{p}$
where N is the number of subgroups and n is the subgroup size.
c-Charts:
- The c-chart is a graph of the number of defects (nonconformities) per unit. The units must
be of the same sample space; this includes size, height, length, volume and so on. This
means that the “area of opportunity” for finding defects must be the same for each unit.
Several individual units can be grouped as if they are one unit of a larger size.
- Like other process charts, the c-chart is used to detect nonrandom events in the life of a
production process. Typical applications of the c-chart include number of flaws in an auto
finish, number of flaws in a standard typed letter, and number of incorrect responses on a
standardized test.
Process average: $\bar{c} = \frac{\text{Total no. of defects}}{\text{Total no. of samples}}$
Sample standard deviation: $\sigma_c = \sqrt{\bar{c}}$
$UCL_c = \bar{c} + Z\sigma_c = \bar{c} + 3\sqrt{\bar{c}}$
$LCL_c = \bar{c} - Z\sigma_c = \bar{c} - 3\sqrt{\bar{c}}$
[Figure: c-chart of number of defects plotted against sample number, with UCL, CL and LCL lines.]
u-charts:
- The u-chart is a graph of the average number of defects per unit. This is contrasted with
the c-chart, which shows the actual number of defects per standardized unit.
- The u-chart allows for the units sampled to be different sizes, area, heights and so on, and
allows for different numbers of units in each sample space. The uses for the u chart are the
same as the c-chart.
s-chart:
The s (standard deviation) chart is used in place of the R-chart when a more sensitive
chart is desired. These charts are commonly used in semiconductor production, where
process dispersion is watched very closely.
Example X-Bar R chart
# Calculate sample means, sample ranges, mean of means, and mean of ranges.
Sample Obs.1 Obs.2 Obs.3 Obs.4 Obs.5 Avg. Range
1 10.68 10.689 10.776 10.798 10.714 10.732 0.116
2 10.79 10.86 10.601 10.745 10.779 10.755 0.259
3 10.78 10.667 10.838 10.785 10.723 10.759 0.171
4 10.59 10.727 10.812 10.775 10.73 10.727 0.221
5 10.69 10.708 10.79 10.758 10.671 10.724 0.119
6 10.75 10.714 10.738 10.719 10.606 10.705 0.143
7 10.79 10.713 10.689 10.877 10.603 10.735 0.274
8 10.74 10.779 10.11 10.737 10.75 10.624 0.669
9 10.77 10.773 10.641 10.644 10.725 10.710 0.132
10 10.72 10.671 10.708 10.85 10.712 10.732 0.179
11 10.79 10.821 10.764 10.658 10.708 10.748 0.153
12 10.62 10.802 10.818 10.872 10.727 10.768 0.250
13 10.66 10.822 10.893 10.544 10.75 10.733 0.349
14 10.81 10.749 10.859 10.801 10.701 10.783 0.158
15 10.66 10.681 10.644 10.747 10.728 10.692 0.103
Averages 10.728 0.2204
$UCL_X = \bar{\bar{X}} + A_2\bar{R} = 10.728 + 0.58(0.2204) = 10.856$
$LCL_X = \bar{\bar{X}} - A_2\bar{R} = 10.728 - 0.58(0.2204) = 10.600$
$UCL_R = D_4\bar{R} = (2.11)(0.2204) = 0.46504$
$LCL_R = D_3\bar{R} = (0)(0.2204) = 0$
# You are the manager of a 500-room hotel. You want to analyze the time it takes to deliver
room service food orders to rooms. For 7 days, you collect data on 5 deliveries per day. Is
the process in control?
Day       Delivery Time                          Mean    Range
1.        7.30    4.20    6.10    3.45    5.55   5.32    3.85
2.        4.60    8.70    7.60    4.43    7.62   6.59    4.27
3.        5.98    2.92    6.20    4.20    5.10   4.88    3.28
4.        7.20    5.10    5.19    6.80    4.21   5.70    2.99
5.        4.00    4.50    5.50    1.89    4.46   4.07    3.61
6.        10.10   8.10    6.50    5.06    6.94   7.34    5.04
7.        6.77    5.08    5.90    6.90    9.30   6.79    4.22
Average                                          5.81    3.894
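One way to answer the question is to compute the X-bar and R chart limits from the table and check every day against them. The sketch below assumes the same factors used in the preceding worked example for subgroups of five (A2 = 0.58, D3 = 0, D4 = 2.11); the variable names are illustrative.

```python
# Delivery times: 5 deliveries per day for 7 days, from the table.
deliveries = [
    [7.30, 4.20, 6.10, 3.45, 5.55],
    [4.60, 8.70, 7.60, 4.43, 7.62],
    [5.98, 2.92, 6.20, 4.20, 5.10],
    [7.20, 5.10, 5.19, 6.80, 4.21],
    [4.00, 4.50, 5.50, 1.89, 4.46],
    [10.10, 8.10, 6.50, 5.06, 6.94],
    [6.77, 5.08, 5.90, 6.90, 9.30],
]

# Assumed control chart factors for subgroup size 5.
A2, D3, D4 = 0.58, 0.0, 2.11

means = [sum(day) / len(day) for day in deliveries]
ranges = [max(day) - min(day) for day in deliveries]
x_double_bar = sum(means) / len(means)   # mean of the daily means
r_bar = sum(ranges) / len(ranges)        # mean of the daily ranges

ucl_x = x_double_bar + A2 * r_bar
lcl_x = x_double_bar - A2 * r_bar
ucl_r = D4 * r_bar
lcl_r = D3 * r_bar

# The process is judged in control if every point lies inside its limits.
in_control = (all(lcl_x <= m <= ucl_x for m in means)
              and all(lcl_r <= r <= ucl_r for r in ranges))

print(f"X-bar chart: LCL={lcl_x:.2f}, CL={x_double_bar:.2f}, UCL={ucl_x:.2f}")
print(f"R chart:     LCL={lcl_r:.2f}, CL={r_bar:.3f}, UCL={ucl_r:.2f}")
print("In control:", in_control)
```

Under these assumptions all seven daily means and ranges fall inside their limits, so the data give no signal that the process is out of control.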
# A manufacturer of chair wheels wishes to maintain the quality of the manufacturing
process. Every 15 minutes, for a five-hour period, a wheel is selected and the diameter
measured. Given are the diameters (in mm) of the wheels.
Hour      Diameter (mm)          Mean     Range
1.        23    24    26    28   25.3     5
2.        26    24    30    27   26.8     6
3.        24    32    26    27   27.3     8
4.        24    28    31    26   27.3     7
5.        25    24    25    27   25.3     3
Average                          26.35    5.8
$UCL_X = 26.35 + 0.729(5.8) = 30.58$
$LCL_X = 26.35 - 0.729(5.8) = 22.12$
$UCL_R = (2.282)(5.8) = 13.24$
$LCL_R = (0)(5.8) = 0$
# A restaurant is interested in detecting changes in the number of minutes from a party’s
sitting down to getting the bill.
Sample    Quality Variable (minutes)   Mean    Range
1.        23    28    21               24.0    7
2.        33    29    30               30.7    4
3.        25    27    25               25.7    2
4.        28    30    29               29.0    2
5.        29    28    28               28.3    1
6.        23    24    28               25.0    5
Average                                27.1    3.5
$UCL_X = 27.1 + 1.02(3.5) = 30.67$
$LCL_X = 27.1 - 1.02(3.5) = 23.53$
$UCL_R = (2.575)(3.5) = 9.0125$
$LCL_R = (0)(3.5) = 0$
Example p-chart
# 20 samples of 100 pairs of jeans
Sample    Defective    Proportion Defective
1.        6            0.06
2.        0            0.00
3.        4            0.04
...
20.       18           0.18
Total     200
$\bar{p} = \frac{\text{Total defectives}}{\text{Total sample observations}} = \frac{200}{20(100)} = 0.10$
$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.10 + 3\sqrt{\frac{0.10(1 - 0.10)}{100}} = 0.190$
$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.10 - 3\sqrt{\frac{0.10(1 - 0.10)}{100}} = 0.010$
# A manufacturer of running shoes wants to establish control limits for the percent
defective. Ten samples of 400 shoes revealed the mean percent defective was 8.0%. Where
should the manufacturer set the control limits?
$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.08 + 3\sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.121$
$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.08 - 3\sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.039$
# A restaurant is interested in detecting changes in the percentage of parties leaving less
than a 10% tip.
Sample    Result of Inspection    p
1.        2 no, 38 yes            0.05
2.        1 no, 39 yes            0.025
3.        0 no, 40 yes            0.0
4.        4 no, 36 yes            0.10
5.        3 no, 37 yes            0.075
6.        2 no, 38 yes            0.05
$\bar{p} = \frac{12}{6(40)} = 0.05$, $\sigma_p = \sqrt{\frac{0.05 \times 0.95}{40}} = 0.034$
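The p-chart limits in the three examples above follow the same pattern, which can be sketched as one small function. The name p_chart_limits is an illustrative choice. Clamping a negative lower limit to zero is a common convention, assumed here because a proportion cannot be negative.

```python
import math

def p_chart_limits(p_bar, n, z=3):
    """Return (LCL, UCL) for a p-chart with average proportion p_bar
    and constant sample size n."""
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
    ucl = p_bar + z * sigma_p
    lcl = max(0.0, p_bar - z * sigma_p)   # assumption: negative limits are set to 0
    return lcl, ucl

jeans = p_chart_limits(200 / (20 * 100), 100)   # 200 defectives in 20 samples of 100
shoes = p_chart_limits(0.08, 400)               # mean 8% defective, samples of 400
tips = p_chart_limits(12 / (6 * 40), 40)        # 12 low tips in 6 samples of 40

for name, (lcl, ucl) in [("jeans", jeans), ("shoes", shoes), ("tips", tips)]:
    print(f"{name}: LCL={lcl:.3f}, UCL={ucl:.3f}")
```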
Example c-chart
# Count of defects in 15 rolls of denim fabric
Sample    Defects
1.        12
2.        8
3.        16
...
15.       15
Total     190
Process average: $\bar{c} = \frac{\text{Total no. of defects}}{\text{Total no. of samples}} = \frac{190}{15} = 12.67$
Sample standard deviation: $\sigma_c = \sqrt{\bar{c}}$
$UCL_c = \bar{c} + Z\sigma_c = \bar{c} + 3\sqrt{\bar{c}} = 12.67 + 3\sqrt{12.67} = 23.35$
$LCL_c = \bar{c} - Z\sigma_c = \bar{c} - 3\sqrt{\bar{c}} = 12.67 - 3\sqrt{12.67} = 1.99$
# A manufacturer of computer circuit boards tested 10 boards after they were manufactured. The
number of defects obtained per circuit board were 5, 3, 4, 0, 2, 2, 1, 4, 3 and 2.
Construct the appropriate control limits.
Process average: $\bar{c} = \frac{26}{10} = 2.6$
Sample standard deviation: $\sigma_c = \sqrt{\bar{c}} = \sqrt{2.6}$
$UCL_c = \bar{c} + Z\sigma_c = \bar{c} + 3\sqrt{\bar{c}} = 2.6 + 3\sqrt{2.6} = 7.44$
$LCL_c = \bar{c} - Z\sigma_c = \bar{c} - 3\sqrt{\bar{c}} = 2.6 - 3\sqrt{2.6} = -2.24$, which is set to 0 because a count of defects cannot be negative.
# A restaurant is interested in detecting changes in the number of parties per day that are
larger than 6 people.
Day    No. of parties
1      4
2      2
3      5
4      3
5      4
6      5
Process average: $\bar{c} = 23/6 = 3.83$
$UCL_c = 3.83 + 3\sqrt{3.83} = 9.70$
$LCL_c = 3.83 - 3\sqrt{3.83} = -2.04$, which is set to 0.
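The c-chart limits can be sketched the same way from the raw counts. The function name c_chart_limits is an illustrative choice, and a negative lower limit is set to zero, as in the examples, since a count cannot be negative.

```python
import math

def c_chart_limits(defect_counts, z=3):
    """Return (c_bar, LCL, UCL) for a c-chart built from per-unit defect counts."""
    c_bar = sum(defect_counts) / len(defect_counts)
    sigma_c = math.sqrt(c_bar)
    ucl = c_bar + z * sigma_c
    lcl = max(0.0, c_bar - z * sigma_c)   # negative limits are set to 0
    return c_bar, lcl, ucl

boards = c_chart_limits([5, 3, 4, 0, 2, 2, 1, 4, 3, 2])   # circuit board defects
parties = c_chart_limits([4, 2, 5, 3, 4, 5])              # large parties per day

for name, (c_bar, lcl, ucl) in [("boards", boards), ("parties", parties)]:
    print(f"{name}: c-bar={c_bar:.2f}, LCL={lcl:.2f}, UCL={ucl:.2f}")
```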
Process capability:
Control limits: The limits on a control chart used to evaluate the variations in quality
from subgroup to subgroup (not to be confused with specification limits).
Tolerance: The permissible variation in the size of a quality characteristic.
The difference between the specifications is called the tolerance.
Process capability: The spread of the process. It is equal to six standard deviations when
the process is in a state of statistical control.
Procedure for process capability:
1. Take 25 subgroups of size 5 for a total of 125 measurements.
2. Calculate the range, R, for each subgroup.
3. Calculate the average range, $\bar{R} = \sum R / 25$
4. Calculate the estimate of the population standard deviation,
$\sigma = \bar{R} / d_2$
5. Process capability will equal 6σ.
Process capability ratio
$C_p = \frac{\text{Tolerance range}}{\text{Process range}} = \frac{\text{Upper specification} - \text{Lower specification}}{6\sigma}$
Where Cp = capability index.
6σ = process capability.
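Case-I. If the capability index is 1.00, the process is just capable.
[Figure: process distribution of width 6σ exactly filling the 6σ tolerance between LSL and USL.]
$C_p = \frac{USL - LSL}{6\sigma} = \frac{6\sigma}{6\sigma} = 1.00$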
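Case-II. If the capability index is greater than 1.00, which is desirable
[Figure: process distribution of width 6σ inside an 8σ tolerance between LSL and USL.]
$C_p = \frac{USL - LSL}{6\sigma} = \frac{8\sigma}{6\sigma} = 1.33$
Case-III. If the capability index is less than 1.00, which is undesirable
[Figure: process distribution of width 6σ extending beyond a 4σ tolerance between LSL and USL.]
$C_p = \frac{USL - LSL}{6\sigma} = \frac{4\sigma}{6\sigma} = 0.67$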
Process capability index:
$C_{pk} = \frac{\min\{(USL - \bar{\bar{X}}),\ (\bar{\bar{X}} - LSL)\}}{3\sigma}$
i.e. $C_{pk} = \min\left(\frac{USL - \bar{\bar{X}}}{3\sigma},\ \frac{\bar{\bar{X}} - LSL}{3\sigma}\right)$
where USL and LSL are the upper and lower specification limits.
Interpretation of index values:
Case-I. If CPK = 1, then the natural control limits and the customer specifications are exactly
equal. The process is just capable.
Case-II. If CPK > 1, the process is highly capable of meeting customer specifications.
Case-III. If CPK < 1, the process is not capable.
Note: If the process is not under control, then CPK has no meaning.
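The two indices can be sketched as small functions. The process values below (specification limits 4 and 16, σ = 2, means 10 and 12) are hypothetical numbers chosen only to illustrate how Cpk drops when the process mean moves off center while Cp stays the same.

```python
def cp(usl, lsl, sigma):
    """Capability index: tolerance divided by the 6-sigma process spread."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mean, sigma):
    """Capability index that also accounts for where the process mean sits."""
    upper = (usl - mean) / (3 * sigma)
    lower = (mean - lsl) / (3 * sigma)
    return min(upper, lower)

# Hypothetical process: LSL = 4, USL = 16, sigma = 2.
print("Cp:", cp(16, 4, 2))
print("Cpk, mean centered at 10:", cpk(16, 4, 10, 2))
print("Cpk, mean shifted to 12:", round(cpk(16, 4, 12, 2), 2))
```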
Calculation of process capability (CPK):
1. Take a lot of 25 pcs.
2. Measure the dimensions of all the pcs, say X1, X2, X3, ..., X25.
3. Divide the lot into samples of 5 pcs.
4. Find the range of each sample, i.e.
Range of X1, X2, ..., X5 = R1
Range of X6, X7, ..., X10 = R2
Range of X11, X12, ..., X15 = R3
Range of X16, X17, ..., X20 = R4
Range of X21, X22, ..., X25 = R5
5. Calculate R-bar as per the following formula:
$\bar{R} = \frac{R_1 + R_2 + R_3 + R_4 + R_5}{5}$
6. Calculate $\sigma = \bar{R} / d_2$, with d2 = 2.326 for samples of 5 pcs.
7. $C_p = \frac{\text{Tolerance}}{6\sigma}$
8. Upper CPK = $\frac{USL - \bar{\bar{X}}}{3\sigma}$
Lower CPK = $\frac{\bar{\bar{X}} - LSL}{3\sigma}$
9. Process capability (CPK) = the lower of Upper CPK and Lower CPK.
Process capability for qualitative characteristics: $C_{pk} = \frac{u(1 - p)}{3}$
Where p is the estimated share of nonconforming units and u is the quantile function of the
standard normal distribution.
This formula typically produces the same value for CPK as a normally distributed
characteristic with the same fraction of nonconforming units (single-sided).
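The qualitative formula can be sketched with the inverse CDF of the standard normal distribution from Python's statistics module, which plays the role of the quantile function u. The function name is an illustrative choice, and the input 0.00135 is the single-sided fraction that lies beyond three sigma.

```python
from statistics import NormalDist

def cpk_from_fraction_nonconforming(p):
    """Cpk = u(1 - p) / 3, where u is the standard normal quantile function.

    p must lie strictly between 0 and 1.
    """
    u = NormalDist().inv_cdf(1 - p)
    return u / 3

# A nonconforming share of 0.135% corresponds to about three sigma.
print(round(cpk_from_fraction_nonconforming(0.00135), 3))
```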
Tools and Techniques:
I. Statistical Process Control (SPC):
Seven tools ⇒ Pareto diagram, cause and effect diagram, check sheets, process flow
diagram, scatter diagram, histogram and control charts, and stratification.
II. Failure Mode and Effect Analysis (FMEA)
III. Quality Function Deployment (QFD)
IV. Measurement System Analysis.
Statistical process control (SPC):
SPC is comprised of seven tools. Pareto diagram, Cause and effect diagram, Check sheets,
Process flow diagram, Scatter diagram, Histogram & control charts and Stratification.
1. Pareto Diagram:
Vilfredo Pareto (1848-1923) conducted extensive studies of the distribution of wealth in
Europe. He found that there were a few people with a lot of money and many people with
little money. This unequal distribution of wealth became an integral part of economic theory.
Dr. Joseph Juran recognized this concept as a universal that could be applied to many fields. He
coined the phrases “vital few” and “useful many”.
A Pareto diagram ranks data classifications in descending order from left to right.
Construction of a Pareto diagram is very simple. The steps are:
• Determine the method of classifying the data: by problem, cause, type of
nonconformity, and so forth.
• Decide if dollars (best), weighted frequency or frequency is to be used to rank the
characteristics.
• Collect data for an appropriate time interval.
• Summarize the data and rank order the categories from largest to smallest.
• Compute the cumulative percentage if it is to be used.
• Construct the diagram and find the vital few.
[Figure: Pareto diagram of types of field failures. Bars for categories F, C, A, E, B, D and O in descending order, with frequency and percent on the vertical axes.]
Pareto diagrams are used to identify the most important problems. Usually, 80% of the total
results from 20% of the items.
The Pareto diagram is a powerful quality-improvement tool. It is applicable to problem
identification and measurement of progress.
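The ranking and cumulative-percentage steps can be sketched in Python. The failure counts below are hypothetical, made up only to illustrate the calculation for categories labelled like those in the figure; the function name pareto_table is also an illustrative choice.

```python
def pareto_table(counts):
    """Rank categories from largest to smallest and attach cumulative percentages."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, cumulative = [], 0
    for category, count in ranked:
        cumulative += count
        rows.append((category, count, round(100 * cumulative / total, 1)))
    return rows

# Hypothetical field-failure counts (not taken from the source).
failures = {"A": 30, "B": 10, "C": 50, "D": 6, "E": 20, "F": 80, "O": 4}

rows = pareto_table(failures)
for category, count, cum_pct in rows:
    print(f"{category}: {count:3d}  cumulative {cum_pct}%")
```

With these made-up counts the first three categories account for 80% of all failures, which is the kind of "vital few" the diagram is meant to expose.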
2. Cause and effect diagram (Why-Why Analysis):
A cause and effect (C&E) diagram is a picture composed of lines and symbols designed
to represent a meaningful relationship between an effect and its causes. It was developed by
Dr. Kaoru Ishikawa in 1943 and is also called an Ishikawa diagram.
C&E diagrams are used to investigate either a “bad” effect, in order to take action to correct
its causes, or a “good” effect, in order to learn which causes are responsible. The figure shows
the C&E diagram with the effect on the right and the causes on the left. The effect is the quality
characteristic that needs improvement. Causes are usually broken down into the major
causes of man, machine, material, measurement, work method and environment.
Management and maintenance are also sometimes used as major causes. Each major cause is
further subdivided into numerous minor causes. For example, under work methods, we might have
training, knowledge, ability, physical characteristics, and so forth. C&E diagrams are also called
“fishbone diagrams” because of the shape of the complete structure.
[Figure: Cause and effect diagram. The major causes (man, machine, material, environment, work methods, measurement) are on the left and the effect (quality characteristic) is on the right.]
The first step in the construction of a C&E diagram is for the project team to identify the
effect or quality problem. It is placed on the right side of a large piece of paper by the team
leader. Next, the major causes are identified and placed on the diagram.
Determining all the minor causes requires brainstorming by the project team. Brainstorming
is an idea-generating technique that is well suited to the C&E diagram. It uses the
creative thinking capacity of the team.
3. Check sheets:
The main purpose of a check sheet is to ensure that the data are collected carefully and accurately by
operating personnel for process control and problem solving. Data should be presented in such a form
that they can be quickly and easily used and analyzed.
Product: XYZ                 Date: Jan. 21
Stage: Final inspection      Id: Paint
Number inspected: xxx        Inspector / operator: ABC

Nonconforming type     Total
Blister                   21
Light spray               38
Drips                     29
Over spray                11
Splatter                  08
Run                       47
Others                    12
Total                    159
Number nonconforming     113

Check sheet for paint nonconformities. The figure shows a check sheet for paint nonconformities for bicycles.
Hot Tub Mon Tue Wed Thu Fri Sat Sun
Chemical Test (Add if needed) D 7.4
PH/Chlorine
Temperature D 810°
Add water (if needed) D
Clean Deck around hot tub D √
Pool
Chemical Test (Add if needed) D 7.6
Add water (If needed) D 300
Check Temperature D 780
Vacuum pool (if needed) A
Filter back wash (20lb.) A √
Lint Filter D √
Sweep and Hose off Deck D √
General Cleaning
Vacuum Carpets D √
Vacuum and sweep building B D √
Clean Tables D √
Sweep and mop wooden deck D √
Clean outside deck, bring in chair
Take out trash D √
Empty building B trash cans D √
Wash windows D √
Bathrooms
Scrub sinks, toilets and showers D √
Sweep and mop floors D √
Empty trash and check lockers D √
Cover Hot Tub (at end of the night) D √
Check pool filter – be sure it is on D √
D = daily, A = as needed
List any and all deviations from this work schedule on the reverse side, date it and initial it.
Check sheet for swimming pool.
4. Process flow diagram:
It is a schematic diagram that shows the flow of the product or service as it moves through
the various processing operations. The diagram makes it easy to visualize the entire system,
identify potential trouble spots, and locate control activities.
Many standard symbols are used by engineers and scientists. The common symbols and
their significance are given below:
An ellipse      Start or end of the process.
A rectangle     A step or a task in the process.
A diamond       A decision point.
An arrow        Direction of flow from one activity to the next one in the sequence.
The diagram shows who is the next customer in the process, thereby increasing the
understanding of the process. Flow diagrams are best constructed by a team, because it is
rare for one individual to understand the entire process.
Improvements to the process can be accomplished by eliminating steps, combining steps,
or making frequently occurring steps more efficient.
[Figure: Process flow diagram for the recruitment of a supervisor. The steps shown are:]
1. Start.
2. Sort applications and short-list for interview.
3. First interview to select the best five candidates.
4. Second interview to select
5. Final interview and medical check-up.
6. Candidate approved? If no, call the next candidate. If yes, continue.
7. Negotiate terms and prepare offer letters.
8. Make offer after receiving approval.
9. End.
5. Scatter Diagram:
A scatter diagram is a tool for studying the cause and effect relationship between two
variables.
The figure shows the relationship between automotive speed and gas mileage. The figure
shows that as speed increases, gas mileage decreases. Automotive speed is plotted on the
x-axis and is the independent variable. The independent variable is usually controllable.
Gas mileage is on the y-axis and is the dependent, or response, variable.
The relationship or correlation between the two variables can be evaluated. The figure shows
different patterns and their interpretation.
At (a), we have a positive correlation between the two variables because as x increases,
y increases.
At (b), there is a negative correlation between the two variables because as x increases,
y decreases.
At (c), there is no correlation, and this pattern is sometimes referred to as a shotgun pattern.
At (d), there may or may not be a relationship between the two variables. There appears to
be a negative relationship between x and y, but it is not too strong. Further statistical
analysis is needed to evaluate this pattern.
At (e), we have stratified the data to represent different causes for the same effect. One
cause is plotted with a small solid circle, and the other cause is plotted with a solid
triangle. When the data are separated, we see that there is a strong correlation.
At (f), we have a curvilinear relationship rather than a linear one.
FAILURE MODE AND EFFECT ANALYSIS (FMEA):
Failure mode and effect analysis (FMEA) is an analytical technique (a paper test) that
combines the technology and experience of people in identifying foreseeable failure modes
of a product, service, or process and planning for their elimination. FMEA can be explained
as a group of activities intended to:
• Recognize and evaluate the potential failure of product, service, or process and its
effects.
• Identify actions that could eliminate or reduce the chance of the potential failure
occurring.
• Document the process.
FMEA is a “before-the-event” action requiring a team effort to alleviate, as easily and
inexpensively as possible, changes in design and production. There are two types of FMEA:
design FMEA and process FMEA.
FMEA Principle.
The FMEA is a formal and systematic method to analyze and eliminate potential failure
cause in the design and manufacturing phase. FMEA should be applied as early as possible
in the design process and definitely before starting the manufacturing process.
The FMEA is a relatively simple multi-step process consisting of the following tasks:
1. List all reasonably possible failures, deficiencies, omissions, unintended influences, etc.
systematically.
2. Evaluate their effects and potential impact on the product, process or customers.
3. Classify the severity or importance of the effect.
4. Identify causes of the potential failures, etc.
5. Estimate the probability of occurrence of the failure, etc.
6. Perform an evaluation of the product specification and/ or process monitoring with
regard to failure detection and avoidance.
7. Evaluate the probability of the failure detection.
8. Calculate the risk priority figure (RPF).
9. Based on these results, define sound proposals relative to design, manufacturing and/or
inspection and testing.
10. Assign the responsibilities, targets and completion dates for those changes.
11. Re-iterate the FMEA based on the changes and calculate the new risk priority figure.
Risk Priority Figure (RPF):

The result of the FMEA is the risk priority figure. It is calculated from three factors
according to the following formula:
RPF = Severity of Failure × Probability of Failure Occurrence × Probability of Failure Detection
The rating points for these three factors are given in the following tables.
STANDARD SEVERITY RATINGS:
RATING DEGREE OF SEVERITY

1. Customer will not notice the adverse effect or it is insignificant.
2. Customer will probably experience slight annoyance.
3. Customer will experience annoyance due to the slight degradation of
performance.
4. Customer dissatisfaction due to reduced performance.
5. Customer is made uncomfortable or their productivity is reduced by the
continued degradation of the effect.
6. Warranty, repair or significant manufacturing or assembly complaint.
7. High degree of customer dissatisfaction due to component failure
without complete loss of function. Productivity impacted by scrap or
rework levels.
8. Very high degree of dissatisfaction due to the loss of function without a
negative impact on safety or governmental regulation.
9. Customer endangered due to the adverse effect on safe system
performance with warning before failure or violation of governmental
regulation.
10. Customer endangered due to the adverse effect on safe system
performance without warning before failure or violation of
governmental regulation.
OCCURRENCE RATINGS (KNOWN CAPABILITY):
Ranking   Occurrence likelihood   Capability
1.        1 in 10^6               CPK > 1.67
2.        1 in 20,000             CPK = 1.33
3.        1 in 5,000              CPK ≈ 1.00
4.        1 in 2,000              CPK < 1.00
5.        1 in 500
6.        1 in 100
7.        1 in 50
8.        1 in 20
9.        1 in 10
10.       1 in 2
DETECTION RATINGS (KNOWN CAPABILITY):
Ranking   Occurrence likelihood   Capability    Detection certainty
1.        1 in 10^6               CPK > 1.67    100%
2.        1 in 20,000             CPK = 1.33    99%
3.        1 in 5,000              CPK ≈ 1.00    95%
4.        1 in 2,000              CPK < 1.00    90%
5.        1 in 500                              85%
6.        1 in 100                              80%
7.        1 in 50                               70%
8.        1 in 20                               60%
9.        1 in 10                               50%
10.       1 in 2                                < 50%
DETECTION RATINGS CAPABILITY UNKNOWN:
RATING ABILITY TO DETECT
1. Sure that the potential failure will be found or prevented before reaching
the next customer.
2. Almost certain that the potential failure will be found or prevented before
reaching the next customer.
3. Low likelihood that the potential failure will reach the next customer
undetected.
4. Controls may detect or prevent the potential failure from reaching the next customer.
5. Moderate likelihood that the potential failure will reach the next customer.
6. Controls are unlikely to detect or prevent the potential failure from
reaching the next customer.
7. Poor likelihood that the potential failure will be detected or prevented
before reaching the next customer.
8. Very poor likelihood that the potential failure will be detected or prevented
before reaching the next customer.
9. Current controls probably will not even detect the potential failure.
10. Absolute certainty that the current controls will not detect the potential
failure.
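As a rough illustration of how the three ratings combine, the sketch below multiplies severity, occurrence and detection ratings (each on the 1 to 10 scales tabulated above) and ranks a few failure modes by the result. The failure modes, their ratings and the function name `risk_priority_figure` are hypothetical.

```python
def risk_priority_figure(severity, occurrence, detection):
    """Multiply the three 1-10 ratings into a risk priority figure."""
    for name, rating in (("severity", severity),
                         ("occurrence", occurrence),
                         ("detection", detection)):
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10")
    return severity * occurrence * detection


# Hypothetical failure modes: (description, severity, occurrence, detection)
failure_modes = [
    ("Seal leaks", 7, 4, 3),
    ("Label misprinted", 2, 6, 2),
    ("Brake cable frays", 10, 2, 5),
]

# Highest risk first, so corrective action can be prioritized
ranked = sorted(failure_modes,
                key=lambda fm: risk_priority_figure(*fm[1:]),
                reverse=True)
for description, s, o, d in ranked:
    print(f"{description}: RPF = {risk_priority_figure(s, o, d)}")
```

After design or process changes are made, the same calculation would be repeated with the revised ratings to obtain the new risk priority figure.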
Quality function deployment (QFD):

QFD is a system that identifies and sets the priorities for product, service and process improvement
opportunities that lead to increased customer satisfaction. It ensures the accurate deployment of the
“voice of the customer” throughout the organization, from product planning to field service.
The QFD process answers the following questions:
1. What do customers want?
2. Are all wants equally important?
3. Will delivering perceived needs yield a competitive advantage?
4. How can we change the product, service or process?
5. How does an engineering decision affect customer perception?
6. How does an engineering change affect other technical descriptions?
7. What is the relationship to parts development, process planning and production planning?
QFD reduces start-up costs, reduces engineering design changes and, most important, leads to
increased customer satisfaction.
Measurement system analysis (MSA):
SPC requires accurate and precise data; however, all data have measurement errors. Thus, an
observed value has two components:
Observed value = True value + Measurement error
Variation likewise arises from both the process and the measurement; thus
Total variation = Product variation + Measurement variation
Measurement variation is divided into repeatability and reproducibility.
Repeatability: variation due to the equipment.
Reproducibility: variation due to the appraiser (inspector).
Together they are called gage repeatability and reproducibility (GR&R).
Data Collection:
The number of parts, appraisers, or trials can vary, but 10 parts, two or three appraisers, and
two or three trials are considered optimum.
Calculations:
While the order of taking measurements is random, the calculations are performed by part
and appraiser. Calculations are as follows.
1. The average and range are calculated for each part by each appraiser.
2. The values in step 1 are averaged to obtain
$\bar{R}_a, \bar{R}_b, \bar{R}_c, \bar{\bar{X}}_a, \bar{\bar{X}}_b, \bar{\bar{X}}_c$
3. The values in step 2 are used to obtain
$\bar{\bar{R}}$ and $\bar{\bar{X}}_{Diff}$, where $\bar{\bar{X}}_{Diff} = \bar{\bar{X}}_{Max} - \bar{\bar{X}}_{Min}$
4. The UCL and LCL for the range are determined:
$UCL_R = D_4 \bar{\bar{R}}$, $LCL_R = D_3 \bar{\bar{R}}$
Where D3 and D4 are obtained from the table for subgroup sizes of 2 or 3.
Any range value ($\bar{R}_a$, $\bar{R}_b$ or $\bar{R}_c$) that is out of control should be discarded and the above
calculations repeated where appropriate, or the readings should be retaken for that appraiser
and part and the above calculations repeated where appropriate.
5. Determine $\bar{X}$ for each part (averaged over all appraisers and trials), and from this information, calculate the range of the part averages:
$R_p = \bar{X}_{Max} - \bar{X}_{Min}$
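Steps 1 to 5 can be sketched in Python as follows. The readings are the two-inspector log-length data given in the example in this section; the function name `summarize` and the dictionary layout are assumptions made for illustration, and D4 = 2.574 with D3 = 0 are the usual range-chart constants for subgroups of three trials.

```python
# Readings from the log-length example in this section: 5 parts, 3 trials each
readings = {
    "A": [[7.3, 7.2, 7.2], [6.8, 6.9, 7.1], [7.2, 7.2, 7.0],
          [7.1, 7.3, 7.1], [6.8, 6.9, 7.1]],
    "B": [[7.0, 6.9, 7.2], [7.1, 7.1, 6.9], [7.0, 7.1, 7.0],
          [7.0, 7.0, 7.1], [6.7, 6.9, 6.9]],
}
D4 = 2.574  # upper range-chart constant for subgroups of 3 trials
D3 = 0.0    # lower range-chart constant for subgroups of 3 trials


def mean(values):
    return sum(values) / len(values)


def summarize(readings):
    """Steps 1 to 5: appraiser averages and ranges, limits, and part range."""
    r_bar = {}   # average range per appraiser
    x_bar = {}   # grand average per appraiser
    for appraiser, parts in readings.items():
        r_bar[appraiser] = mean([max(t) - min(t) for t in parts])
        x_bar[appraiser] = mean([mean(t) for t in parts])
    r_dbar = mean(list(r_bar.values()))
    x_diff = max(x_bar.values()) - min(x_bar.values())
    # Average of each part across all appraisers, then the range of those
    n_parts = len(next(iter(readings.values())))
    part_avgs = [mean([mean(parts[i]) for parts in readings.values()])
                 for i in range(n_parts)]
    r_p = max(part_avgs) - min(part_avgs)
    return {"r_bar": r_bar, "x_bar": x_bar, "r_dbar": r_dbar,
            "x_diff": x_diff, "ucl_r": D4 * r_dbar, "lcl_r": D3 * r_dbar,
            "r_p": r_p}


stats = summarize(readings)
for key, value in stats.items():
    print(key, value)
```

The sketch works from unrounded averages, so its results can differ in the last digit from hand calculations that round at each step.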
Analysis of Results
1. Repeatability: $EV = k_1 \bar{\bar{R}}$
Where EV = Equipment variation (repeatability)
k1 = 4.56 for 2 trials and 3.05 for 3 trials.
2. Reproducibility: $AV = \sqrt{(k_2 \bar{\bar{X}}_{Diff})^2 - EV^2/(nr)}$
Where AV = Appraiser variation (reproducibility)
k2 = 3.65 for 2 appraisers and 2.70 for 3 appraisers
n = number of parts
r = number of trials.
If a negative value occurs under the square root sign, the AV value defaults to zero.
3. Repeatability and Reproducibility: $R\&R = \sqrt{EV^2 + AV^2}$
Where R&R = Repeatability and Reproducibility.
4. Part variation: $PV = j R_p$
Where PV = Part variation.
Rp = range of the part averages.
j = constant dependent on the number of parts.
Parts   2     3     4     5     6     7     8     9     10
j       3.65  2.70  2.30  2.08  1.93  1.82  1.74  1.67  1.62
5. Total variation: $TV = \sqrt{(R\&R)^2 + PV^2}$
Where TV = Total variation.
The percent of total variation is calculated using the equations below.
%EV = 100 (EV/TV)
%AV = 100 (AV/TV)
%R&R = 100 (R&R/TV)
%PV = 100 (PV/TV)
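The formulas above can be collected into one small function. The following is a sketch only: the function name `gage_rr` and its argument names are assumptions, the constants are the k1, k2 and j values quoted in this section, and the inputs in the usage line are the rounded summary values from the two-appraiser, five-part example worked later in this section.

```python
import math

K1 = {2: 4.56, 3: 3.05}   # keyed by number of trials
K2 = {2: 3.65, 3: 2.70}   # keyed by number of appraisers
J = {2: 3.65, 3: 2.70, 4: 2.30, 5: 2.08, 6: 1.93,
     7: 1.82, 8: 1.74, 9: 1.67, 10: 1.62}   # keyed by number of parts


def gage_rr(r_dbar, x_diff, r_p, n_parts, n_trials, n_appraisers):
    """Return EV, AV, R&R, PV, TV and their percentages of total variation."""
    ev = K1[n_trials] * r_dbar
    av_squared = (K2[n_appraisers] * x_diff) ** 2 - ev ** 2 / (n_parts * n_trials)
    # A negative value under the square root means AV defaults to zero
    av = math.sqrt(av_squared) if av_squared > 0 else 0.0
    rr = math.hypot(ev, av)
    pv = J[n_parts] * r_p
    tv = math.hypot(rr, pv)
    pct = {name: 100 * value / tv
           for name, value in (("EV", ev), ("AV", av), ("R&R", rr), ("PV", pv))}
    return {"EV": ev, "AV": av, "R&R": rr, "PV": pv, "TV": tv, "pct": pct}


result = gage_rr(r_dbar=0.08, x_diff=0.03, r_p=0.20,
                 n_parts=5, n_trials=3, n_appraisers=2)
for name in ("EV", "AV", "R&R", "PV", "TV"):
    print(f"{name} = {result[name]:.3f}")
for name, value in result["pct"].items():
    print(f"%{name} = {value:.0f}%")
```

Because the function carries full precision through every step, its percentages may differ by a point or so from a hand calculation that rounds each intermediate result to two decimals.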
Evaluation
If repeatability is large compared to reproducibility, the reasons may be
1. The gage needs maintenance.
2. The gage should be designed to be more rigid.
3. The clamping or location for gauging needs to be improved.
4. There is excessive within-part variation.
If reproducibility is large compared to repeatability, the reasons may be
1. The operator needs to be better trained in how to use and read the gage.
2. Calibrations on the gage are not legible.
3. A fixture may be needed to help the operator use the gage consistently.
Guidelines for acceptance of GR&R (% R&R) are:

Under 10% error – Gage system is satisfactory.
10% to 30% error – May be acceptable based upon importance of application, cost of gage,
cost of repairs, etc.
Over 30% error – Gage system is not satisfactory.
Identify the causes and take corrective action.
Example:
A log of length specification 7.0 ± 2.5 is cut from bigger logs. Data collected are as follows:
          A-Inspector                     B-Inspector
Sample    T1   T2   T3   Avg.  Range     T1   T2   T3   Avg.  Range
1.        7.3  7.2  7.2  7.23  0.1       7.0  6.9  7.2  7.03  0.3
2.        6.8  6.9  7.1  6.93  0.3       7.1  7.1  6.9  7.03  0.2
3.        7.2  7.2  7.0  7.13  0.2       7.0  7.1  7.0  7.03  0.1
4.        7.1  7.3  7.1  7.17  0.2       7.0  7.0  7.1  7.03  0.1
5.        6.8  6.9  7.1  6.93  0.3       6.7  6.9  6.9  6.83  0.2
$\bar{\bar{X}}_A = 7.08$, $\bar{\bar{X}}_B = 6.99$, $\bar{\bar{X}}_{Diff} = \bar{\bar{X}}_{Max} - \bar{\bar{X}}_{Min} = 0.09$
$\bar{R}_A = 0.22$, $\bar{R}_B = 0.18$, $\bar{\bar{R}} = 0.20$, $UCL_R = D_4 \bar{\bar{R}} = 0.51$
Equipment variation: $EV = k_1 \bar{\bar{R}} = 3.05 \times 0.20 = 0.61$
Appraiser variation: $AV = \sqrt{(k_2 \bar{\bar{X}}_{Diff})^2 - EV^2/(nr)}$
$AV = \sqrt{(3.65 \times 0.09)^2 - 0.61^2/(5 \times 3)} = 0.29$
Where n = number of parts (samples), r = number of trials.
Trials    2     3        Observers   2     3
k1        4.56  3.05     k2          3.65  2.70
Total $R\&R = \sqrt{EV^2 + AV^2} = \sqrt{(0.61)^2 + (0.29)^2} = 0.68$
EV% = EV × 100 / Total tolerance = 0.61 × 100 / 5 = 12.2%
AV% = AV × 100 / Total tolerance = 0.29 × 100 / 5 = 5.8%
R&R% = R&R × 100 / Total tolerance = 0.68 × 100 / 5 = 13.6%
The equipment variation is more than 10% of the tolerance.
Thus, on the basis of the R&R result, the gage should be calibrated or replaced.
Linearity:
Reference value 2.00 4.00 6.00
1 2.10 4.00 5.7
2 2.5 4.1 5.6
3 2.8 4.1 7.2
4 3.0 4.1 7.8
5 1.8 4.5 6.2
6 1.9 3.8 6.2
7 3.2 3.8 6.5
8 1.7 4.0 5.2
9 2.5 3.5 5.5
10 1.5 3.7 6.0
Range 1.7 1.0 2.6
To fit a straight line Y = A + BX for linearity, the normal equations are:
$\sum Y = nA + B \sum X$
$\sum XY = A \sum X + B \sum X^2$
5.3 = 10A + 12B
23 = 0.4 & A = 0.05
B = 0.05 = bias.
A = 0.4 = linearity Bias
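The normal equations follow from the model $Y_i = A + BX_i + \text{error}$ by minimizing $L = \sum_{i=1}^{10} (Y_i - A - BX_i)^2$:
$\partial L / \partial A = -2 \sum_{i=1}^{10} (Y_i - A - BX_i) = 0 \Rightarrow \sum Y_i = nA + B \sum X_i$
$\partial L / \partial B = -2 \sum_{i=1}^{10} (Y_i - A - BX_i) X_i = 0 \Rightarrow \sum X_i Y_i = A \sum X_i + B \sum X_i^2$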
Stability:
Jan Feb Mar Apr May
7.0 9.0 10.0 11.0 9.0
8.5 10.0 10.5 11.5 6.5
9.0 11.0 9.5 8.2 6.2
6.5 10.5 10.5 7.5 9.2
5.0 10.4 11.0 6.9 8.5
8.7 9.5 11.5 6.5 8.2
10.0 9.8 10.5 6.0 7.0
10.5 10.2 10.2 7.0 7.0
8.0 10.1 10.5 7.5 7.0
7.5 10.0 10.7 7.0 6.2
Avg. 8.07 10.05 10.49 7.91 7.48
Range 5.5 2.0 2.0 5.50 2.8
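$\bar{R} = 17.8 / 5 = 3.56$
$UCL_R = D_4 \bar{R} = 1.777 \times 3.56 = 6.32612$
$LCL_R = D_3 \bar{R} = 0.223 \times 3.56 = 0.79388$
$\bar{\bar{X}} = 8.8$
$UCL_X = \bar{\bar{X}} + A_2 \bar{R} = 8.8 + 0.308 \times 3.56 = 9.89648$
$LCL_X = \bar{\bar{X}} - A_2 \bar{R} = 8.8 - 0.308 \times 3.56 = 7.70352$
The February and March averages (10.05 and 10.49) lie above the UCL and the May average
(7.48) lies below the LCL. These calculations indicate that the data are not stable with
respect to the setting of the process, that is, the setting of the machine.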
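The stability check can be reproduced with a short script. This is a sketch under stated assumptions: it starts from the monthly averages and ranges as tabulated above rather than from the raw readings, it uses the constants A2 = 0.308, D3 = 0.223 and D4 = 1.777 quoted in the text for subgroups of ten, and the function name `control_limits` is illustrative.

```python
# Monthly subgroup averages and ranges as tabulated above (10 readings per month)
averages = {"Jan": 8.07, "Feb": 10.05, "Mar": 10.49, "Apr": 7.91, "May": 7.48}
ranges = {"Jan": 5.5, "Feb": 2.0, "Mar": 2.0, "Apr": 5.5, "May": 2.8}

# Control-chart constants for subgroup size 10, as used in the text
A2, D3, D4 = 0.308, 0.223, 1.777


def control_limits(averages, ranges):
    """Compute the centre lines and limits of the X-bar and R charts."""
    x_dbar = sum(averages.values()) / len(averages)
    r_bar = sum(ranges.values()) / len(ranges)
    return {
        "x_dbar": x_dbar, "r_bar": r_bar,
        "ucl_x": x_dbar + A2 * r_bar, "lcl_x": x_dbar - A2 * r_bar,
        "ucl_r": D4 * r_bar, "lcl_r": D3 * r_bar,
    }


limits = control_limits(averages, ranges)
# Months whose average falls outside the X-bar limits signal instability
out_of_control = [month for month, avg in averages.items()
                  if not limits["lcl_x"] <= avg <= limits["ucl_x"]]
for name, value in limits.items():
    print(f"{name} = {value:.5f}")
print("Averages outside the limits:", out_of_control)
```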
Part Number      1     2     3     4     5
Appraiser A
Trial 1          0.34  0.50  0.42  0.44  0.26
Trial 2          0.42  0.56  0.46  0.48  0.30
Trial 3          0.38  0.48  0.40  0.38  0.28
Average          0.38  0.51  0.43  0.43  0.28
Range            0.08  0.08  0.06  0.10  0.04
Appraiser B
Trial 1          0.28  0.54  0.38  0.46  0.30
Trial 2          0.32  0.48  0.42  0.44  0.28
Trial 3          0.24  0.44  0.34  0.40  0.36
Average          0.28  0.49  0.38  0.43  0.31
Range            0.08  0.10  0.08  0.06  0.08
$\bar{\bar{X}}_a = (0.38 + 0.51 + 0.43 + 0.43 + 0.28)/5 = 0.41$
$\bar{\bar{X}}_b = (0.28 + 0.49 + 0.38 + 0.43 + 0.31)/5 = 0.38$
$\bar{R}_a = (0.08 + 0.08 + 0.06 + 0.10 + 0.04)/5 = 0.07$
$\bar{R}_b = (0.08 + 0.10 + 0.08 + 0.06 + 0.08)/5 = 0.08$
$\bar{\bar{X}}_{Diff} = 0.41 - 0.38 = 0.03$
$\bar{\bar{R}} = (0.07 + 0.08)/2 = 0.08$
$UCL_R = 2.574 \times 0.08 = 0.21$, $LCL_R = 0$
None of the range values are out of control.
$\bar{X}_1 = (0.38 + 0.28)/2 = 0.33$
$\bar{X}_2 = (0.51 + 0.49)/2 = 0.50$
$\bar{X}_3 = (0.43 + 0.38)/2 = 0.41$
$\bar{X}_4 = (0.43 + 0.43)/2 = 0.43$
$\bar{X}_5 = (0.28 + 0.31)/2 = 0.30$
$R_p = 0.50 - 0.30 = 0.20$
$EV = 3.05 \times 0.08 = 0.24$
$AV = \sqrt{(3.65 \times 0.03)^2 - 0.24^2/(5 \times 3)} = 0.09$
$R\&R = \sqrt{(0.24)^2 + (0.09)^2} = 0.26$
$PV = 2.08 \times 0.20 = 0.42$
$TV = \sqrt{(0.26)^2 + (0.42)^2} = 0.49$
%EV = 49%, %AV = 18%, %R&R = 53%, %PV = 86%
The gage system is not satisfactory. The equipment variation (repeatability) is quite
large in relation to the appraiser variation (reproducibility).
Regression analysis
Relationship among variables.
In scientific research and industrial problem solving, a situation is often encountered in
which a number of variables are involved, with possible interactions or relationships among
themselves. Regression analysis is a statistical technique for investigating and modeling the
functional relationship among these variables in such situations. As an example, consider
the family income and age at marriage of the girl. One may be interested to find out
whether they are related and, if so, what the form of the relationship is.
The relationship may be expressed in the form of an equation or model connecting one of
the variables, known as the response or dependent variable, with one or more other variables
known as explanatory, predictor or independent variables. Applications of regression analysis are
numerous and occur in almost every field, including engineering, quality control, physical
sciences, economics, management, life and biological sciences, social sciences, etc.
The simplest case of regression analysis is the one where there are only two variables,
one dependent variable and one independent variable, and the relationship between them is
linear. This is known as simple linear regression. When there is more than one
independent variable and the relationship considered is linear, we have what is known as
multiple regression. When the relationship is not linear, we may have to consider a nonlinear
model such as a polynomial regression model, a multiplicative model, etc. Regression analysis
may be carried out for various purposes: (a) to summarize or describe data in a multi-variable
set, (b) to determine the levels of the process parameters which optimize the yield or
any other response of interest, and (c) for prediction and estimation.
Steps in Regression Analysis.

Regression analysis includes the following steps:
1. Statement of the problem.
2. Selection of potentially relevant variables.
3. Data collection.
4. Graphical representation of the data (scatter plot).
5. Model specification.
6. Choice of fitting method.
7. Model fitting and calculation of indices like the correlation coefficient, etc.
8. Model validation and criticism.
9. Using the chosen model(s) for the solution of the posed problem.
The variables can be either quantitative or qualitative. Examples of quantitative
variables are measurable variables like hardness, tensile strength, height, age at birth of the
first child, etc. Examples of qualitative variables are good / bad, defective / non-defective,
religion, sex, region, etc.
Graphical Representation of the data:

If there is only one predictor variable, then the data can be plotted as a scatter diagram to
get an idea about the type of relationship, especially about the linearity of the relationship.
This kind of graphical representation of the data will help to form ideas about the
appropriate model to be chosen.
Hardness (X) and tensile strength (Y) of 20 specimens of annealed steel.
S.No. X Y S.No. X Y
1 144 70.00 11 163 81.10
2 171 85.15 12 150 71.10
3 164 83.50 13 175 85.40
4 155 72.90 14 166 78.84
5 180 85.00 15 158 80.80
6 167 77.25 16 168 80.60
7 165 83.60 17 160 79.85
8 169 82.25 18 188 93.15
9 150 76.35 19 171 79.60
10 155 76.20 20 179 81.65
The scatter plot can indicate that
1. There is a linear relationship between X and Y, where Y increases with X.
2. There is a linear relationship between X and Y, where Y decreases with X.
3. There is no relationship between X and Y.
4. X and Y are related but the relationship between them is nonlinear.
A regression equation containing only one predictor variable is called a simple regression
equation, whereas if there is more than one predictor variable the equation is known as a
multiple regression equation. Often the actual relationship may be nonlinear over the
wider range of the predictor variables, but it can be considered to be linear in the range of
the predictor variables in which we are interested.
Method of fitting:
After the model has been defined and the data have been collected, the next task is to
estimate the parameters of the model, known as parameter estimation or model fitting. The
most commonly used method of estimation is called the least squares method. Others are the
maximum likelihood method, the ridge method and the principal component method.
Simple Linear Regression:

In simple linear regression we have only one independent variable and one dependent
variable. Let these be denoted by X and Y respectively. Further, the relationship is assumed
to be linear. Thus, the relationship here can be expressed as a linear equation of the form
y = a + bx + ε
where a and b are unknown constants and ε is a random error component. The parameter a
is the intercept of the regression line and b is the slope of the line. The parameters a and b
are usually called regression coefficients. The errors are assumed to have mean zero and
unknown variance σ². Additionally, we usually assume that the errors are uncorrelated. This
means that the value of one error does not depend on the value of any other error.
It is convenient to view the regressor X as controlled by the data analyst and measured with
negligible error, while the response Y is a random variable. That is, there is a probability
distribution (usually normal) for Y at each possible value of X.
Correlation coefficient:
The correlation coefficient, denoted by r (or rXY), measures the degree of linear association
between two variables. It is calculated as:
$r = \dfrac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$
Where
$S_{XY} = \sum_{i=1}^{n} (y_i - \bar{y})(X_i - \bar{X})$ = (n − 1) times the covariance between X and Y
$S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2$ = (n − 1) times the variance of X
$S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ = (n − 1) times the variance of Y
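A minimal Python sketch of this calculation is given below. The speed and mileage figures are made up purely for illustration, and the function name `correlation` is an assumption.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Sums of squares and cross-products about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data for illustration: speed (x) against gas mileage (y)
speed = [30, 40, 50, 60, 70]
mileage = [38, 35, 32, 28, 24]
print(f"r = {correlation(speed, mileage):.3f}")
```

A value of r close to +1 or −1 indicates a strong linear association, while a value near 0 indicates little or no linear association.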
Fitting the best line: Least squares
For fitting the best line through the points (x1, y1), (x2, y2), ..., (xn, yn), the
least squares method is adopted, in which the sum of the squared deviations of the points
from the fitted line is minimized. That is,
Minimise $S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$
To minimize the above, we differentiate with respect to a and b and equate to 0, obtaining
two equations, known as the normal equations. On solving these two equations, the values of
a and b are obtained as:
$a = \bar{y} - b \bar{X}$ and $b = S_{XY} / S_{XX}$
where $S_{XY}$ and $S_{XX}$ are as defined earlier.
The equation so established is known as the regression of y on x and can be used for predicting y
for given values of x. However, this equation cannot be used for predicting x for a given
value of y. We can use the same data to fit a regression of x on y, which can be used for
predicting x for given values of y. When r = ±1, the regression of y on x can also be used
for predicting x for given values of y.
Example
For the data given in the above table, find the least squares estimates of the regression
parameters a and b.
We have $\bar{X} = 165.90$, $\bar{y} = 80.2125$
$S_{YY} = 554.4744$, $S_{XX} = 2717.0197$, $S_{XY} = 1054.2041$
$b = S_{XY} / S_{XX} = 1054.2041 / 2717.0197 = 0.388$
$a = \bar{y} - b \bar{X} = 80.2125 - 0.388 \times 165.9 = 15.84$
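The least squares estimates can also be computed programmatically. The sketch below offers two hypothetical helpers: `fit_line`, which works from raw (x, y) pairs, and `coefficients_from_summary`, which works from the summary statistics quoted in the example. Both names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def coefficients_from_summary(x_bar, y_bar, s_xx, s_xy):
    """Same estimates when only the means and sums of squares are known."""
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Summary statistics quoted in the example above
a, b = coefficients_from_summary(165.90, 80.2125, 2717.0197, 1054.2041)
print(f"a = {a:.2f}, b = {b:.3f}")
```

Once a and b are known, a predicted y for a given x is simply `a + b * x`, valid within the range of x values used for fitting.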
Multiple Regressions:
There are situations when one dependent variable may be related to more than one
independent variable. In such cases, we try to develop a model or equation relating the
dependent variable to the independent variables. Such regression models are known as
multiple regression models. If y is the dependent variable and x1, x2 and x3 are the
independent variables, then the linear regression equation fitted may be of the form
y = a + b1x1 + b2x2 + b3x3 + e
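One way to fit such an equation is to extend the least squares normal equations to several predictors and solve them directly. The sketch below does this in plain Python with Gaussian elimination; it is an illustration under stated assumptions, not the method prescribed by the text. The data are generated from a known equation so the result can be checked, and the names `solve` and `fit_multiple` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_multiple(rows, ys):
    """Least squares coefficients [a, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated from y = 2 + 1*x1 + 3*x2 - 0.5*x3 (no error term)
rows = [(1, 0, 2), (2, 1, 0), (3, 1, 4), (0, 2, 1), (4, 3, 3), (1, 4, 2)]
ys = [2 + x1 + 3 * x2 - 0.5 * x3 for x1, x2, x3 in rows]
coefficients = fit_multiple(rows, ys)
print([round(c, 6) for c in coefficients])
```

For larger or ill-conditioned problems, a numerical library routine would normally be preferred over hand-rolled elimination.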