Reliability Maturity

Reliability
Maturity
Understand and Improve Your
Reliability Engineering Program
Fred Schenkelberg
Los Gatos, California

2014
Copyright © 2014 Fred Schenkelberg
Licensed under the Creative Commons
Attribution-NonCommercial-NoDerivatives
4.0 International License.
http://creativecommons.org/licenses/by-nc-nd/4.0
Feel free to email, tweet, blog, and pass this ebook

around the web but please don’t alter any of its
contents when you do. Thanks!
If you find this work of value to you, consider

purchasing a copy and supporting the work.
If you have purchased a copy, Thank you!
FMS Reliability Publishing
15466 Los Gatos Blvd #109-371

Los Gatos, California 95032
fmsreliability.com/publishing/
Printed in the United States of America
ebook ISBN: 978-1-938122-04-0

paperback ISBN: 978-1-938122-05-7
Contents
Introduction
Maturity Matrix
Exploring Reliability Culture
Three Ideas to Overcome Organization Inertia
Reactive and Proactive
Goals without Apportionment or Measures
Reliability Maturity Matrix Guide
Moving from Stage 1 to Stage 2
How to Assess Your Reliability Program
Sample Survey Questions & Support Material
Following up on the Survey
Book Conclusions and Summary
Glossary of Terms
References
Introduction
Product reliability refers to how well a product performs over time.
The reliability performance is the direct result of decisions made and
actions taken during design, assembly, and use.
A well-designed product will meet reliability expectations. Likewise, a

weak design will suffer from more failures than expected.
The assembly process alone cannot improve product reliability. The

design of a product establishes the reliability performance potential
assuming it is assembled correctly.
In order to create a reliable product the design must consider the

expected use conditions. Even a simple product will experience
multiple stresses throughout the product’s use.
Reliability occurs at the point of decision during the design

process. Decisions may or may not deliberately include reliability
considerations. The reliability culture or maturity of an organization
establishes the type and amount of reliability consideration each
decision receives.
Maturity refers to the behaviors within an organization. A mature

company is able to repeatedly create reliable products. An immature
company’s erratic processes may or may not create reliable products.
Maturity reflects the culture or approach to reliability. Immature

organizations tend to ignore or use crude techniques to set
requirements, identify risks, or measure results. Mature organizations
proactively work across the organization to enable appropriate
decisions by using specific techniques fit for the task.
One simple example is the way two organizations use HALT (a testing
process to determine likely failure modes). The immature organization
does not know what HALT is nor understand HALT’s purpose or value.
The mature organization uses HALT when appropriate and cost
effective. Its employees know the what, how, and why for HALT.
A reliability program assessment is a tool used to determine the

current maturity stage of an organization. In most organizations
very few people, if any, fully understand the entire set of decisions and actions
culminating in the resulting product reliability performance.
By using a structured interview or survey approach with a cross

section of the organization, you can develop an understanding of
the overall reliability program. In this book you will learn about the
assessment process and how to conduct your own assessments.
With an assessment in hand, it’s time to create recommendations to
improve the organization’s reliability maturity. In part, the
recommendations identify what is missing and inhibiting reliability
maturity, and what is working well and would improve further with
reinforcement. In general, organizations have one or more areas that
are stronger or weaker (more or less mature) yet tend to align with
one stage of maturity.
The reliability maturity matrix is a framework to help you understand

the organization’s reliability culture and to make improvements to
that culture, as needed. Recommendations that move the organization
to the right on the maturity matrix should address all the key
reliability practice areas. This book includes specific steps to move an
organization from one block of the matrix to the block to the right.
This book’s intent is to provide you with practical and actionable

information so you can change the culture of your organization and
consistently create reliable products for your customers.
Maturity Matrix
The concept of a maturity model is not new. It provides a means to
identify the current state and illuminates the possible improvements
that can be made to a reliability program.
The reliability maturity matrix serves as a guide to assist an

organization in improving its program.
In general, the higher stages are most cost effective and efficient at
achieving optimal product reliability performance. There are five
stages.
Stage 1: Uncertainty
“We don’t know why we have problems with reliability.”
Reliability is rarely discussed or considered during design and

production.
Product returns resulting from failure are considered a part of doing

business.
Field failures are rarely investigated, and often blame is assigned to

customers.
The few people who consider reliability improvements gain little

support.
Reliability testing is done in an ad hoc fashion and often simply to meet

customer requirements or basic industry standards.
Stage 2: Awakening
“Is it absolutely necessary to always have problems with

reliability?”
Reliability is discussed by managers but not supported by funding or

training.
Some elements of a reliability program are implemented, yet generally

not in a coordinated fashion.
There is some experimental use of tools such as FMEA and accelerated
and highly accelerated life testing, but most effort still focuses on
standards-based testing and meeting customer requirements.
Some analysis is done to estimate reliability or understand field

failure rates, yet limited use is made of these data in making product
decisions.
There is, however, an increasing emphasis on understanding failures

and resolving them.
Failure analysis is typically accomplished by component vendors with

little result.
Stage 3: Enlightenment
“Through commitment and reliability improvement we are

identifying and resolving our problems.”
A robust reliability program exists and includes many tools and

processes.
Generally, significant effort is directed to resolving prototype and field

reliability issues. Increasing reliance is placed on root-cause analysis
to determine appropriate solutions.
Some tools are not used to their full potential owing to lack of
understanding of reliability and how the various tools apply.
Some reliance is placed on establishing standard testing and

procedures for all products. Only some use of these testing results is
made for estimating product reliability to supplement predictions.
Predictions are primarily made to address customer requests and not

as feedback to design teams.
Stage 4: Wisdom
“Failure prevention is a routine part of our operation.”
Each product program or project has a tailored reliability program

that can be adjusted as the understanding of product reliability risks
changes.
Reliability tools and tasks are selected and implemented because they
will provide needed information for decisions.
Testing focuses on either discovering failure mechanisms or

characterizing failure mechanisms.
Testing often proceeds to failure, if possible.
Advanced data analysis tools are employed regularly and reports are
distributed widely.
There is increasing cooperation with key suppliers and vendors to

incorporate the appropriate reliability tools upstream.
Stage 5: Certainty
“We know why we do not have problems with reliability.”
Product reliability is a strategic business activity across the

organization.
There is widespread understanding and acceptance of design for

reliability and how it fits into the overall business.
Product reliability is accurately predicted prior to product launch using

a mix of appropriate techniques.
New materials, processes, and vendors are carefully considered for

their ability to meet internally established reliability requirements.
The few failures that do occur are expected and analysis is done to
identify early signs of material or process changes.
Customers and suppliers are regularly consulted on ways to improve

reliability.
Nature of Maturity
The stages of maturity may or may not proceed in a progression

within an organization. It is not like a plant that begins as a seed and
eventually matures.
An organization may start at stage 2, skipping stage 1 and never

progressing.
Some organizations do advance from lower stages to stage 5. They also

may regress to a lower stage over time.
It is with deliberate effort that an organization advances and maintains

one of the more mature stages. Once a higher stage of maturity is
established in the culture, its self-sustaining nature means it takes
little effort to maintain.
Exploring Reliability Culture
Years ago I had the opportunity to assess the reliability programs of
two teams within the same organization. They made similar products
for different segments of the market, and the teams were about the
same size.
Two years previously, both teams had lost their staff reliability
professional.
Furthermore, both teams were located in one building, one upstairs

and the other downstairs, which made scheduling the assessment
interviews convenient.
Upstairs, Downstairs
Through the course of the interviews I enjoyed the conversations more

with the team upstairs. The interviews started on time and were not
interrupted, and I noticed that the office plants were common, green,
and healthy.
The engineers and managers knew how to use a wide range of

reliability tools to accomplish their tasks. For example, the electrical
design engineer knew about derating and accelerated life testing,

and she also knew about the goal and how it was apportioned to her
elements of the product.
Each person I talked to upstairs knew the overall objective and how
they provided and received information using a range of reliability tools
to make decisions.
They enjoyed a very low field failure rate and simply went about the
business of creating products.
Downstairs was different.
The interviews rarely started on time and most were interrupted by

an urgent request usually involving an emerging major field issue or
customer complaint. I didn’t see any office plants, just plenty of coffee
pots.
The engineers and managers knew that Phil, the former reliability
engineer with the team, did most of the reliability tasks. When I asked
about stress testing or risk assessment, the responses I got were “That
was Phil’s job” or “Phil used to do something like that.”
Most did not know what HALT or ALT was and didn’t have time to find
out.
There was a vague goal, but all agreed that because it wasn’t measured
during product development it was meaningless.
The downstairs team had a very high field failure rate and the design
team often spent 50% or more of its time addressing customer
complaints.
History
The only salient difference between the teams and their history was the
behavior of the former reliability professionals with each team.
Upstairs, Mabel was a reliability professional well versed in a wide

range of reliability tools and processes. She provided direct support
along with coaching and mentoring across the organization.
She encouraged every member of the team to learn and use the
appropriate tools to make decisions. The team became empowered to
make decisions that led to products meeting their reliability goals.
Downstairs, Phil was another reliability professional well versed in a

wide range of reliability tools and processes.
He directly supported the team by doing the derating calculations,

asking vendors for reliability estimates, designing and conducting
HALT or ALT as needed, and performing the myriad other tasks related
to creating a reliable product.
He provided input and recommendations for design changes that

would improve reliability, and he was a key member of the team.
Phil was not a coach or mentor, however, and as he moved to a new

role his knowledge and skills went with him. He preferred to just do
it himself and often found he had little time to teach others about
reliability engineering tasks.
The difference between these teams was in the culture.
This difference showed in who had and who used reliability engineering
knowledge. When all team members have knowledge appropriate
for their role on the team, they can apply those tools to assist in making
design decisions.
Without that knowledge, design teams will use the tools and knowledge
they have to make design decisions. Without the consideration of
reliability-related information the design decisions are made blind to
the impact.
Take Away
Reliability occurs at decision points during the design process:
• when components are selected

• when structures are finalized
• or when all risks have been addressed.
Near the end of any product development process the team asks
whether the product is ‘good enough’ to start production and introduce
the product to the market.
Having a clear goal with an appropriate measure of the current design’s

ability to meet that goal provides the reliability aspect of ‘good enough.’
Each organization or product is different.
The markets, expectations, and environments are all different. Yet,

every product achieves some level of product reliability.
The culture is only one factor, yet I suspect that in this case you would
agree that working upstairs would be preferable.
Three Ideas to Overcome Organization Inertia
Sometimes, it seems the forces of nature are working against our
ideas.
I recall being frustrated as a child playing in the sandbox. I

wanted to create a ramp of sand to race my cars down. No
matter how much I pushed and patted the dry sand succumbed
to some unseen force and did not hold the desired shape.
In business we sometimes experience the same frustration. It’s not

gravity in this case: We are facing organizational inertia.
Organization Physics
Once a group of people settle into a routine way of accomplishing

something, it is not a simple matter to change the process. You may
have experienced this resistance.
Like the physics concept of inertia (recall that a body at rest tends
to remain at rest), people who are familiar with the ‘way’ something
currently happens tend to want it to stay that way.
Just as with a physical object on the frictionless plane, no amount of

cajoling, presentations, or commands will move the object.
Unlike the mass on the plane, we are not allowed to strike our fellow
workers with some force to change their state from resting to in
motion. This is generally frowned upon.
So, what can we do? We know change happens, we know our ideas
have merit, we know there is value in making improvements.
Improving a Reliability Program
Often, when an organization asks someone for a reliability program

assessment, what is really being asked is how to change the
organization and sometimes how to change the culture itself.
Sure, an assessment will result in recommendations for

improvements. However, those recommendations, no matter how
compelling and obvious, are of no value unless implemented.
That is where inertia comes back into play.
Overcoming Organizational Inertia
Here are three tips that may help you implement reliability
improvements while overcoming organizational inertia.
• Work with key influencers.

• Make the current reality visible.
• Celebrate successes.
Every organization is different and every situation warrants its own

approach, yet these three tips may help you look for opportunities to
accelerate the implementation of your proposed changes.
Work with Key Influencers
Some people within an organization have the ability to sway many

others.
These people are the ones others look to for advice. They are the ‘go to’
people for a range of topics, including reliability, if you’re lucky. They
may or may not be managers.
Getting them on board may provide the credibility, support, and

influence you need to move forward.
Start by understanding what motivates these key people. If they want

the credit for the idea, give it to them. If they want only what’s best for
the company, show how improving reliability does so.
A couple of one-on-one meetings will determine whether or not you

have their support.
Change in the organization is easier to implement with their active

support. As in the sandbox, adding a little water to bind the sand
together would have helped in building a ramp.
In an organization there are those that provide the ‘binder’. Working

with key influencers may accelerate the implementation of your project.
Make the Current Reality Visible
Many team members claim that they understand product reliability

and that it is valuable to their customers, the company, and
shareholders.
Yet, few can tell you the cost of unreliability.
Make the cost of failure visible.
No one really likes to look at failures too closely, unless one is a failure
analyst. Counting profits and measuring sales volume are so much
more fun.
Product failures, although we all know they occur, are often overlooked
as a subject.
Track down and publish internally the warranty cost per unit sold
and total warranty expense. Then compare these numbers to the
cost of goods sold and net profit. You may find the cost of failure in
these terms to be useful for others to understand the magnitude of
opportunity that reducing product failure represents.
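
As a minimal sketch of that comparison, the following Python snippet expresses warranty expense per unit and as a fraction of cost of goods sold and net profit. All of the figures and the function name are hypothetical, chosen only to illustrate the arithmetic; substitute your organization's own numbers.

```python
# Hypothetical annual figures, for illustration only (not from the book).
units_sold = 40_000
total_warranty_expense = 1_200_000.0   # dollars
cost_of_goods_sold = 8_000_000.0       # dollars
net_profit = 2_000_000.0               # dollars


def warranty_summary(units, warranty, cogs, profit):
    """Express warranty expense per unit and relative to COGS and net profit."""
    return {
        "warranty_cost_per_unit": warranty / units,
        "warranty_as_fraction_of_cogs": warranty / cogs,
        "warranty_as_fraction_of_net_profit": warranty / profit,
    }


summary = warranty_summary(units_sold, total_warranty_expense,
                           cost_of_goods_sold, net_profit)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these made-up inputs, warranty expense equals a large share of net profit, which is the kind of contrast that tends to get attention.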
Besides, to make good decisions we need cost per failure type

information to balance the other information also provided in terms of
money, i.e., production costs, material costs, sales per day, etc.
Coupled with a clear plan to reduce the cost of failures, this process
may just garner enough attention to gain acceptance of your ideas.
Celebrate Successes
Somewhere in your organization are those who are doing the right
things already. Find them and help them gain the recognition they
need.
Tell stories about what they did and the difference it is making.
Highlight their work as an example of what can be done in our
organization.
As one or more people start to implement your ideas for reliability

program improvement, help them to be successful. Then celebrate
with them and herald the success across the organization.
This process resembles a grass-roots approach to organizational change,

but with the added feature of promoting success as you go forward.
Getting Moving
In the sandbox years ago, I saw a friend use a bit of water to

change the material to something that worked a bit better.
That idea sparked finding a wooden board to use instead. I
changed the material, thus finding a much better solution.
As you work to improve your reliability program, keep in mind that

you are working with people. Like sand, sometimes they need to find
support, sometimes they need to understand the goal, and sometimes
they need a little encouragement to firm up resolve.
Obviously, change happens. We can encourage change to improve

product reliability and share the benefits. There will be plenty of
benefits to go around.
Reactive and Proactive
Do you let events happen to you, or do events follow your designs and
expectations?
Are you a spectator or an actor?
Do you wonder about your products’ future or do you control that

future?
Are you reactive or proactive?
Every reliability and maintenance program is a system. Every program

has inputs, such as product testing results and field returns.
Every reliability program has outputs, such as product design and

production.
In the most basic terms, a reliability program includes product

specifications for functionality, including expected durability. The
program includes some form of design, verification, production, and
field performance.
Given this basic life-cycle description, two approaches to evaluating
the product life cycle are possible: reactive and proactive.
Every Design Will Fail
Let’s consider the notion that every product will eventually fail.
Even the most robust product on Earth will fail when the Sun expires.
Well before the collapse of the solar system most products made today
will have completely failed. The failures will range from deterioration
of materials, to stress conditions (e.g., lightning strikes), or simply to
misuse.
Some products will simply wear out; others will become obsolete and
lose compatibility with other systems; others will simply no longer
provide sufficient value.
Another important notion is that, with any product design, there are
a finite number of faults. A button has a limited number of actuation
cycles before accumulated stress cracks the switch dome.
Any given material has a degradation mechanism (corrosion, polymer

chain scission, etc.) that slowly deteriorates the material’s strength.
A ‘bug’ in the software can disable the equipment temporarily.
Further, there are possible design elements in the product for which
the designer failed to study how these would be affected by production
variation, user demand, or environment variations.
In every case, sooner or later, the design flaw will lead to failure.
Nonetheless, given only a finite number of failures, it is possible to find

and remove most design errors.
Reactive Approach
The most common approach to product reliability is to wait for product

failures and then respond with analysis, adjustments, and refinements
in an attempt to improve product reliability.
The naive wait for the failure reports from customers before taking
action. The team’s logic, if even considered, is the following:
• We are good designers.

• The customer will use the product in unforeseen environments and
applications.
• If there are customer failures we will consider improvements.
For some products, with limited release and ample time to redesign the
product, this may be perfectly feasible.
A simple improvement the design team could consider is an estimate

of the customer’s use profile and environmental conditions.
Armed with this information, the team then evaluates the impact of
the conditions on the product’s reliability through standardized testing.
Setting testing conditions at or slightly above expected operating
environments enables direct evaluation of the design to meet expected
conditions.
The faults found would be similar to the failures expected to occur in

the customer’s hands, and there may be time for a redesign before the
product is shipped to customers.
Carrying out this logic may lead to a broad spectrum of testing that is
both expensive and time consuming.
Part of the logic of product testing includes the thought, “If we test in
enough ways over the full range of use and environmental conditions,
we should find and correct every design fault.”
There is often a heavy reliance on industry standards and common test

methods for every product.
Further improvements to product reliability can refine this reactive

method; these include using simulations, performing risk analysis,
and undertaking early evaluation and testing of subsystems and
components.
The overall approach is often limited by knowledge of actual use

conditions, lack of test samples, and lack of time.
Proactive Approach
Moving to a proactive approach can reduce the amount of product

testing and increase product reliability.
Although this may seem similar to the reactive approach, it involves a

focus on failure mechanisms instead of test methods.
Products fail because they do not have sufficient strength to withstand

a single application of high stress (being dropped, being exposed to a
static discharge, etc.) or they accumulate damage (e.g., from wear,
corrosion, or drift) with use or over time.
By thinking through how a product could fail, considering the

materials, design, assembly process, and the same for vendor-
supplied elements, the product team determines a list of possible
failure mechanisms.
In this approach not all the failure mechanisms will be fully understood
or characterized.
The risk in this case lies in the decision to launch the product while not
understanding the possibility or potential magnitude of product failure.
The amount of risk itself is unknown.
Therefore, the proactive team proceeds to characterize the design or

material under the expected use conditions. The intent is to reduce the
uncertainty of the risk.
A second result of a proactive approach risk assessment is the rank

ordering of failure mechanisms by expected rate of occurrence.
One way to accomplish this ranking is to evaluate the stress versus

strength relationships. Items with the largest overlap of the two
distributions (stress and strength) have the highest potential for
failure.
The solutions may include increasing strength or reducing the variance

of the strength.
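
One way to put a number on the overlap is sketched below in Python. It assumes that stress and strength are independent and normally distributed, which is an assumption made here for illustration and not something the text specifies. The function names and all values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """P(stress > strength) for independent, normally distributed stress and strength."""
    margin_mean = strength_mean - stress_mean
    margin_sd = math.sqrt(stress_sd ** 2 + strength_sd ** 2)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical values in arbitrary units.
baseline = interference_probability(50, 5, 70, 8)
stronger = interference_probability(50, 5, 80, 8)   # higher mean strength
tighter = interference_probability(50, 5, 70, 4)    # less variation in strength

print(f"baseline: {baseline:.4f}")
print(f"stronger: {stronger:.4f}")
print(f"tighter:  {tighter:.4f}")
```

Under these assumptions, both raising the mean strength and reducing its variance shrink the overlap, and the same function can be used to rank several failure mechanisms by their estimated probability of failure.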
A third result of the risk assessment is similar to the stress and

strength evaluation and includes the impacts of time or usage on the
change in the stress and strength distributions.
Either curve may experience changes to the mean or variance over

time. This may be due to degradation, wear, or increased expectation of
durability by customers.
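
The effect of time can be sketched in the same way. The Python example below assumes normal stress and strength distributions and a linear loss of mean strength with age. Both are assumptions for illustration only, and the degradation rate and other values are hypothetical.

```python
import math


def failure_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """P(stress > strength) for independent normal stress and strength."""
    z = (strength_mean - stress_mean) / math.sqrt(stress_sd ** 2 + strength_sd ** 2)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def strength_at(year, initial_mean=70.0, loss_per_year=3.0):
    """Hypothetical linear loss of mean strength with age."""
    return initial_mean - loss_per_year * year


# Stress stays fixed while mean strength degrades year by year.
risk_by_year = [failure_probability(50.0, 5.0, strength_at(y), 8.0)
                for y in range(6)]
for year, risk in enumerate(risk_by_year):
    print(f"year {year}: P(stress > strength) = {risk:.4f}")
```

A real degradation model would come from characterizing the specific failure mechanism; the linear form here only shows how a shrinking margin raises the probability of failure over time.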
The proactive approach entails more thinking and understanding

of how testing stresses create failures, plus characterization of
product designs, materials, and processes, and their related failure
mechanisms.
Two Approaches
In summary, in a reactive approach one creates a design and then

waits for field returns or standard product testing failures to prompt
product improvements.
In a proactive approach one anticipates failure mechanisms,

experimentally or via simulation, characterizes the response of the
design and materials to expected stresses, and then proceeds to the
design phase.
There are other aspects that identify a reactive versus proactive

reliability program.
For example, if the only time management discusses product reliability

is when a major customer complains about product failures, that is a
reactive approach.
If the management team regularly inquires about and discusses the risk

a particular design presents to reliability performance, that is a
proactive approach.
Goals without Apportionment or Measures
Consider the following situation.
A manager at a life-support-equipment company wanted to conduct a
reliability program assessment. The company was experiencing about a
50% per year failure rate, and at least the Director of Quality thought it
should do better.
One of the findings was related to reliability goal setting and how it was
used within the organization.
Nearly everyone knew that the product had a 5,000-h Mean Time
Between Failures (MTBF) reliability goal, but very few knew what that
actually meant.
It was how this team used the product goal that was even more
surprising.
There were five elements to the product with five different teams
working to design those elements: a circuit board, a case, and another
three elements. Within each team, team members designed and
attempted to achieve the reliability goal of the product, the 5,000-h
MTBF goal.
A data analysis of the field failures showed that the teams actually did
achieve their goal, as each element performed just a little better than
5,000-h MTBF.
However, reliability statistics dictate that in a series system one has

to have higher reliability for each of the elements than for the whole-
system goal.
For example, if each element achieves 99% reliability over one year,
the product’s five elements together would produce a system-level
reliability of approximately 95% at one year
(0.99 × 0.99 × 0.99 × 0.99 × 0.99 ≈ 0.951).
We call this apportionment when we divvy up the goal to the various

subsystems or elements within a product.
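
A minimal Python sketch of equal apportionment for a series system follows. It assumes the system goal is split evenly across independent elements, which is only one of several possible apportionment schemes. The function names and the 95% goal are hypothetical.

```python
def equal_apportionment(system_reliability, n_elements):
    """Reliability each of n series elements needs if the goal is split equally."""
    return system_reliability ** (1.0 / n_elements)


def series_reliability(element_reliabilities):
    """Reliability of a series system: the product of its element reliabilities."""
    result = 1.0
    for r in element_reliabilities:
        result *= r
    return result


system_goal = 0.95   # hypothetical one-year system goal
per_element = equal_apportionment(system_goal, 5)
print(f"each element needs about {per_element:.4f}")
print(f"check: {series_reliability([per_element] * 5):.4f}")
```

In practice the split is often weighted by complexity or known risk, not divided evenly, but the point stands: each element's goal must be higher than the system goal.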
This team skipped that step and designed each element to the same
goal intended for the system.
Compounding the issue was the simplistic attempts to measure

reliability of the various elements and total lack of measurement at the
system level.
For each subsystem the team primarily relied on the weakest
component within the subsystem to estimate the subsystem’s
reliability.
For example, the circuit board had about 100 parts, one of which the
vendor claimed had about a 5,000-h MTBF.
Thus that team surmised that, because it was the weakest element,
nothing would fail before 5,000 h and thus this was all the information
the team members needed to consider.
They did not consider the cumulative effect of all the other components
nor the uncertainty of the vendor’s estimate within their design and use
environment.
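
To illustrate the cumulative effect, the Python sketch below assumes constant failure rates, so that failure rates add in series. The 5,000-h value is the vendor claim from the text; the MTBF values for the other 99 parts are hypothetical, as is the function name.

```python
import math


def series_mtbf(part_mtbfs_hours):
    """MTBF of a series assembly, assuming constant (exponential) failure rates."""
    total_failure_rate = sum(1.0 / mtbf for mtbf in part_mtbfs_hours)
    return 1.0 / total_failure_rate


# One part at the vendor-claimed 5,000 h; the other 99 values are hypothetical.
board_parts = [5_000.0] + [1_000_000.0] * 99
board_mtbf = series_mtbf(board_parts)
print(f"board MTBF is about {board_mtbf:.0f} h")

# Under the same assumption, the fraction of units surviving to t = MTBF.
survival_at_mtbf = math.exp(-1.0)
print(f"fraction surviving to the MTBF: {survival_at_mtbf:.2f}")
```

Even with very reliable parts elsewhere on the board, the assembly MTBF falls well below that of the weakest part. And under a constant failure rate only about 37% of units survive to the MTBF, so an MTBF value does not mean that nothing fails before that time.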
This logic was repeated for each subsystem.
The result was a product designed, in effect, to achieve about the same
reliability it achieved in the field.
The estimated use of the product was about 750 h per year; thus each
element would achieve about 85% reliability for a year, which seemed
to be an adequate reliability goal.
However, this is a series system, meaning that a failure in one element

would cause the entire system to fail. The math works out as follows:
$$R(750\,\mathrm{h}) = \left(e^{-750/5000}\right)^{5} = e^{-0.75} \approx 0.47, \text{ or } 47\%.$$
Because the product of the reliabilities of the individual five elements

was overlooked, the system reliability turned out to be less than 50%,
not the expected 85%.
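
The arithmetic can be checked with a few lines of Python. The sketch uses the constant failure rate (exponential) model implied by the calculation above, with the 5,000-h MTBF and 750 h of annual use taken from the text. The variable names are illustrative.

```python
import math

mtbf_hours = 5_000.0         # per-element MTBF from the text
use_hours_per_year = 750.0   # estimated annual use from the text
n_elements = 5

# Reliability of one element over a year of use.
element_reliability = math.exp(-use_hours_per_year / mtbf_hours)

# Five such elements in series.
system_reliability = element_reliability ** n_elements

# Equal failure rates add in series, so the system MTBF is one fifth.
system_mtbf_hours = mtbf_hours / n_elements

print(f"element reliability at one year: {element_reliability:.2f}")  # 0.86
print(f"system reliability at one year:  {system_reliability:.2f}")   # 0.47
print(f"implied system MTBF: {system_mtbf_hours:.0f} h")              # 1000 h
```

A system reliability of about 47% means roughly half the units fail within a year, which is consistent with the failure rate of about 50% per year that prompted the assessment.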
The field performance was the result of how the product was designed
to meet the reliability goal for each subsystem. The team got what it
designed.
Its members had forgotten or ignored a basic, yet critical element of

reliability engineering knowledge.
Reliability Maturity Matrix Guide
An organizational reliability program assessment is only of value when
the resulting action creates a more effective reliability program.
Moving to the right, or increasing maturity, on the matrix provides

value to the organization.
Some examples include reduced field failures, reduced cost of product

development and testing, increased ability to hit market introduction
deadlines, and increased market share.
Each organization’s culture, history, capabilities, and priorities will

influence any reliability improvement program.
Effective local change management and the internal influence of

thought leaders will also affect any improvement effort. Therefore, any
effort to improve an organization’s reliability maturity must account
for the local culture and norms; thus each improvement program will
be different.
Yet, the basic tools, approaches, and processes related to reliability

engineering do remain largely the same across organizations.
The particular product and market may place unique constraints on

specific tools, but the basics tend to remain consistent.
The Reliability Maturity Matrix will provide the structure for this
guideline.
IEEE Std 1624, Standard for Organizational Reliability Capability
(IEEE, 2008); Crosby’s Quality Is Free (Crosby, 1979); and
the journal article “Using a Reliability Capability Maturity Model to
Benchmark Electronics Companies” (Tiku, Azarian and Pecht, 2007)
provide further guidance.
The intention is to provide the recommended tasks to facilitate

a transition from one maturity level to the next across each
Measurement Category.
In general, organizations tend to have fairly consistent reliability

maturity across categories. There may be some variation, yet
commonly only one level higher or lower from the overall average
maturity.
The maturity matrix consistency reflects the cultural elements and

the overall organization’s approach or policy toward reliability. The
consistency also reflects the interconnectedness between categories.
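As a rough illustration of the "one level higher or lower" observation above, the sketch below averages a set of category stages and flags any category that sits more than one level from the average. The scores and the function name are hypothetical, chosen only for illustration; they are not part of the matrix itself.

```python
from statistics import mean

# Hypothetical assessment result: maturity stage (1-5) per category.
scores = {
    "Management": 2,
    "Product Requirements": 2,
    "Engineering": 3,
    "Feedback Process": 1,
}

def summarize_maturity(stage_by_category, tolerance=1.0):
    """Return the average stage and the categories that sit more than
    `tolerance` levels away from that average."""
    average = mean(stage_by_category.values())
    outliers = {
        name: stage
        for name, stage in stage_by_category.items()
        if abs(stage - average) > tolerance
    }
    return average, outliers

average, outliers = summarize_maturity(scores)
print(f"average stage: {average:.2f}")
print("categories more than one level from the average:", outliers)
```

A category that shows up as an outlier is worth a second look during the assessment, since it may point to an interview misunderstanding or to a local practice that differs from the rest of the organization.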
Assessment is the tool to clearly identify the maturity level of an

organization as well as the cultural aspects. The recommendations
generated by the assessment focus on reinforcing strengths and
improving weaknesses.
Also, the specific recommendations focus on moving the average

maturity to the right or upward in maturity. Given the interconnected
nature of the categories, it is often difficult to only improve one
category to a higher maturity without affecting related categories.
In this discussion, we will assume that the specific tasks and tools
recommended to move an organization to the right will tend to lead to
improvements in other categories.
First, let’s take an overall look at the specific categories.
1. Management
The management team sets the tone for all aspects of an organization.
The policies, practices, and priorities all convey the management

team’s placement of reliability’s importance relative to the many
priorities within the organization.
How the management team acts is more important than the slogans or
official statements – where is the attention and follow up, where are the
resources being directed, who is rewarded, and what garners personal
involvement?
Understanding and attitude
This is a reflection of the level of the management team’s

comprehension of reliability engineering’s role within the organization.
Does the management team understand and use reliability tools to
make decisions? Do they seek out information or merely respond
to complaints? When and why do reliability-related topics become
important?
Status
Within an organization, who are the leaders (independent of position)?

What combination of voices tends to drive the company? Who is held in
high esteem, rewarded, and promoted?
The status of the reliability practitioner may range from nonexistent,

to an obstacle, to a necessary part of doing business, to a valued team
member, or to a thought leader. Do people want to become a reliability
engineer because it’s viewed as important and career enhancing?
The status of those identified as reliability practitioners is one indicator
of the value the organization places on, and finds in, reliability
engineering activities.
Measured cost of unreliability
The language of business is money.
What does the organization track and value and how is it expressed?
The actual measures, their accuracy, and their relevance to decision
making express the importance of product reliability within an
organization.
Prevailing sentiment
Stage 1: “We don’t know why we have problems with reliability”
Stage 2: “Is it absolutely necessary to always have problems with

reliability?”
Stage 3: “Through commitment and reliability improvement we are

identifying and resolving our problems.”
Stage 4: “Failure prevention is a routine part of our operation.”
Stage 5: “We know why we do not have problems with reliability.”
2. Product Requirement
This section includes the ability to understand customer expectations,

connect specific activities to business expectations, and create a
dynamic reliability program.
Requirements and planning
Designing and producing a product that meets customer expectations

requires some level of understanding of customer expectations for
functionality, use and environmental conditions, and durability.
These requirements influence every facet of product design and

production.
The overall plan to achieve the reliability requirements establishes the

sequence of reliability activities and decision points over the product
life-cycle.
Training and development
The technical skills and knowledge needed to design and produce a

product span a wide range of reliability engineering activities.
Individuals across the organization need to understand the reliability-

related goals, plans, tasks, and measures and their importance to
effectively create a reliable product.
3. Engineering
This section defines the organization’s ability to create and analyze the
collection of elements making up a product. The ability to understand
how the interaction of materials and processes affects reliability
performance is central to the engineering process.
Reliability analysis
Assessing reliability risk with a product’s design or field performance

illuminates failure modes, mechanisms, and effects.
The analysis provides information to create reliability estimates and

predictions. The ability to understand, characterize, compare, and
judge product reliability enables decisions across the product life-
cycle.
Reliability testing
The intent of physically evaluating product prototypes and production

units is to:
• identify design and supply chain weaknesses,

• explore product limits and potential failure modes,
• and determine the effects of the expected range of use profiles and
environments.
Physical testing includes demonstrating that the product’s durability

(expected reliability) meets the requirements.
Supply chain management
Many products consist of a combination of purchased components and
materials assembled into a functional item.
The reliability performance is significantly influenced by the reliability

performance of the selected components and materials.
Reliability is only one aspect of supplier selection, and the active

involvement of reliability practitioners enables
• risk assessment,
• reliability requirements allocation,
• joint component reliability testing,
• and key vendor process control enhancements.
Furthermore, monitoring supplier impact on reliability performance,
process variation, change notices, and end-of-manufacture notices
enables active management of any effects on product reliability.
4. Feedback Process
Henry Petroski suggests engineers design based on the knowledge of

failures. (Petroski, 2006)
An organization’s ability to identify and learn from failures provides the
information needed for design improvements.
Failure data tracking and analysis
Each product failure highlights an area for product reliability

improvement.
Systematically recording, tracking, analyzing, and reporting failures
from across the product life-cycle and supply chain enables you to
acquire comprehensive and timely information. The product design
team needs this information to understand and prioritize failures and
to design products that minimize them.
The entire business requires timely and accurate failure data for
decisions to be made concerning, e.g., improvement projects, supplier
selection, and warranty policies.
Validation and verification
This check step in most organizations consists of verifying that

the reliability objectives have been met and that planned reliability
activities have occurred.
A cross-check can support individual results with consistent results

from other reliability activities. The process is often part of the overall
program management process.
Reliability improvements
During this process one tries to identify and implement product

changes that are designed to improve product reliability.
The sources for improvement projects may come from reliability

testing and analysis, product failures, customer requests, changes
in the supply chain, use, or environmental conditions, or changes in
technologies or materials.
The implementation of corrective actions includes prioritization,

validation of effectiveness, and prevention of reoccurrence of similar
failure modes or mechanisms.
Next Steps
Now, let’s explore the specific recommendations to allow an

organization to move from one stage of maturity to the next.
For each stage we will focus on the four principal categories (the
leftmost column of the matrix) of management, product requirements,
engineering, and the feedback process.
These categories will be further broken down into subcategories to

better address the issues unique to each principal category.
Moving from Stage 1 to Stage 2
The basic approach includes raising awareness of the cost of unreliability
among all concerned, building awareness of basic reliability engineering
concepts and tools, and encouraging the natural aversion to the risk of
failure.
The basic message is that the organization should deliberately address

reliability. There are tools available to help us understand and avoid
failures.
The remainder of the chapter provides recommendations to move an

organization out of Uncertainty to the next stage of reliability maturity,
Awareness.
Management
• Create basic awareness that product failures occur and can be

avoided. Understand that field failures cost the company money and
cause customer dissatisfaction.
• Create a basic report of the number of field failures and warranty

expenses.
• Provide training, discussion, and learning opportunities for
the management team related to basic reliability concepts and
activities. Convey that all parts of the organization contribute to the
actual product reliability.
Status
• Identify one or more reliability practitioners within the organization

to assist in product design decision making.
• Highlight individuals and the benefits of reliability-related activities.
• Promote an individual to create and manage a reliability program.
• Recognize the reliability professional’s influence on and benefit to
product design and manufacturing decisions.
• Create means to collect and report basic product reliability field

performance.
• Estimate the cost of a product return.
• Estimate the warranty cost at the individual product level.
• Track and report the value of reliability activities.
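The cost estimates in the list above can start very simply. The sketch below is one minimal way to express warranty cost per shipped unit and as a percent of revenue. All figures and function names are hypothetical, and a real estimate would also include items such as shipping, failure analysis, and lost sales.

```python
def warranty_cost_per_unit(units_shipped, units_returned, cost_per_return):
    """Average warranty cost carried by each shipped unit."""
    if units_shipped <= 0:
        raise ValueError("units_shipped must be positive")
    return units_returned * cost_per_return / units_shipped

def warranty_percent_of_revenue(units_returned, cost_per_return, revenue):
    """Total warranty expense expressed as a percent of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * units_returned * cost_per_return / revenue

# Hypothetical product: 10,000 units shipped, 250 returned,
# 80 currency units per return, 1,000,000 in revenue.
per_unit = warranty_cost_per_unit(10_000, 250, 80.0)
percent = warranty_percent_of_revenue(250, 80.0, 1_000_000.0)
print(f"warranty cost per shipped unit: {per_unit:.2f}")
print(f"warranty expense as percent of revenue: {percent:.2f}%")
```

Even a crude number like this gives the management team a figure, in the language of money, to track from one reporting period to the next.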
Product Requirements
• Publish and highlight customer requirements related to product

reliability.
• Gather and highlight information about customer use and
environmental conditions.
• Create a reliability program plan including a list of reliability
activities to accomplish.
• Create reliability overview seminars for designers and extended

product development teams.
• Create a list of reliability training resources related to industry or
technology.
• Provide training opportunities for reliability practitioners with an
emphasis on reliability concepts and statistical methods.
Engineering
• Poll design team for reliability risks. Determine what potential risks
are known.
• Create a prediction capability, for example by using a parts-count
approach or by drawing a simple reliability block diagram and using
vendor data.
• Illustrate failure mode impact on the customer.
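A parts-count prediction, as mentioned in the list above, can be sketched in a few lines. The example assumes a series model (any part failure fails the product) and constant failure rates, which is a simplification. The part names and failure rates are hypothetical stand-ins for vendor data.

```python
import math

# Hypothetical vendor data: (part, failures per million hours, quantity).
parts = [
    ("power supply", 2.0, 1),
    ("controller board", 1.5, 1),
    ("cooling fan", 4.0, 2),
    ("connector", 0.1, 5),
]

def system_failure_rate(parts_list):
    """Sum of quantity * failure rate for a series system with
    constant failure rates (failures per million hours)."""
    return sum(rate * qty for _, rate, qty in parts_list)

def reliability(failure_rate_fpmh, hours):
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate_fpmh * 1e-6 * hours)

lam = system_failure_rate(parts)
one_year = reliability(lam, 8760)
print(f"system failure rate: {lam:.1f} failures per million hours")
print(f"estimated reliability over 8760 hours: {one_year:.4f}")
```

The value of a first prediction like this is less the number itself than the ranking it gives: here the two fans contribute most of the failure rate, which suggests where to look first.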
Reliability testing
• Create a minimum reliability test plan to address primary reliability

requirements.
• Create design verification testing of functional requirements for use
on all products shipped.
• Conduct discovery testing to determine the design margin (HALT).
• Create approved parts and suppliers (vendors) lists (AVLs).
• Create a vendor reliability assessment process for use with critical

component vendors and new suppliers.
• Use vendor data to qualify components for use within a product and
environment.
Feedback Process
• Collect and report regular factory yield and field failure data.
• Use Pareto charts to determine improvement projects.
• Conduct failure analysis and corrective actions on major failures.
• Create a process for management review of reliability plan

implementation.
• Compare field reliability data to requirements and predictions.
• Create a system to validate the effectiveness of corrective actions.
• Document design and process changes and their anticipated impact

on product reliability.
• Implement design and process changes to address customer
complaints and field failures.
• Review field failures for vendor connections and implement vendor
improvements or exclude poorly performing vendors from the AVL.
Once an organization is aware of the need to address reliability,
it begins to look for tools to assist in addressing product reliability.
The organization needs to build experience using the range of available
tools.
The basic message is that there are many ways to address reliability.
Let’s explore the range of tools available to help us understand and
avoid failures.

The remainder of the chapter provides recommendations to move an
organization out of Awareness to the next stage of reliability maturity,
Enlightenment.
Management
• Conduct informal training (e.g., lunch & learn) on basic reliability

topics and invite the management team to participate.
• Highlight and train members of management in their role in vendor
selection, design priorities, product testing, and failure analysis with
respect to product reliability. Encourage and coach management

team members to ask customers about the importance of product
reliability.
• Provide regular summary reports on product design progress
toward reliability goals and field reliability performance.
Status
• Invite key reliability practitioners to program and division decision

meetings.
• Promote a reliability practitioner to report directly to division
management.
• Recognize the reliability professional’s influence on and benefit to
product platform decisions.
• Create means to track the costs of failure analysis and re-

engineering projects.
• Estimate costs of repairs, maintenance, replacement, and
associated activities.
• Create means to improve resolution (e.g., increase operating

hours, determine the root cause of failure, evaluate environmental
conditions, etc.) of product reliability field performance reports.
• Establish consistent cost calculations and reporting mechanisms
within the organization.
• Create fully stated reliability requirements including function,

environment, duration, and probability of success.
• Gather and publish customer profiles including range and
distribution of environmental and use conditions.
• Apportion reliability requirements to product subsystems and
major components.
• Create a detailed reliability program plan including budgets for
resources, personnel, and capital equipment.
• Evaluate designs and suppliers for new materials or processes that

may increase reliability risk.
• Create and provide regular classes for engineers on root-cause

analysis and corrective action methods.
• Create and provide regular seminars for managers on reliability
activities and on the use and value of those activities for
improvement of product reliability.
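Apportioning a reliability requirement to subsystems, one of the items in the list above, can be illustrated with a simple weighted split. The sketch assumes the subsystems act in series, so the product of the subsystem goals must equal the system goal. The weights and subsystem names are hypothetical, and other apportionment methods exist.

```python
import math

def apportion(system_goal, weights):
    """Split a series-system reliability goal so that the product of the
    subsystem goals equals the system goal. A larger weight gives that
    subsystem a larger share of the allowed unreliability."""
    if not 0.0 < system_goal < 1.0:
        raise ValueError("system_goal must be between 0 and 1")
    total = sum(weights.values())
    return {name: system_goal ** (w / total) for name, w in weights.items()}

# Hypothetical goal: 95% probability of surviving the stated duration
# under the stated environment, split over four subsystems.
weights = {"power": 2.0, "control": 1.0, "display": 1.0, "enclosure": 1.0}
goals = apportion(0.95, weights)
for name, goal in goals.items():
    print(f"{name}: {goal:.4f}")
print(f"product of subsystem goals: {math.prod(goals.values()):.4f}")
```

Note how each subsystem goal is noticeably higher than the system goal. A team that hands the system goal unchanged to each subsystem gets a product that falls short of it.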
Engineering
• Lead FMEA studies with willing teams.

• Conduct field data reliability analysis to estimate reliability
performance.
• Review design changes to ascertain the broader impact on product
reliability.
• Use worst-case conditions rather than only nominal conditions.
• Use failure mechanism models to design and analyze test results.
Reliability testing
• Create a detailed reliability test plan, including stresses for specific
failure mechanisms, sample size calculations, and confidence
levels.
• Determine the failure mechanisms evaluated for each test proposed
and verify that all potential failure mechanisms are appropriately
exercised within the overall test program.
• Review vendor testing to determine whether it is adequately
connected to expected use and environmental conditions and
potential failure mechanisms.
• Include reliability requirements in design specifications and

requests for quotes from vendors.
• Include assessment information in management of AVLs.
• Request and review field reliability performance from critical
component vendors.
• Evaluate the effect of vendor end-of-production or change notices on
product reliability.
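The test plan item above calls for sample size calculations and confidence levels. One common starting point, shown here as a sketch, is the zero-failure (success run) calculation based on the binomial model. It assumes every unit is tested for the full duration of interest and that no failures are allowed. The function names are illustrative only.

```python
import math

def success_run_sample_size(reliability, confidence):
    """Units that must all pass the test, with zero failures, to
    demonstrate `reliability` at `confidence` (binomial model)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

def demonstrated_reliability(sample_size, confidence):
    """Lower bound on reliability shown by `sample_size` units
    that all passed, at the given confidence."""
    return (1.0 - confidence) ** (1.0 / sample_size)

for r_goal, conf in [(0.90, 0.90), (0.95, 0.90), (0.99, 0.90)]:
    n = success_run_sample_size(r_goal, conf)
    print(f"R = {r_goal:.2f} at C = {conf:.2f}: {n} units, zero failures")
```

The sample sizes grow quickly as the reliability goal rises, which is one reason to pair demonstration testing with tests aimed at specific failure mechanisms.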
Feedback Process
• Collect and analyze failure data to guide component selection.

• Revise reliability test plans based in part on field failure data (i.e.,
evaluate test coverage and value in preventing field failures).
• Confirm the root cause of failures and the adequacy of product
improvement to avoid the failure or to mitigate failure effects.
• Collect and analyze time-to-failure information rather than failure
counts or percentages.
• Create a process to verify that supplier corrective actions have the

expected effects on product reliability.
• Compare stress screening and ongoing reliability testing to field
failures and adjust as needed.
• Compare field failure modes to expected failure modes, and modify
risk assessment practices to minimize the differences.
• Implement corrective actions to internally identified reliability

testing failures.
• Create means to track and report corrective action effectiveness.
• Create a lessons-learned process based on identified failure modes.
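The list above recommends analyzing time-to-failure information rather than failure counts. As one possible sketch, the code below fits a Weibull distribution to a small set of failure times using median rank regression. It assumes complete data (every unit failed, no censoring), and the failure times are hypothetical. Field data usually includes units still operating, which calls for methods that handle censoring.

```python
import math

def weibull_fit(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) from complete
    (uncensored) failure times using median rank regression."""
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0 or times[0] == times[-1]:
        raise ValueError("need at least two distinct positive failure times")
    # Benard's approximation for the median rank of the i-th failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    # Linearized Weibull: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    x = [math.log(t) for t in times]
    y = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx                      # slope of y regressed on x
    eta = math.exp(x_bar - y_bar / beta)  # intercept equals -beta * ln(eta)
    return beta, eta

# Hypothetical hours to failure for six returned units.
hours = [410.0, 620.0, 790.0, 950.0, 1130.0, 1380.0]
beta, eta = weibull_fit(hours)
print(f"shape (beta): {beta:.2f}, scale (eta): {eta:.0f} hours")
```

A shape value above 1 suggests a wear-out pattern and a value below 1 suggests early-life failures, a distinction that a failure count or percentage cannot reveal.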
Using a wide range of reliability engineering tools has created
experience. Now the team should begin selectively using the most
valuable tools for specific situations.
The basic message is that there are many ways to proactively address
reliability. We need to tailor our approach to maximize the value of
each reliability activity.
The remainder of the chapter provides recommendations to move

an organization out of Enlightenment to the next stage of reliability
maturity, Wisdom.
Management
• Provide the management team with ‘talking points’ for key reliability
program initiatives for use with customers and internal teams.
• Provide value statements related to achievement in reliability
improvements.
• Create a significant element of senior management’s bonus

structure based on product reliability performance.
• Discuss options for proactively addressing major reliability issues.
• Develop detailed reliability models that provide means to conduct
‘what if’ experiments for various reliability activities.
Status
• Invite key reliability practitioners to critical business and customer

meetings.
• Invite key managers to lead reliability programs and initiatives as
part of a steering committee.
• Invite reliability practitioners to discussions on early product
concept development and major vendor selection.
• Recognize and reward reliability improvement activities outside the
ranks of identified reliability professionals.
• Recognize the reliability professional’s contribution to prevention of
product failures.
• Establish means to estimate the return on investment of individual

reliability tasks.
• Create means to calculate the cost to the customer for each product
failure.
• Calculate the cost of product ownership over the entire product life-
cycle.
• Express reliability objectives as distributions rather than point
estimates, when applicable.
• Incorporate reliability plans within product development plans.
• Create decision points within the reliability plan to adjust activities
based on current information.
• Review supplier and vendor reliability programs to identify potential
risk areas.
• Create an overall reliability program strategy and implementation
plan.
• Create tailored reliability courses for key reliability tasks including

when and how to determine the need to accomplish the task.
• Create and provide seminars and workshops to senior managers on
how reliability impacts the business.
• Encourage reliability practitioners to learn how to identify failure
modes and mechanisms related to the product and industry.
• Create a reliability training program for engineers and associated
managers focused on design for reliability and implementation of
critical reliability activities.
Engineering
• Use distributions rather than point estimates for reliability

predictions.
• Include confidence intervals or bounds on data analysis results.
• Use distributions for use and environmental conditions rather than
specification values.
• Use failure mechanism models to determine cost–benefit decisions
for product changes.
Reliability testing
• Conduct reliability testing only when needed to resolve a question or

provide information for a decision.
• Design accelerated testing that is focused on specific failure
mechanisms.
• Expand discovery testing to include more stresses related to use
conditions and to new vendors or materials under consideration.
• Create critical-to-reliability criteria for supplier process control

and/or ongoing reliability evaluations.
• Review reliability testing and failure mechanisms for those tests
best performed by vendors (upstream or at point of least value
added).
• Require vendors to evaluate the reliability programs of their
suppliers.
• Evaluate technology maturity and stability of vendor processes and
components prior to vendor selection.
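Designing an accelerated test around a specific failure mechanism, as recommended in the list above, requires a model that links the test stress to use conditions. As one illustration, the sketch below uses the Arrhenius relationship, which is often applied to thermally driven mechanisms. The activation energy and temperatures are hypothetical. The right model and parameters depend on the actual failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Ratio of time at use temperature to equivalent time at the
    stress temperature, per the Arrhenius relationship."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)

# Hypothetical mechanism with 0.7 eV activation energy,
# 40 C in use and 85 C in the test chamber.
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)
print(f"acceleration factor: {af:.1f}")
print(f"1000 test hours represent about {1000 * af:.0f} use hours")
```

The result is very sensitive to the assumed activation energy, so the estimate is only as good as the team's knowledge of the failure mechanism being accelerated.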
Feedback Process
• Conduct failure analysis to find the root cause and update design
guidelines and reliability testing to prevent future occurrences.
• Analyze failure data for systemic decision-making processes that
allowed the failure to occur.
• Create part batch, lot, or similar tracking systems.
• Assess reliability activities and their effectiveness to determine

process improvements or best practices.
• Verify that risk assessments are a closed-loop process and updated
as new information becomes available.
• Compare field failure mechanisms with expected failure
mechanisms and adjust risk assessment practices and reliability
testing procedures to minimize the difference.
• Create a lessons-learned process based on identified failure

mechanisms.
• Explore means to improve reliability predictions, analysis, and
testing with more effective or efficient techniques or a combination
of techniques.
• Create means to document the value of reliability activities and
publish value determination guidelines.
Reliability is important to the organization. The next stage is to
embed reliability thinking across the organization and at every level.
Considering reliability becomes a natural part of all decisions.
The basic message is that reliability engineering and consideration is

part of how the organization operates. We have a culture of reliability
and it is how we do business.

The remainder of the chapter provides recommendations to move an
organization out of Wisdom to the next stage of reliability maturity,
Certainty.
Management
• Provide insights and mentoring concerning approaches to

systematically prevent product failures.
• Provide reliability reports on reliability predictions and associated
business impact to profit.
• Discuss investment areas for product reliability improvements that

impact product architecture, technology, and patent and product
portfolio.
Status
• Invite key reliability practitioners to provide input to business

strategic planning.
• Recognize the reliability professional’s contribution to customer
satisfaction and brand loyalty.
• Calculate the influence of product reliability improvements on

increased sales and brand loyalty (customer satisfaction or net
promoter indices).
• Calculate value of brand related to product reliability perception or
performance.
• Create reliability plans that include contingency plans for a range of
design, supply chain, and requirements disruptions.
• Create reliability strategic plans that are integrated with overall
business strategic plans.
• Create means to learn about industry trends, new materials and

processes, and reliability modeling and analysis tools that may have
a meaningful impact on the business.
• Create a comprehensive reliability training program for everyone in
the organization with visible management support and involvement.
Engineering
• Include life-cycle costs in analysis for use in decision making.
• Create stress-life models for new materials, features, and
components when existing models are inadequate.
• Create complex simulations or Monte Carlo analysis systems to
create predictions and estimate the value of proposed changes.
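A Monte Carlo analysis of the kind mentioned in the list above can begin small. The sketch below simulates a series system whose component lives follow Weibull distributions and compares a baseline design with a proposed change. The component parameters, mission time, and function names are hypothetical. The closed-form result is included only as a check on the simulation.

```python
import math
import random

def simulate_series_reliability(components, mission_hours, trials=20000, seed=1):
    """Estimate the probability that every component survives the mission.
    components: list of (eta, beta) Weibull scale and shape pairs."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # random.weibullvariate takes the scale first, then the shape.
        first_failure = min(rng.weibullvariate(eta, beta) for eta, beta in components)
        if first_failure > mission_hours:
            survived += 1
    return survived / trials

def analytic_series_reliability(components, mission_hours):
    """Closed-form check: product of exp(-(t / eta) ** beta)."""
    return math.prod(
        math.exp(-((mission_hours / eta) ** beta)) for eta, beta in components
    )

baseline = [(50000.0, 1.2), (80000.0, 2.0), (30000.0, 0.9)]
proposed = [(50000.0, 1.2), (80000.0, 2.0), (60000.0, 0.9)]  # better third part

for label, design in [("baseline", baseline), ("proposed", proposed)]:
    sim = simulate_series_reliability(design, mission_hours=5000.0)
    exact = analytic_series_reliability(design, mission_hours=5000.0)
    print(f"{label}: simulated {sim:.3f}, analytic {exact:.3f}")
```

For a simple series system the analytic answer is available directly. Simulation earns its keep when the system includes redundancy, repair, or varying use conditions that have no convenient closed form.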
Reliability testing
• Use failure mechanism models to design reliability testing, and use

test results to improve models.
• Characterize reliability of new vendor components or materials
prior to use within a product design.
• Monitor for changes in product environment, use conditions,

reliability requirements, or regulatory requirements for their impact
on product reliability.
• Monitor critical-to-reliability parameters and process control
points across the supply chain to identify shifts.
• Create contingency plans for possible obsolescence or shortages of
parts.
• Conduct joint studies with vendors to explore processes, materials,

and technology impact on product reliability.
Feedback Process
• Create links between customer satisfaction and product reliability.

• Create a model for determining product reliability readiness for
release based on the development of a failure reporting, analysis,
and corrective action system and other business requirements.
• Create a prognostic data collection and analysis system within
products and manufacturing equipment and processes.
• Validate the use of field failure mechanisms data and analysis to

update reliability models and design guidelines.
• Create a process to verify the effectiveness of reliability strategy and
policies.
• Evaluate new vendors, processes, and materials with the intent to

improve product reliability.
• Update design rules and guidelines based on product reliability
performance.
Next Steps
The last four chapters provide ideas on how to move from one stage of
maturity to the next. Now we need to ascertain the stage of maturity of
the organization.
How to Assess Your Reliability Program
The reliability that results is going to happen whether or not the team
designing the product or production line deliberately uses reliability
engineering tools.
The elements of a product or system will respond to the environment

and will either work or fail.
While working at Hewlett-Packard I had the opportunity to conduct

a reliability program assessment of about 50 product divisions. The
assessment took one day and involved eight interviews.
“How do you know so much about our program?” was a question one
quality manager asked after reading the assessment report.
It’s all a matter of understanding the reliability decision elements and

the organization’s processes.
The key to the insights is understanding to what extent various
reliability activities take place and how the team uses the resulting
information in decision making.
It’s not just what you do, it’s how those reliability related activities
impact decision making that matters.
One hypothesis we had related to whether the number of reliability

tasks the team actively used would correlate to their warranty
expenses. That worked to a point.
The teams that did not understand basic tools and had no overt or
organized reliability engineering had high warranty expenses (as a
percent of revenue).
The teams that did a large number of tasks (FMEA, HALT, ALT,
predictions, etc.) did have lower warranty expenses.
The surprise was that the teams that had the lowest warranty expenses
also conducted very few reliability activities.
The difference was that the best performing teams understood the
range of available reliability engineering activities and only used the
tools that would provide value for a given circumstance.
Less mature organizations would attempt to conduct as many
reliability-related activities as possible, including a long list of product
tests, many of which provided little actual value.
It was the application of the right tool at the right time that made the
difference.
Maturity and Activity
Hiring a reliability engineer or running a lot of life tests does not

necessarily improve your product’s reliability performance.
It is not the organization or activities that comprise a reliability

program; rather, your reliability performance relates to the culture
concerning reliability.
Reliability occurs at the point of decision.
Therefore, during interviews the intent is to understand how decisions
are currently made. To what extent do reliability considerations
influence decisions, and what tools or methods are used to form
decisions?
For example, if we ask, “To what extent do you do HALT?” the answer
may be “We rarely use HALT.”
In one case, it may be that the engineer doesn’t know what HALT is and
isn’t sure whether or not the testing they conduct is similar to HALT.
They may simply be unfamiliar with that type of testing.
In another case, the engineers say that they know about HALT and
understand how and why it is used, but they have rarely used it because
they lacked appropriate situations in which HALT would be of value.
They understand that HALT is a useful tool for specific applications and
recently they have not needed to conduct HALT.
Some respond that they do HALT.
Again, there are two common responses. In one case, the team does
HALT all the time because it is required, independent of whether or not
it may be useful.
In the other case, they do HALT because it is the right tool for the
current situation.
One team didn’t know what HALT was and the other fully understood
and chose to not do HALT. The difference lies in the understanding and
application, or maturity.
Assessment Process
To understand an organization’s reliability maturity, use the
following assessment process.
1. Select survey topics.
Create a list of activities and tools common to reliability practices in

your industry. It may include items rarely used. It should include the
breadth of topics related to reliability in your field.
See the DFR Methods Survey for one possible list of topics.
Some topics are broad, such as ownership of and responsibility for
product reliability, or whether management takes a reactive or
proactive approach. Some topics are very specific, such as individual
tools like FMEA or HALT.
2. Establish the interview format.
These can be one on one, in small groups, via phone, through an invited
survey with follow-up conversations, or by some other method. I have
found the one-to-one discussions the most useful as they permit
immediate follow-up and exploration of the rationale or motivation
behind specific behaviors or responses.
3. Conduct the interviews (collect information).
Arrange to interview or survey a cross section of people in the

organization. Select individuals with experience with the organization
and products typically designed and manufactured. Useful
interviewees include the following:
• design & development engineers (electrical, mechanical, and

software),
• design & development managers (electrical, mechanical, and
software),
• reliability or quality engineers and/or managers,
• procurement engineers (i.e., those who work with suppliers), and
• manufacturing engineers and/or managers (other similar titles

include: design for manufacturing, sustaining, and/or production
engineering).
Select about eight individuals for interviews, depending on the specific

situation, size, complexity, etc. of the program.
In general, each interview question starts with the phrase, ‘to what
extent.’ For example, you might ask, “To what extent do you use HALT?”
Depending on the response you may explore the motivations or

rationale behind the decision both to conduct HALT and how the HALT
results are used within the organization.
4. Document the business environment
Include notes on sales volume, cost, brand position, revenue, cost of

unreliability as percent of net revenue, etc.
Document any regulatory or customer-imposed restrictions or

requirements. Summarize the results to convey the atmosphere
around the reliability program.
5. Document the collected information
A summary given back to participants asking for additional input or

corrections helps with the acceptance of the assessment results and
may help avoid a mistake in your understanding.
6. Analyze the data.
This is not done during the interviews: Just let them do the talking.
Review the notes and information provided and map these to the
maturity matrix. Look for consistent approaches to making reliability-
related decisions. Look for patterns of behavior and underlying
motivations or causes.
7. Report on assessment findings.
Document and explain what you heard and how it relates to the
overall organization’s maturity. The report may include the interview
summary, strengths, weaknesses, and recommendations for
improvement.
The assessment process should provide a view of the overall
organization’s approach to making decisions and to what extent and
how its reliability program influences those decisions.
With that basic understanding you can identify strengths to build upon,
spot weaknesses that need attention, and provide recommendations to
improve the maturity of the reliability program.
Let’s now turn to examples of specific questions that should be asked
in a program assessment.
Sample Survey Questions & Support Material
The following sample survey was part of an online survey for a
multi-division organization. While I prefer face-to-face interviews, an
online survey allows the collection of suitable data quickly.
Premise
The following survey explores your view of your organization’s
(product line or division) reliability program approach. The intent is to
identify strengths and weaknesses within the organization’s reliability
program.
Every organization does have a means to accomplish product reliability
performance, and every program may reflect differences related to
local change management, customer expectations or requirements,
local practices, and management priorities.
This overall assessment will assist in the development of training and
support to effect an overall improvement in reliability engineering and
performance of fielded products.
An early step in this program is to understand the current range of
reliability engineering practices. It is important to accurately reflect
your organization’s approach as it will guide the deployment of
resources to reinforce best practices and improve areas of weakness.
The survey is broken down into four areas (management, product
requirements, engineering, and feedback process) and provides an
overall snapshot of your organization’s reliability maturity or approach
to product reliability engineering practices.
For each set of statements select the one that best fits your
organization. Many of the segments have open-ended questions to
promote discussion or additional insights.
Management
Management Understanding and Attitude
Which statement best reflects how your organization’s management
team approaches product reliability?
1. There is no comprehension of reliability as a management tool.
Management tends to blame reliability engineering for ‘reliability
problems.’
2. Management recognizes that reliability management may be of
value but is not willing to provide money or time to make it happen.
3. Management is still learning more about reliability management but
is becoming supportive and helpful.
4. Management is actively participating in reliability management,
having an understanding of the absolutes of reliability management
and recognizing its role in continuing emphasis.
5. Management considers reliability management an essential part of
the company system.
What reliability metrics are in use? How are they communicated and
used within the organization?
To what extent does management own reliability (i.e., pay attention
to and follow up on reliability topics)? Is this attention proactive (i.e.,
occurring prior to field issues) or reactive (i.e., occurring only in
response to reliability problems)?
Reliability Status
Which statement best reflects how your organization views product
reliability engineering?
1. Reliability is hidden in manufacturing or engineering departments.
Reliability testing is probably not part of the organization. Emphasis is
placed on initial product functionality.
2. A stronger reliability leader has been appointed, yet the main
emphasis is still on an audit of initial product functionality. Reliability
testing is still not performed.
3. The reliability manager reports to top management and has a role in
management of the division.
4. The reliability manager serves as an officer of the company,
reporting on status, being responsible for preventive action, and being
involved with consumer satisfaction and feedback.
5. The reliability manager serves on the board of directors. Prevention
of failure is the main concern. Reliability professionals are thought of
as leaders.
Rank order the following product design priorities, from 1 for top
priority to 4 for lowest priority. Assume a particular product meets the
minimum requirements in each area already.
Product features – feature set of products or use of leading technologies
Time to market – time to ship the product
Cost – bill of materials cost or cost of goods sold
Product reliability – field performance meets or exceeds customer
duration (life) expectations
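One way to compile the rank-order answers from several respondents is to average the rank given to each priority. The sketch below is only an illustration: the book does not prescribe a scoring method, the `mean_ranks` helper is hypothetical, and the three responses are made-up data.

```python
from statistics import mean

# The four design priorities listed in the survey question.
PRIORITIES = ("Product features", "Time to market", "Cost", "Product reliability")


def mean_ranks(responses):
    """Return (priority, mean rank) pairs, highest priority (lowest mean) first."""
    for response in responses:
        # Each respondent must rank all four priorities, using 1 to 4 once each.
        if set(response) != set(PRIORITIES) or sorted(response.values()) != [1, 2, 3, 4]:
            raise ValueError(f"invalid ranking: {response}")
    averages = {p: mean(r[p] for r in responses) for p in PRIORITIES}
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical answers from three respondents (illustrative data only).
responses = [
    {"Product features": 1, "Time to market": 2, "Cost": 3, "Product reliability": 4},
    {"Product features": 2, "Time to market": 1, "Cost": 4, "Product reliability": 3},
    {"Product features": 1, "Time to market": 3, "Cost": 2, "Product reliability": 4},
]

ranking = mean_ranks(responses)
for priority, average in ranking:
    print(f"{priority}: mean rank {average:.2f}")
```

In this made-up data set product reliability ends up with the lowest priority, which is the kind of pattern worth exploring in follow-up interviews.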
Product Requirements
Requirements and Planning
Which statement best reflects your organization’s approach to
reliability requirements and planning?
1. Discussions are informal or nonexistent.
2. Basic requirements based on customer requirements or standards
are considered. Plans have required activities.
3. Requirements include environment and use profiles with some
apportionment. Plans have more details with regular reviews.
4. Plans are tailored for each project and projected risks. Use is made
of distributions for environmental and use conditions.
5. Contingency planning occurs. Decisions are based on business or
market considerations. Reliability requirements and planning are part
of the strategic business plan.
How are product reliability objectives stated for the product
development team? Provide an example.
Does the product development life-cycle (stage gate review process)
include reliability activities or tasks? If so, give an example.
Training and Development
Which statement best reflects your organization’s approach to
reliability training and development?
1. Training is informally available to some, if requested.
2. Select individuals are trained in concepts and data analysis. Training
is available for design engineers.
3. Training for the entire engineering community is done for key
reliability-related processes. Managers receive training on reliability
and life-cycle impact.
4. Reliability and statistics courses are tailored for design and
manufacturing engineers. Senior managers are trained on reliability’s
impact on business.
5. New technologies and reliability tools are tracked and training
is adjusted to accommodate these. Reliability training is actively
supported by top management.
Which parts of the organization are expected to understand and use
reliability engineering tools and techniques? Select all that apply.
1. Design
2. Manufacturing
3. Supply chain (procurement)
4. Field service and/or customer support
Engineering
Reliability Analysis
Which statement best reflects how your organization views reliability
analysis during the product design and development phase?
1. Reliability analysis is nonexistent or solely based on manufacturing
issues.
2. Analysis consists of point estimates and reliance on handbook
parts-count methods. Basic identification and listing of failure modes
and their impact is done.
3. Formal use is made of FMEA. Field data analysis of similar products
is used to adjust predictions. Design changes lead to reevaluation of
product reliability.
4. Predictions are expressed as distributions and include confidence
limits. Environmental and use conditions are used for simulation and
testing.
5. Life-cycle cost is considered during design. Stress and damage
models are created and used. Extensive risk analysis is performed for
new technologies.
Give an example of an effective reliability risk analysis tool currently in
use. Please briefly describe.
Reliability Testing
Which statement best reflects how your organization accomplishes
reliability testing during the product design and development phase?
1. Reliability testing is primarily functional.
2. A generic test plan exists with reliability testing only to meet
customer or standards specifications.
3. A detailed reliability test plan with sample size and confidence
limits is in place. Results are used for design changes and vendor
evaluations.
4. Accelerated tests and supporting models are used. Testing to failure
or destruct limits is conducted.
5. Test results are used to update component stress and damage
models. New technologies are characterized.
Is product reliability testing an integral part of the product
development process?
Does product reliability testing include discovery types such as
HALT or margin testing to uncover design weaknesses or establish
robustness?
Supply Chain Management
Which statement best reflects how your organization views supply
chain management as related to reliability?
1. Supplier selection is based on function and price.
2. An approved vendor list is maintained. Audits are performed based
on issues or with critical parts. Qualification is primarily based on
vendor datasheets.
3. Assessments and audit results are used to update the AVL. Field
data and failure analysis related to specific vendors are used.
4. Vendor selection includes an analysis of each vendor’s reliability
data. Suppliers conduct assessments and audits of their suppliers.
5. Changes in environment, use profile, or design trigger vendor
reliability assessment. Component parameters and reliability are
monitored for stability.
Are specific reliability requirements communicated to key suppliers?
Are specific reliability tests accomplished by select vendors and
monitored by your organization?
Feedback Process
Failure Data Tracking and Analysis
Which statement best reflects how your organization responds to and
addresses reliability-related failures during the entire product
life-cycle?
1. Failures during function testing may be addressed.
2. Pareto analysis of field returns and internal testing are performed.
Failure analysis relies on vendor support.
3. Root-cause analysis is used to update the AVL and prediction
models. A summary of analysis results is disseminated.
4. Focus is on failure mechanisms. Failure distribution models are
updated based on failure data.
5. The relationship between customer satisfaction and product failures
is understood. Use is made of prognostic methods to forestall failure.
Is there a useful defect tracking system in use during product design
and development? Is the impact on product reliability included in the
prioritization?
Is a failure analysis process used for each product failure and
associated analysis?
Validation and Verification
Which statement best reflects how your organization conducts product
validation and verification as related to reliability?
1. Product validation and verification are informal and based on
individual instances rather than any process.
2. There is basic verification that plans are followed. Field failure data
are regularly reported.
3. Supplier agreements around reliability are monitored. Failure
modes are regularly monitored.
4. Internal reviews of reliability processes and tools take place. Failure
mechanisms are regularly monitored and used to update models and
test methods.
5. Reliability predictions match observed field reliability.
Which of the following tools are used for product reliability validation
and verification? Select all that apply.
1. Parts-count prediction methods
2. Testing to pre-established standards or requirements
3. Accelerated life testing for specific failure mechanisms
4. Physics of failure modeling and analysis
5. Field returns data analysis
Are field returns analyzed and results reported across the
organization?
Reliability Improvements
Which statement best reflects your organization’s approach to
reliability improvement?
1. The process is nonexistent or informal.
2. Design and process change processes are followed. The corrective
action process includes internal and vendor engagement.
3. The effectiveness of corrective actions is tracked over time.
Identified failure modes are addressed in other products. Improvement
opportunities are identified as environment and use profiles change.
4. Identified failure mechanisms are addressed in all products.
Advanced modeling techniques are explored and adopted. A formal
and effective lessons-learned process exists.
5. New technologies are evaluated and adopted to improve reliability.
Design rules are updated based on field failure analysis.
Are vendor material or process changes evaluated for impact on
product reliability prior to using ‘new’ components?
How are internal design or process changes evaluated with respect to
impact on product reliability prior to implementation?
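The numbered statements in each survey segment correspond to the five maturity stages, so the selections can be tallied by category. The following is a minimal sketch, not a method given in the book: using the median per category and the median of those medians as the overall stage are assumptions, the `summarize` helper is hypothetical, and the responses are made-up data.

```python
from statistics import median

# Stage names from the reliability maturity matrix.
STAGES = {1: "Uncertainty", 2: "Awakening", 3: "Enlightenment", 4: "Wisdom", 5: "Certainty"}


def summarize(responses):
    """Summarize survey selections.

    responses: one dict per respondent, mapping a survey category to the
    number (1-5) of the statement that respondent selected.
    Returns (per_category, overall); per_category holds the median stage and
    the spread (max minus min) of the answers for each category.
    """
    answers = {}
    for response in responses:
        for category, stage in response.items():
            if stage not in STAGES:
                raise ValueError(f"stage must be 1-5, got {stage!r}")
            answers.setdefault(category, []).append(stage)

    per_category = {
        category: {"median": median(stages), "spread": max(stages) - min(stages)}
        for category, stages in answers.items()
    }
    # Assumption: the overall stage is the median of the category medians.
    overall = median(entry["median"] for entry in per_category.values())
    return per_category, overall


# Hypothetical selections from three respondents (illustrative data only).
responses = [
    {"Reliability Analysis": 2, "Reliability Testing": 3,
     "Supply Chain Management": 2, "Failure Data Tracking and Analysis": 1},
    {"Reliability Analysis": 2, "Reliability Testing": 4,
     "Supply Chain Management": 2, "Failure Data Tracking and Analysis": 2},
    {"Reliability Analysis": 3, "Reliability Testing": 3,
     "Supply Chain Management": 2, "Failure Data Tracking and Analysis": 1},
]

per_category, overall = summarize(responses)
# Categories above the overall stage are candidate strengths, those below are weaknesses.
strengths = [c for c, e in per_category.items() if e["median"] > overall]
weaknesses = [c for c, e in per_category.items() if e["median"] < overall]
print("Overall stage:", overall)
print("Strengths:", strengths)
print("Weaknesses:", weaknesses)
```

A large spread within a category suggests respondents see the organization differently, which is a useful topic for the follow-up discussion.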
Following up on the Survey
Once the results of the survey have been compiled, the next phase
entails a site visit to the company by the evaluation team.
For this phase, on a mutually accepted date, an evaluation team
visits the company. Company personnel participating in this on-site
evaluation meeting should include the reliability manager and
engineers who are involved in activities such as defining reliability
requirements, reliability predictions, derating, manufacturing yields,
testing, qualification, stress analysis, failure analysis, failure tracking,
warranties, parts selection, and supplier assessment, as well as any
others who provided answers to the questionnaire.
These personnel should bring to the meeting ‘objective evidence’ in
support of their responses to the questionnaire. The evidence may
consist of data, reports, policy drafts, or current documents.
The evaluation team offers an overview of reliability capability to
provide an understanding of the rationale and the process.
After the presentation, the company provides an overview of the
business and operations at its facility, followed by its vision of
reliability.
This includes, but should not be limited to, reliability objectives for
the various product categories and a description of its reliability
organization and practices.
Specifically, the presentation should include information
on the following items:
• reliability tasks performed for products,
• a list of test and failure analysis equipment,
• reliability test plan and process guidelines and/or standards,
• a list of reliability tests and some examples,
• failure analysis methods and examples,
• supplier assessment guidelines,
• part selection guidelines,
• reliability input during product development,
• failure tracking strategy and examples, and
• warranty determination.
The evaluation team then assesses responses to the questionnaire and
the supporting evidence, asking follow-up questions as necessary. At
the conclusion of the meeting, the company is provided with an
informal summary of the findings, including recommendations for
corrective actions.
Documenting the Assessment
The third and final phase involves documentation of the assessment.
The company is provided with a draft report summarizing the
evaluation team’s observations and recommendations for reliability
improvement.
The company is typically given an opportunity to review the draft report
and provide comments.
A final report is then issued to the company and to the organization
that requested the assessment that highlights the areas of strengths
and weaknesses, with recommendations for improvements to
approach best-in-class standards.
The report also includes the maturity level of the company along with
an explanation of the significance of that level.
Book Conclusions and Summary
Reliability programs can improve. A good starting place is your
understanding of the current culture around making decisions related
to reliability.
Even a simple scan of the reliability maturity matrix may provide
an insight on the stage of maturity, which provides a basis for
improvement.
A more extensive assessment, through interviews or an online survey,
provides additional insights about the organization’s strengths and
weaknesses.
Interviewing even eight people from around the organization starts the
change process by bringing awareness to the current situation.
You may find support and potential obstacles, and you will learn more
about how the organization actually creates the reliability performance
found in the products.
The section on recommended actions is just a starting point. Every
organization and situation is different and may require very different
approaches.
Setting reliability goals and analyzing field failures might be common
across organizations, yet customer contracts, regulatory
requirements, and other external constraints may alter an
organization’s path to improving reliability performance.
One of the keys to making change happen is to know where you are
going. The maturity matrix provides a glimpse of what is possible.
Change does take time.
Along the way be sure to encourage those making improvements,
illustrate the value of improved methods, and celebrate the successes.
A product’s potential reliability performance is created at the point of
decision.
These decisions occur every day across the organization. Improving
the reliability maturity of an organization enables every decision to
improve reliability performance.
Reliability Maturity Matrix
The stages are in columns and each contains a description of an
organization for the 11 categories.
Scan across each row to find the stage that generally describes your
organization. Circle it. The various categories may have different
stages of maturity.
Generally an organization has a single stage of maturity that best
describes its reliability program.
If the matrix is not showing properly on your screen or you would like to
print out the page for local use, visit
http://www.fmsreliability.com/accendo/ebooks/reliability-maturity/
| Area | Category | Stage 1: Uncertainty | Stage 2: Awakening | Stage 3: Enlightenment | Stage 4: Wisdom | Stage 5: Certainty |
| --- | --- | --- | --- | --- | --- | --- |
| Requirements | Requirements & Planning | Informal or nonexistent | Basic customer requirements met; plans have required activities | Requirements include environment & use profiles; plans more detailed | Plans customized; distributions used for environmental & use conditions | Contingency planning occurs; decisions based on business & market |
| Requirements | Training & Development | Informally available | Some training in concepts & data analysis | Reliability training for engineers; manager training on reliability & lifecycle impact | Reliability & statistics courses for engineers; senior managers trained on impact on business | New technologies & reliability tools tracked; reliability training supported by management |
| Engineering | Reliability Analysis | Nonexistent or based on manufacturing issues | Use of point estimates & handbook parts count; basic identification of failure modes & impact | Formal use of FMEA; field data from similar products analyzed; design changes cause reevaluation | Predictions expressed as distributions; environmental & use conditions used for simulation & testing | Lifecycle cost considered in design; stress & damage models used; extensive risk analysis for new technologies |
| Engineering | Reliability Testing | Primarily functional | Generic test plans; testing only to meet customer or standards specifications | Detailed reliability test plans; results used for design changes & vendor evaluation | Accelerated tests & models used; testing done to failure or destruct limits | Test results used to update component models; new technologies characterized |
| Engineering | Supply Chain Management | Selection based on function & price | AVL maintained; audits on issues or key parts; vendor datasheets used | AVL updated by assessments & audit results; field data & failure analysis related to vendors | Vendor reliability data used for vendor selection; suppliers conduct external assessments & audits | Changes trigger vendor reliability assessment; component parameters & reliability monitored |
| Feedback Process | Failure Data Tracking & Analysis | Only looks at function failures | Field returns analysis & internal testing; failure analysis reliant on vendor | AVL & prediction models updated by root-cause analysis; results shared | Focus on failure mechanisms; failure distribution models updated via failure data | Customer satisfaction vs. product failures understood; prognostic methods used |
| Feedback Process | Validation & Verification | Informal, without process | Basic verification of plans followed; field data regularly reported | Supplier reliability agreements & failure modes regularly monitored | Internal reviews of reliability processes & tools; failure mechanisms monitored | Reliability predictions match observed field reliability |
| Feedback Process | Reliability Improvement | Nonexistent or informal | Design & process change processes followed; corrective action taken | Effectiveness of corrective actions tracked; failure modes addressed in other products; improvements identified | Failure mechanisms addressed in all products; modeling techniques & lessons-learned process adopted | New technologies evaluated & adopted; designs updated per field failure analysis |
| Management | Understanding & Attitude | Has no grasp | Recognizes but takes no action | Becoming supportive & helpful | Actively participating | Considers essential to company |
| Management | Status | No status | Conduct of specific and routine product testing & failure analysis tasks | Reliability manager reports to senior management & has influence in managing division | Reliability manager is an officer, reporting on actions & involved with consumer affairs | Reliability manager is a board member; prevention is key concern |
| Management | Cost of Unreliability | Not done | Direct warranty expenses only | Warranty, corrective action materials, & engineering costs monitored | Customer & lifecycle unreliability costs identified & tracked | Lifecycle cost reduction done via product reliability improvements |
Glossary of Terms
ALT — Accelerated life testing is the evaluation of the time-to-failure
behavior for a specific failure mechanism or system using
higher than expected stress(es). The intent is to understand the
reliability performance under normal stress conditions, generally
with the use of an acceleration model.
AVL — The approved vendor list records the suppliers that have met
some set of criteria. Reliability performance may be one criterion.
Using vetted suppliers reduces the risk of vendor-introduced failure
mechanisms.
Derating — Derating is a process of designing or selecting components
that have sufficient ability to withstand the various stresses
experienced during operation. Generally the operating conditions
are well below the maximum rated stress level.
FMEA — A failure mode and effect analysis is a systematic method to
identify and prevent product failures.
HALT — The highly accelerated life test is a method to discover failure
modes and mechanisms. The process generally uses multiple
stresses with increasing intensity to stimulate failures.
MTBF — Mean time between failures is the inverse of the failure rate
(the mean number of failures per unit of operating time). It is
commonly calculated by dividing the total hours of operation of one
or more systems by the number of failures that occur during that
time period.
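As a quick illustration of the MTBF calculation described above, the sketch below divides total operating hours by the number of failures. The `mtbf` function name and the figures (10 units, 1,000 hours each, 4 failures) are hypothetical and chosen only for the example.

```python
def mtbf(total_operating_hours, failures):
    """Mean time between failures: total operating hours divided by failures."""
    if failures <= 0:
        # With no observed failures this simple point estimate is undefined.
        raise ValueError("at least one failure is needed to compute MTBF")
    return total_operating_hours / failures


# Hypothetical data: 10 units each run for 1,000 hours, 4 failures observed.
total_hours = 10 * 1000
value = mtbf(total_hours, 4)
failure_rate = 1 / value  # failures per operating hour, the inverse of MTBF

print(f"MTBF: {value} hours")
print(f"Failure rate: {failure_rate} failures per hour")
```

Note that this single number says nothing about when the failures occurred within the period, which is why the higher maturity stages favor failure distributions over point estimates.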
References
Crosby, Philip B. 1979. Quality Is Free: The Art of Making Quality
Certain. New York: Signet.
IEEE Std 1624-2008. 2008. IEEE Standard for Organizational
Reliability Capability. New York: IEEE.
Petroski, Henry. 2006. Success Through Failure: The Paradox of
Design. Princeton: Princeton University Press.
Tiku, S., M. Azarian, and M. Pecht. 2007. “Using a Reliability
Capability Maturity Model to Benchmark Electronics Companies.”
International Journal of Quality & Reliability Management 24:5,
547-563.
Are you Ready to Accelerate
your Reliability Program and Career?
We’ve put together a comprehensive remote support and
mentoring program, which we call Reliability Coaching.
The book you’ve just read covers one element of creating an
effective reliability program or career … and that’s only the
beginning.
We’ve been working on hundreds of projects developing products,
streamlining maintenance, and improving reliability programs
for over 20 years. We’ve been fortunate to enjoy a lot of success in
that time, and it took a lot of work … and we’ve made our share of
mistakes along the way.
What if you could directly benefit from those years of experience
and avoid those mistakes?
What if you could easily learn and apply reliability engineering best
practices, tools, and resources?
What if you could create a culture of reliability in your organization
with everyone working toward the same goals?
We’ve got something to show you. We call it Reliability Coaching,
and it’s the best way to enhance your reliability program & career.
www.fmsreliability.com/reliability-coaching/
Reliability Maturity
Understand and Improve Your Reliability Program
Fred Schenkelberg
Fred Schenkelberg is an international authority on reliability
engineering. He is the reliability expert at FMS Reliability, a
reliability engineering and management consulting firm he founded
in 2004. Fred left Hewlett Packard (HP)’s Reliability Team, where he
helped create a culture of reliability across the corporation, to assist
other organizations. His passion is working with teams to improve
product reliability, customer satisfaction, and efficiencies in product
development; and to reduce product risk and warranty costs. Fred’s areas of expertise are:
reliability program development, accelerated life test design and analysis, reliability statistics,
risk assessment, test planning, and training. He has a Bachelor of Science in Physics from
the United States Military Academy and a Master of Science in Statistics from Stanford
University.
About this book
Assess your program and determine the next steps to improve your program.
• Understand what needs to change and why
• Discover proactive methods to get ahead of reliability issues
• Create a culture of reliability in your organization
• Improve your influence
This book details:
• The five stages of reliability maturity
• Assessment methods to determine your organization’s maturity
• Specific recommendations to improve your program
Design : Product : Management & Leadership : Quality Control
ebook ISBN: 978-1-938122-04-0
paperback ISBN: 978-1-938122-05-7
