Reliability Maturity
Understand and Improve Your
Reliability Engineering Program

Fred Schenkelberg

Los Gatos, California


2014
Copyright © 2014 Fred Schenkelberg

Licensed under the Creative Commons
Attribution-NonCommercial-NoDerivatives
4.0 International License.
http://creativecommons.org/licenses/by-nc-nd/4.0

Feel free to email, tweet, blog, and pass this ebook
around the web, but please don’t alter any of its
contents when you do. Thanks!

If you find this work of value to you, consider
purchasing a copy and supporting the work.

If you have purchased a copy, thank you!

FMS Reliability Publishing
15466 Los Gatos Blvd #109-371
Los Gatos, California 95032
fmsreliability.com/publishing/

Printed in the United States of America

ebook ISBN: 978-1-938122-04-0
paperback ISBN: 978-1-938122-05-7
Contents
Introduction
Maturity Matrix
Exploring Reliability Culture
Three Ideas to Overcome Organization Inertia
Reactive and Proactive
Goals without Apportionment or Measures
Reliability Maturity Matrix Guide
Moving from Stage 1 to Stage 2
Moving from Stage 2 to Stage 3
Moving from Stage 3 to Stage 4
Moving from Stage 4 to Stage 5
How to Assess Your Reliability Program
Sample Survey Questions & Support Material
Following up on the Survey
Book Conclusions and Summary
Glossary of Terms
References
Introduction
Product reliability refers to how well a product performs over time.
The reliability performance is the direct result of decisions made and
actions taken during design, assembly, and use.

A well-designed product will meet reliability expectations. Likewise, a
weak design will suffer from more failures than expected.

The assembly process alone cannot improve product reliability. The
design of a product establishes the reliability performance potential,
assuming it is assembled correctly.

In order to create a reliable product, the design must consider the
expected use conditions. Even a simple product will experience
multiple stresses throughout the product’s use.

Reliability occurs at the point of decision during the design
process. Decisions may or may not deliberately include reliability
considerations. The reliability culture or maturity of an organization
establishes the type and amount of reliability consideration each
decision receives.

Maturity refers to the behaviors within an organization. A mature
company is able to repeatedly create reliable products. An immature
company’s erratic processes may or may not create reliable products.

Reliability Maturity - Understand and Improve Your Reliability Program

Maturity reflects the culture or approach to reliability. Immature
organizations tend to ignore or use crude techniques to set
requirements, identify risks, or measure results. Mature organizations
proactively work across the organization to enable appropriate
decisions by using specific techniques fit for the task.

One simple example is the way two organizations use HALT (a testing
process to determine likely failure modes). The immature organization
does not know what HALT is nor understand HALT’s purpose or value.
The mature organization uses HALT when appropriate and cost
effective. Its employees know the what, how, and why for HALT.

A reliability program assessment is a tool used to determine the
current maturity stage of an organization. In most organizations
very few people, if any, fully understand the entire set of decisions and
actions culminating in the resulting product reliability performance.

By using a structured interview or survey approach with a cross
section of the organization, you can develop an understanding of
the overall reliability program. In this book you will learn about the
assessment process and how to conduct your own assessments.

With an assessment in hand, it’s time to create recommendations to
improve the organization’s reliability maturity. In part, the
recommendations identify what is missing and inhibiting reliability
maturity, or what is working well and would improve further with
reinforcement. In general, organizations have one or more areas that
are stronger or weaker (more or less mature) yet tend to align with one
stage of maturity.

The reliability maturity matrix is a framework to help you understand
the organization’s reliability culture and to make improvements to
that culture, as needed. Recommendations that move the organization
to the right on the maturity matrix should address all the key
reliability practice areas. This book includes specific steps to move an
organization from one block of the matrix to the block to the right.

This book’s intent is to provide you with practical and actionable
information so you can change the culture of your organization and
consistently create reliable products for your customers.

Maturity Matrix
The concept of a maturity model is not new. It provides a means to
identify the current state and illuminates the possible improvements
that can be made to a reliability program.

The reliability maturity matrix serves as a guide to assist an
organization in improving its program.

In general, the higher stages are more cost effective and efficient at
achieving optimal product reliability performance. There are five
stages.

Stage 1: Uncertainty

“We don’t know why we have problems with reliability.”

Reliability is rarely discussed or considered during design and
production.

Product returns resulting from failure are considered a part of doing
business.

Field failures are rarely investigated, and often blame is assigned to
customers.

The few people who consider reliability improvements gain little
support.

Reliability testing is done in an ad hoc fashion and often simply to meet
customer requirements or basic industry standards.

Stage 2: Awakening

“Is it absolutely necessary to always have problems with
reliability?”

Reliability is discussed by managers but not supported by funding or
training.

Some elements of a reliability program are implemented, yet generally
not in a coordinated fashion.

There is some experimental use of tools such as FMEA and accelerated
and highly accelerated life testing, but most effort still focuses on
standards-based testing and meeting customer requirements.

Some analysis is done to estimate reliability or understand field
failure rates, yet limited use is made of these data in making product
decisions.

There is, however, an increasing emphasis on understanding failures
and resolving them.

Failure analysis is typically accomplished by component vendors with
little result.

Stage 3: Enlightenment

“Through commitment and reliability improvement we are
identifying and resolving our problems.”

A robust reliability program exists and includes many tools and
processes.

Generally, significant effort is directed to resolving prototype and field
reliability issues. Increasing reliance is placed on root-cause analysis
to determine appropriate solutions.

Some tools are not used to their full potential owing to lack of
understanding of reliability and how the various tools apply.

Some reliance is placed on establishing standard testing and
procedures for all products. Only some use of these testing results is
made for estimating product reliability to supplement predictions.

Predictions are primarily made to address customer requests and not
as feedback to design teams.

Stage 4: Wisdom

“Failure prevention is a routine part of our operation.”

Each product program or project has a tailored reliability program
that can be adjusted as the understanding of product reliability risks
changes.

Reliability tools and tasks are selected and implemented because they
will provide needed information for decisions.

Testing focuses on either discovering failure mechanisms or
characterizing failure mechanisms.

Testing often proceeds to failure, if possible.

Advanced data analysis tools are employed regularly and reports are
distributed widely.

There is increasing cooperation with key suppliers and vendors to
incorporate the appropriate reliability tools upstream.

Stage 5: Certainty

“We know why we do not have problems with reliability.”

Product reliability is a strategic business activity across the
organization.

There is widespread understanding and acceptance of design for
reliability and how it fits into the overall business.

Product reliability is accurately predicted prior to product launch using
a mix of appropriate techniques.

New materials, processes, and vendors are carefully considered for
their ability to meet internally established reliability requirements.

The few failures that do occur are expected and analysis is done to
identify early signs of material or process changes.

Customers and suppliers are regularly consulted on ways to improve
reliability.

Nature of Maturity

The stages of maturity may or may not proceed in a progression
within an organization. It is not like a plant that begins as a seed and
eventually matures.

An organization may start at stage 2, skipping stage 1, and never
progress.

Some organizations do advance from lower stages to stage 5. They also
may regress to a lower stage over time.

It is with deliberate effort that an organization advances to and
maintains one of the more mature stages. Once a higher stage of
maturity is established in the culture, the self-sustaining nature of the
stage means it will take little effort to maintain.

Exploring Reliability Culture
Years ago I had the opportunity to assess the reliability programs of
two teams within the same organization. They made similar products
for different segments of the market, and the teams were about the
same size.

Two years previously, both teams had lost their staff reliability
professional.

Furthermore, both teams were located in one building, one upstairs
and the other downstairs, which made scheduling the assessment
interviews convenient.

Upstairs, Downstairs

Through the course of the interviews I enjoyed the conversations more
with the team upstairs. The interviews started on time and were not
interrupted, and I noticed that the office plants were common, green,
and healthy.

The engineers and managers knew how to use a wide range of
reliability tools to accomplish their tasks. For example, the electrical
design engineer knew about derating and accelerated life testing,
and she also knew about the goal and how it was apportioned to her
elements of the product.

Each person I talked to upstairs knew the overall objective and how
they provided and received information using a range of reliability tools
to make decisions.

They enjoyed a very low field failure rate and simply went about the
business of creating products.

Downstairs was different.

The interviews rarely started on time and most were interrupted by
an urgent request usually involving an emerging major field issue or
customer complaint. I didn’t see any office plants, just plenty of coffee
pots.

The engineers and managers knew that Phil, the former reliability
engineer with the team, did most of the reliability tasks. When I asked
about stress testing or risk assessment, the responses I got were “That
was Phil’s job” or “Phil used to do something like that.”


Most did not know what HALT or ALT was and didn’t have time to find
out.

There was a vague goal, but all agreed that because it wasn’t measured
during product development it was meaningless.

The downstairs team had a very high field failure rate and the design
team often spent 50% or more of its time addressing customer
complaints.

History

The only salient difference between the teams and their history was the
behavior of the former reliability professionals with each team.

Upstairs, Mabel was a reliability professional well versed in a wide
range of reliability tools and processes. She provided direct support
along with coaching and mentoring across the organization.

She encouraged every member of the team to learn and use the
appropriate tools to make decisions. The team became empowered to
make decisions that led to products meeting their reliability goals.

Downstairs, Phil was another reliability professional well versed in a
wide range of reliability tools and processes.

He directly supported the team by doing the derating calculations,
asking vendors for reliability estimates, designing and conducting
HALT or ALT as needed, and performing the myriad other tasks related
to creating a reliable product.

He provided input and recommendations for design changes that
would improve reliability, and he was a key member of the team.

Phil was not a coach or mentor, however, and as he moved to a new
role his knowledge and skills went with him. He preferred to just do
it himself and often found he had little time to teach others about
reliability engineering tasks.

The difference between these teams was in the culture.

This difference showed in who had and who used reliability engineering
knowledge. When all team members have knowledge appropriate
for their role on the team, they can apply those tools to assist in making
design decisions.

Without that knowledge, design teams will use the tools and knowledge
they have to make design decisions. Without the consideration of
reliability-related information the design decisions are made blind to
the impact.

Take Away

Reliability occurs at decision points during the design process:

• when components are selected
• when structures are finalized
• or when all risks have been addressed.

Near the end of any product development process the team asks
whether the product is ‘good enough’ to start production and introduce
the product to the market.

Having a clear goal with an appropriate measure of the current design’s
ability to meet that goal provides the reliability aspect of ‘good enough.’

Each organization or product is different.

The markets, expectations, and environments are all different. Yet,
every product achieves some level of product reliability.

The culture is only one factor, yet I suspect that in this case you would
agree that working upstairs would be preferable.

Three Ideas to Overcome Organization Inertia
Sometimes, it seems the forces of nature are working against our
ideas.

I recall being frustrated as a child playing in the sandbox. I
wanted to create a ramp of sand to race my cars down. No
matter how much I pushed and patted, the dry sand succumbed
to some unseen force and did not hold the desired shape.

In business we sometimes experience the same frustration. It’s not
gravity in this case: We are facing organizational inertia.

Organization Physics

Once a group of people settles into a routine way of accomplishing
something, it is not a simple matter to change the process. You may
have experienced this resistance.

Like the physics concept of inertia (recall that a body at rest tends
to remain at rest), people who are familiar with the ‘way’ something
currently happens tend to want it to stay that way.

Just as with a physical object on the frictionless plane, no amount of
cajoling, presentations, or commands will move the object.

Unlike the mass on the plane, we are not allowed to strike our fellow
workers with some force to change their state from resting to in
motion. This is generally frowned upon.

So, what can we do? We know change happens, we know our ideas
have merit, we know there is value in making improvements.

Improving a Reliability Program

Often, when an organization asks someone for a reliability program
assessment, what is really being asked is how to change the
organization and sometimes how to change the culture itself.

Sure, an assessment will result in recommendations for
improvements. However, those recommendations, no matter how
compelling and obvious, are of no value unless implemented.

That is where inertia comes back into play.


Overcoming Organizational Inertia

Here are three tips that may help you implement reliability
improvements while overcoming organizational inertia.

• Work with key influencers.
• Make the current reality visible.
• Celebrate successes.

Every organization is different and every situation warrants its own
approach, yet these three tips may help you look for opportunities to
accelerate the implementation of your proposed changes.

Work with Key Influencers

Some people within an organization have the ability to sway many
others.

These people are the ones others look to for advice. They are the ‘go to’
people for a range of topics, including reliability, if you’re lucky. They
may or may not be managers.

Getting them on board may provide the credibility, support, and
influence you need to move forward.

Start by understanding what motivates these key people. If they want
the credit for the idea, give it to them. If they want only what’s best for
the company, show how improving reliability does so.

A couple of one-on-one meetings will determine whether or not you
have their support.

Change in the organization is easier to implement with their active
support. As in the sandbox, adding a little water to bind the sand
together would have helped in building a ramp.

In an organization there are those who provide the ‘binder’. Working
with key influencers may accelerate the implementation of your project.

Make the Current Reality Visible

Many team members claim that they understand product reliability
and that it is valuable to their customers, the company, and
shareholders.

Yet, few can tell you the cost of unreliability.

Make the cost of failure visible.

No one really likes to look at failures too closely, unless one is a failure
analyst. Counting profits and measuring sales volume are so much
more fun.

Product failures, although we all know they occur, are often overlooked
as a subject.

Track down and publish internally the warranty cost per unit sold
and total warranty expense. Then compare these numbers to the
cost of goods sold and net profit. You may find the cost of failure in
these terms to be useful for others to understand the magnitude of
opportunity that reducing product failure represents.
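
As a rough illustration of the comparison described above, the following Python sketch computes warranty cost per unit sold and expresses total warranty expense as a fraction of cost of goods sold and of net profit. The function name and every figure are hypothetical placeholders chosen for the example, not data from any real organization.

```python
def warranty_summary(warranty_expense, units_sold, cogs, net_profit):
    """Return warranty cost per unit and warranty expense relative to COGS and net profit."""
    return {
        "cost_per_unit": warranty_expense / units_sold,
        "share_of_cogs": warranty_expense / cogs,
        "share_of_net_profit": warranty_expense / net_profit,
    }


# Hypothetical annual figures, for illustration only.
summary = warranty_summary(
    warranty_expense=1_200_000.0,
    units_sold=40_000,
    cogs=20_000_000.0,
    net_profit=3_000_000.0,
)

print(f"Warranty cost per unit sold: ${summary['cost_per_unit']:.2f}")
print(f"Warranty expense vs. cost of goods sold: {summary['share_of_cogs']:.1%}")
print(f"Warranty expense vs. net profit: {summary['share_of_net_profit']:.1%}")
```

Expressing warranty expense against net profit, rather than against revenue, tends to make the size of the opportunity easier for others to see.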

Besides, to make good decisions we need cost per failure type
information to balance the other information also provided in terms of
money, i.e., production costs, material costs, sales per day, etc.

Coupled with a clear plan to reduce the cost of failures, this process
may just garner enough attention to gain acceptance of your ideas.


Celebrate Successes

Somewhere in your organization are those who are doing the right
things already. Find them and help them gain the recognition they
need.

Tell stories about what they did and the difference it is making.
Highlight their work as an example of what can be done in our
organization.

As one or more people start to implement your ideas for reliability
program improvement, help them to be successful. Then celebrate
with them and herald the success across the organization.

This process resembles a grassroots approach to organizational change,
but with the added feature of promoting success as you go forward.

Getting Moving

In the sandbox years ago, I saw a friend use a bit of water to
change the material to something that worked a bit better.
That idea sparked finding a wooden board to use instead. I
changed the material, thus finding a much better solution.

As you work to improve your reliability program, keep in mind that
you are working with people. Like sand, sometimes they need to find
support, sometimes they need to understand the goal, and sometimes
they need a little encouragement to firm up resolve.

Obviously, change happens. We can encourage change to improve
product reliability and share the benefits. There will be plenty of
benefits to go around.

Reactive and Proactive
Do you let events happen to you, or do events follow your designs and
expectations?

Are you a spectator or an actor?

Do you wonder about your products’ future or do you control that
future?

Are you reactive or proactive?

Every reliability and maintenance program is a system. Every program
has inputs, such as product testing results and field returns.

Every reliability program has outputs, such as product design and
production.

In the most basic terms, a reliability program includes product
specifications for functionality, including expected durability. The
program includes some form of design, verification, production, and
field performance.

Given this basic life-cycle description, two approaches to evaluating
the product life cycle are possible: reactive and proactive.

Every Design Will Fail

Let’s consider the notion that every product will eventually fail.

Even the most robust product on Earth will fail when the Sun expires.

Well before the collapse of the solar system most products made today
will have completely failed. The failures will range from deterioration
of materials, to stress conditions (e.g., lightning strikes), or simply to
misuse.

Some products will simply wear out; others will become obsolete and
lose compatibility with other systems; others will simply no longer
provide sufficient value.

Another important notion is that, with any product design, there are
a finite number of faults. A button has a limited number of actuation
cycles before accumulated stress cracks the switch dome.

Any given material has a degradation mechanism (corrosion, polymer
chain scission, etc.) that slowly deteriorates the material’s strength.

A ‘bug’ in the software can disable the equipment temporarily.

Further, there may be design elements in the product for which the
designer did not study the effects of production variation, user
demand, or environmental variation.

In every case, sooner or later, the design flaw will lead to failure.

Nonetheless, given only a finite number of failures, it is possible to find
and remove most design errors.

Reactive Approach

The most common approach to product reliability is to wait for product
failures and then respond with analysis, adjustments, and refinements
in an attempt to improve product reliability.

The naive wait for the failure reports from customers before taking
action. The team’s logic, if even considered, is the following:

• We are good designers.
• The customer will use the product in unforeseen environments and
applications.
• If there are customer failures we will consider improvements.

For some products, with limited release and ample time to redesign the
product, this may be perfectly feasible.

A simple improvement the design team could consider is an estimate
of the customer’s use profile and environmental conditions.

Armed with this information, the team then evaluates the impact of
the conditions on the product’s reliability through standardized testing.
Setting testing conditions at or slightly above expected operating
environments enables direct evaluation of the design’s ability to meet
expected conditions.

The faults found would be similar to the failures expected to occur in
the customer’s hands, and there may be time for a redesign before the
product is shipped to customers.

Carrying out this logic may lead to a broad spectrum of testing that is
both expensive and time consuming.

Part of the logic of product testing includes the thought, “If we test in
enough ways over the full range of use and environmental conditions,
we should find and correct every design fault.”

There is often a heavy reliance on industry standards and common test
methods for every product.

Further improvements to product reliability can refine this reactive
method; these include using simulations, performing risk analysis,
and undertaking early evaluation and testing of subsystems and
components.

The overall approach is often limited by knowledge of actual use
conditions, lack of test samples, and lack of time.

Proactive Approach

Moving to a proactive approach can reduce the amount of product
testing and increase product reliability.

Although this may seem similar to the reactive approach, it involves a
focus on failure mechanisms instead of test methods.

Products fail because they do not have sufficient strength to withstand
a single application of high stress (being dropped, being exposed to a
static discharge, etc.) or they accumulate damage (e.g., from wear,
corrosion, or drift) with use or over time.

By thinking through how a product could fail, considering the
materials, design, and assembly process, and the same for
vendor-supplied elements, the product team determines a list of
possible failure mechanisms.

In this approach not all the failure mechanisms will be fully understood
or characterized.

The risk in this case lies in the decision to launch the product while not
understanding the possibility or potential magnitude of product failure.

The amount of risk itself is unknown.

Therefore, the proactive team proceeds to characterize the design or
material under the expected use conditions. The intent is to reduce the
uncertainty of the risk.

A second result of a proactive approach risk assessment is the rank
ordering of failure mechanisms by expected rate of occurrence.

One way to accomplish this ranking is to evaluate the stress versus
strength relationships. Items with the largest overlap of the two
distributions (stress and strength) have the highest potential for
failure.

The solutions may include increasing strength or reducing the variance
of the strength.
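
One way to picture the overlap is to treat stress and strength as independent, normally distributed quantities. That is an assumption made here purely for illustration; real stress and strength data may follow other distributions. Under that assumption, the probability that stress exceeds strength can be computed directly and used to rank mechanisms. The mechanism names and all numbers in this Python sketch are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def interference_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """P(stress > strength), assuming independent normal stress and strength."""
    # The margin (strength - stress) is also normal; failure means margin < 0.
    margin = NormalDist(strength_mean - stress_mean,
                        sqrt(stress_sd ** 2 + strength_sd ** 2))
    return margin.cdf(0.0)


# Hypothetical mechanisms: (stress mean, stress sd, strength mean, strength sd).
mechanisms = {
    "solder joint fatigue": (40.0, 8.0, 70.0, 10.0),
    "connector wear": (30.0, 5.0, 60.0, 6.0),
    "seal degradation": (50.0, 6.0, 65.0, 9.0),
}

# Rank from highest to lowest probability of stress exceeding strength.
ranked = sorted(mechanisms,
                key=lambda name: interference_probability(*mechanisms[name]),
                reverse=True)
for name in ranked:
    print(f"{name}: {interference_probability(*mechanisms[name]):.4f}")

# Reducing the variance of the strength shrinks the overlap.
before = interference_probability(50.0, 6.0, 65.0, 9.0)
after = interference_probability(50.0, 6.0, 65.0, 4.0)
print(f"seal degradation with tighter strength spread: {before:.4f} -> {after:.4f}")
```

The last two calls keep the means fixed and only narrow the strength distribution, which is the second of the two solutions mentioned above.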

A third result of the risk assessment is similar to the stress and
strength evaluation and includes the impacts of time or usage on the
change in the stress and strength distributions.

Either curve may experience changes to the mean or variance over
time. This may be due to degradation, wear, or increased expectation of
durability by customers.
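
The effect of such drift can be sketched by letting the strength distribution change from year to year and recomputing the chance that stress exceeds strength. The Python example below assumes independent normal distributions, a strength mean that falls linearly, and a strength spread that grows linearly. All of these, and every number used, are hypothetical choices for illustration rather than a model taken from this book.

```python
from math import sqrt
from statistics import NormalDist


def failure_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """P(stress > strength) for independent normal stress and strength."""
    margin = NormalDist(strength_mean - stress_mean,
                        sqrt(stress_sd ** 2 + strength_sd ** 2))
    return margin.cdf(0.0)


def strength_at(year, initial_mean=65.0, initial_sd=6.0,
                mean_loss_per_year=3.0, sd_growth_per_year=0.5):
    """Hypothetical degradation: the mean falls and the spread widens each year."""
    return (initial_mean - mean_loss_per_year * year,
            initial_sd + sd_growth_per_year * year)


# Stress is held constant here (mean 40, sd 5); only strength drifts.
probabilities = []
for year in range(6):
    mean, sd = strength_at(year)
    probabilities.append(failure_probability(40.0, 5.0, mean, sd))
    print(f"year {year}: strength mean {mean:.1f}, sd {sd:.1f}, "
          f"P(stress > strength) = {probabilities[-1]:.4f}")
```

With these inputs the overlap grows every year, even though the stress distribution never changes.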

The proactive approach entails more thinking and understanding
of how testing stresses create failures, plus characterization of
product designs, materials, and processes, and their related failure
mechanisms.

Two Approaches

In summary, in a reactive approach one creates a design and then
waits for field returns or standard product testing failures to prompt
product improvements.

In a proactive approach one anticipates failure mechanisms,
characterizes, experimentally or via simulation, the response of the
design and materials to expected stresses, and then proceeds to the
design phase.

There are other aspects that distinguish a reactive from a proactive
reliability program.

For example, if the only time management discusses product reliability
is when a major customer complains about product failures, that is a
reactive approach.

If the management team regularly inquires about and discusses the risk
a particular design presents to reliability performance, that is a
proactive approach.

Goals without Apportionment or Measures
Consider the following situation.

A manager at a life-support-equipment company wanted to conduct a
reliability program assessment. The company was experiencing about a
50% per year failure rate, and at least the Director of Quality thought it
should do better.

One of the findings was related to reliability goal setting and how it was
used within the organization.

Nearly everyone knew that the product had a 5,000-h Mean Time
Between Failures (MTBF) reliability goal, but very few knew what that
actually meant.
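
To make the meaning of such a goal concrete, the following Python sketch converts an MTBF into a probability of surviving a given number of operating hours. It assumes a constant failure rate (the exponential model), which is the usual assumption behind an MTBF figure but is not necessarily true of any particular product. The function name and the hours chosen are illustrative.

```python
from math import exp


def reliability(hours, mtbf_hours):
    """Probability of surviving `hours` of operation, assuming a constant failure rate."""
    return exp(-hours / mtbf_hours)


MTBF_HOURS = 5_000.0
for hours in (1_000.0, 2_500.0, 5_000.0):
    print(f"R({hours:,.0f} h) = {reliability(hours, MTBF_HOURS):.3f}")
```

Under this assumption a 5,000-h MTBF does not mean that units run failure free for 5,000 h: only about 37% of units would be expected to survive that long.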

It was how this team used the product goal that was even more
surprising.

There were five elements to the product with five different teams
working to design those elements: a circuit board, a case, and another
three elements. Within each team, team members designed and
attempted to achieve the reliability goal of the product, the 5,000-h
MTBF goal.

A data analysis of the field failures showed that the teams actually did
achieve their goal: each element was just a little better than 5,000-h
MTBF in performance.

However, reliability statistics dictate that in a series system each of
the elements must have higher reliability than the whole-system goal.

For example, if each element achieves 99% reliability over one year,
the product’s five elements together would produce a system-level
reliability of approximately 95% at one year
(0.99 × 0.99 × 0.99 × 0.99 × 0.99 ≈ 0.951).

Dividing the system goal among the various subsystems or elements
within a product is called apportionment.

This team skipped that step and designed each element to the same
goal intended for the system.
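
The arithmetic can be sketched in a few lines of Python. The first function multiplies element reliabilities for a series system. The second splits a system goal equally across elements, which is only one simple apportionment scheme assumed here for illustration; a real program might weight elements by complexity or risk. Function names are hypothetical.

```python
from math import prod


def series_reliability(element_reliabilities):
    """System reliability when any single element failure fails the system."""
    return prod(element_reliabilities)


def equal_apportionment(system_goal, n_elements):
    """Reliability each element needs if the system goal is split equally."""
    return system_goal ** (1.0 / n_elements)


# Five elements at 99% each give a system reliability of roughly 95%.
system = series_reliability([0.99] * 5)
print(f"System reliability with five 99% elements: {system:.3f}")

# To reach a 99% system goal, each of five elements needs roughly 99.8%.
per_element = equal_apportionment(0.99, 5)
print(f"Element goal for a 99% system goal: {per_element:.4f}")
```

Designing every element to the system goal, as the team did, is the first case; apportionment is the second.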

Compounding the issue were the simplistic attempts to measure the
reliability of the various elements and the total lack of measurement at
the system level.

For each subsystem, the team primarily relied on the weakest
component within the subsystem to estimate the subsystem’s
reliability.

For example, the circuit board had about 100 parts, one of which the
vendor claimed had about a 5,000-h MTBF.

Thus that team surmised that, because it was the weakest element,
nothing would fail before 5,000 h, and that this was all the information
the team members needed to consider.

They did not consider the cumulative effect of all the other components
nor the uncertainty of the vendor’s estimate within their design and use
environment.
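
The cumulative effect can be illustrated with a small Python sketch. It assumes every part has a constant failure rate and that any part failure fails the board, so the part failure rates simply add. Only the 5,000-h figure comes from the story above; the 500,000-h MTBF assigned to the other 99 parts is a hypothetical value chosen for the example.

```python
def series_mtbf(part_mtbfs_hours):
    """MTBF of a series assembly, assuming constant failure rates that add."""
    total_failure_rate = sum(1.0 / mtbf for mtbf in part_mtbfs_hours)
    return 1.0 / total_failure_rate


# One weak part at 5,000 h plus 99 much stronger parts (hypothetical 500,000 h each).
parts = [5_000.0] + [500_000.0] * 99
board_mtbf = series_mtbf(parts)
print(f"Board MTBF: {board_mtbf:,.0f} h")
```

Even with the other parts assumed to be 100 times better than the weakest one, the board MTBF in this sketch is roughly half of 5,000 h, so the weakest part alone is a poor estimate of the subsystem.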

This logic was repeated for each subsystem.

The result was a product designed for about the same reliability it
achieved in the field.

The estimated use of the product was about 750 h per year; thus each
element would achieve about 85% reliability for a year, which seemed
to be an adequate reliability goal.

However, this is a series system, meaning that a failure in one element
would cause the entire system to fail. The math works out as follows:

$$R(750\,\mathrm{h}) = \left(e^{-750/5{,}000}\right)^{5} \approx 0.47, \text{ or } 47\%.$$

Because the product of the reliabilities of the individual five elements
was overlooked, the system reliability turned out to be less than 50%,
not the expected 85%.
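
The same calculation can be reproduced in Python as a check. The sketch assumes, as the 5,000-h MTBF figures imply, a constant failure rate for each element; the variable names are illustrative.

```python
from math import exp

ELEMENT_MTBF_HOURS = 5_000.0   # each element just meets the 5,000-h MTBF goal
USE_HOURS_PER_YEAR = 750.0     # estimated use of the product per year
N_ELEMENTS = 5                 # elements in series

# One-year reliability of a single element under a constant failure rate.
element_reliability = exp(-USE_HOURS_PER_YEAR / ELEMENT_MTBF_HOURS)

# Series system: every element must survive, so the reliabilities multiply.
system_reliability = element_reliability ** N_ELEMENTS

print(f"Element reliability at one year: {element_reliability:.2f}")
print(f"System reliability at one year:  {system_reliability:.2f}")
```

Each element alone comes out near 86%, while the five in series come out near 47%, in line with the roughly 50% per year failure rate seen in the field.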

The field performance was the result of how the product was designed
to meet the reliability goal for each subsystem. The team got what it
designed.

Its members had forgotten or ignored a basic, yet critical element of
reliability engineering knowledge.

Reliability Maturity Matrix Guide
An organizational reliability program assessment is only of value when
the resulting action creates a more effective reliability program.

Moving to the right, or increasing maturity, on the matrix provides
value to the organization.

Some examples include reduced field failures, reduced cost of product
development and testing, increased ability to hit market introduction
deadlines, and increased market share.

Each organization’s culture, history, capabilities, and priorities will
influence any reliability improvement program.

Local effective change management and the internal influence of
thought leaders will also affect any improvement effort. Therefore, any
effort to improve an organization’s reliability maturity must account
for the local culture and norms; thus each improvement program will
be different.

Yet, the basic tools, approaches, and processes related to reliability
engineering do remain largely the same across organizations.

The particular product and market may place unique constraints on
specific tools, but the basics tend to remain consistent.

The Reliability Maturity Matrix will provide the structure for this
guideline.

IEEE Std 1624, Standard for Organizational Reliability
Capability (IEEE, 2008), Crosby’s Quality Is Free (Crosby, 1979), and
the journal article “Using a Reliability Capability Maturity Model to
Benchmark Electronics Companies” (Tiku, Azarian and Pecht, 2007),
provide further guidance.

The intention is to provide the recommended tasks to facilitate
a transition from one maturity level to the next across each
Measurement Category.

In general, organizations tend to have fairly consistent reliability
maturity across categories. There may be some variation, yet
commonly only one level higher or lower from the overall average
maturity.

The maturity matrix consistency reflects the cultural elements and
the overall organization’s approach or policy toward reliability. The
consistency also reflects the interconnectedness between categories.

Assessment is the tool to clearly identify the maturity level of an
organization as well as the cultural aspects. The recommendations
generated by the assessment focus on reinforcing strengths and
improving weaknesses.

Also, the specific recommendations focus on moving the average
maturity to the right or upward in maturity. Given the interconnected
nature of the categories, it is often difficult to only improve one
category to a higher maturity without affecting related categories.

In this discussion, we will assume that the specific tasks and tools
recommended to move an organization to the right will tend to lead to
improvements in other categories.

First, let’s take an overall look at the specific categories.

1. Management

The management team sets the tone for all aspects of an organization.

The policies, practices, and priorities all convey the management
team’s placement of reliability’s importance relative to the many
priorities within the organization.

How the management team acts is more important than the slogans or
official statements – where is the attention and follow up, where are the
resources being directed, who is rewarded, and what garners personal
involvement?

Understanding and attitude

This is a reflection of the level of the management team’s
comprehension of reliability engineering’s role within the organization.

Does the management team understand and use reliability tools to
make decisions? Do they seek out information or merely respond
to complaints? When and why do reliability-related topics become
important?

Status

Within an organization, who are the leaders (independent of position)?
What combination of voices tends to drive the company? Who is held in
high esteem, rewarded, and promoted?

The status of the reliability practitioner may range from nonexistent,
to an obstacle, to a necessary part of doing business, to a valued team
member, or to a thought leader. Do people want to become reliability
engineers because the role is viewed as important and career enhancing?

The status of those identified as reliability practitioners is one indicator
of the value the organization places on, and finds in, reliability
engineering activities.

Measured cost of unreliability

The language of business is money.

What does the organization track and value, and how is it expressed?
The actual measures, their accuracy, and their relevance to decision
making express the importance of product reliability within an
organization.

Prevailing sentiment

Stage 1: “We don’t know why we have problems with reliability.”

Stage 2: “Is it absolutely necessary to always have problems with
reliability?”

Stage 3: “Through commitment and reliability improvement we are
identifying and resolving our problems.”

Stage 4: “Failure prevention is a routine part of our operation.”

Stage 5: “We know why we do not have problems with reliability.”

2. Product Requirements

This section includes the ability to understand customer expectations,
connect specific activities to business expectations, and create a
dynamic reliability program.

Requirements and planning

Designing and producing a product that meets customer expectations
requires some level of understanding of customer expectations for
functionality, use and environmental conditions, and durability.

These requirements influence every facet of product design and
production.

The overall plan to achieve the reliability requirements establishes the
sequence of reliability activities and decision points over the product
life-cycle.

Training and development

The technical skills and knowledge needed to design and produce a
product span a wide range of reliability engineering activities.

Individuals across the organization need to understand the
reliability-related goals, plans, tasks, and measures and their importance to
effectively create a reliable product.

3. Engineering

This section defines the organization’s ability to create and analyze the
collection of elements making up a product. The ability to understand
how the interaction of materials and processes affects reliability
performance is central to the engineering process.

Reliability analysis

Assessing reliability risk with a product’s design or field performance
illuminates failure modes, mechanisms, and effects.

The analysis provides information to create reliability estimates and
predictions. The ability to understand, characterize, compare, and
judge product reliability enables decisions across the product life-
cycle.

Reliability testing

The intent of physically evaluating product prototypes and production
units is to:

• identify design and supply chain weaknesses,
• explore product limits and potential failure modes,
• and determine the effects of the expected range of use profiles and
environments.

Physical testing includes demonstrating that the product’s durability
(expected reliability) meets the requirements.

Supply chain management

Many products consist of a combination of purchased components and
materials assembled into a functional item.

The reliability performance is significantly influenced by the reliability
performance of the selected components and materials.

Reliability is only one aspect of supplier selection, and the active
involvement of reliability practitioners enables

• risk assessment,
• reliability requirements allocation,
• joint component reliability testing,
• and key vendor process control enhancements.

Furthermore, monitoring supplier impact on reliability performance,
process variation, change notices, and end of manufacture notices
enables active management of any effects on product reliability.

4. Feedback Process

Henry Petroski suggests engineers design based on the knowledge of
failures (Petroski, 2006).

An organization’s ability to identify and learn from failures provides the
information needed for design improvements.

Failure data tracking and analysis

Each product failure highlights an area for product reliability
improvement.

Systematically recording, tracking, analyzing, and reporting failures
from across the product life-cycle and supply chain enables you to
acquire comprehensive and timely information. The product design
team needs to understand, prioritize, and design products to minimize
product failure.

The entire business requires timely and accurate failure data for
decisions to be made concerning, e.g., improvement projects, supplier
selection, and warranty policies.

Validation and verification

This check step in most organizations consists of verifying that
the reliability objectives have been met and that planned reliability
activities have occurred.

A cross-check can support individual results with consistent results
from other reliability activities. The process is often part of the overall
program management process.

Reliability improvements

During this process one tries to identify and implement product
changes that are designed to improve product reliability.

The sources for improvement projects may come from reliability
testing and analysis, product failures, customer requests, changes
in the supply chain, use, or environmental conditions, or changes in
technologies or materials.

The implementation of corrective actions includes prioritization,
validation of effectiveness, and prevention of reoccurrence of similar
failure modes or mechanisms.

Next Steps

Now, let’s explore the specific recommendations to allow an
organization to move from one stage of maturity to the next.

For each stage we will focus on the four principal categories (the
leftmost column of the matrix) of management, product requirements,
engineering, and the feedback process.

These categories will be further broken down into subcategories to
better address the issues unique to each principal category.

Moving from Stage 1 to Stage 2
The basic approach includes raising awareness of the cost of
unreliability among all concerned, building awareness of basic
reliability engineering concepts and tools, and encouraging the natural
aversion to the risk of failure.

The basic message is that the organization should deliberately address
reliability. There are tools available to help us understand and avoid
failures.

The remainder of the chapter provides recommendations to move an
organization out of Uncertainty to the next stage of reliability maturity,
Awareness.

Management

Understanding and attitude

• Create basic awareness that product failures occur and can be
avoided. Understand that field failures cost the company money and
cause customer dissatisfaction.
• Create a basic report of the number of field failures and warranty
expenses.
• Provide training, discussion, and learning opportunities for
the management team related to basic reliability concepts and
activities. Convey that all parts of the organization contribute to the
actual product reliability.

Status

• Identify one or more reliability practitioners within the organization
to assist in product design decision making.
• Highlight individuals and the benefits of reliability-related activities.
• Promote an individual to create and manage a reliability program.
• Recognize the reliability professional’s influence on and benefit to
product design and manufacturing decisions.

Measured cost of unreliability

• Create means to collect and report basic product reliability field
performance.
• Estimate the cost of a product return.
• Estimate the warranty cost at the individual product level.
• Track and report the value of reliability activities.
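
A first-pass warranty estimate at the product level might look like the sketch below. It assumes at most one warranty claim per unit, and the 95% reliability, $120 return cost, and 10,000-unit volume are hypothetical inputs, not figures from the text.

```python
def warranty_cost_per_unit(reliability_at_warranty_end: float,
                           cost_per_return: float) -> float:
    """Expected warranty cost per shipped unit, assuming each unit
    generates at most one claim during the warranty period."""
    if not 0.0 <= reliability_at_warranty_end <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if cost_per_return < 0:
        raise ValueError("cost_per_return must not be negative")
    return (1.0 - reliability_at_warranty_end) * cost_per_return


# HYPOTHETICAL inputs for illustration only.
unit_cost = warranty_cost_per_unit(0.95, 120.0)
annual_cost = unit_cost * 10_000  # hypothetical shipment volume

print(f"Expected warranty cost per unit: ${unit_cost:.2f}")
print(f"Expected cost for 10,000 units:  ${annual_cost:,.0f}")
```

Even a rough figure like this gives the management team a monetary measure to track alongside the count of field failures.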

Product Requirements

Requirements and planning

• Publish and highlight customer requirements related to product
reliability.
• Gather and highlight information about customer use and
environmental conditions.
• Create a reliability program plan including a list of reliability
activities to accomplish.

Training and development

• Create reliability overview seminars for designers and extended
product development teams.
• Create a list of reliability training resources related to industry or
technology.
• Provide training opportunities for reliability practitioners with an
emphasis on reliability concepts and statistical methods.

Engineering

Reliability analysis

• Poll design team for reliability risks. Determine what potential risks
are known.
• Create a prediction capability, for example by using a parts-count
approach or by drawing a simple reliability block diagram and using
vendor data.
• Illustrate failure mode impact on the customer.

Reliability testing

• Create a minimum reliability test plan to address primary reliability
requirements.
• Create design verification testing of functional requirements for use
on all products shipped.
• Conduct discovery testing to determine the design margin (HALT).

Supply chain management

• Create approved parts and suppliers (vendors) lists (AVLs).
• Create a vendor reliability assessment process for use with critical
component vendors and new suppliers.
• Use vendor data to qualify components for use within a product and
environment.

Feedback Process

Failure data tracking and analysis

• Collect and report regular factory yield and field failure data.
• Use Pareto charts to determine improvement projects.
• Conduct failure analysis and corrective actions on major failures.
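
A Pareto ranking needs little more than failure counts by cause. The sketch below sorts causes by count and adds a cumulative percentage; the function name and the failure counts are hypothetical and serve only to show the mechanics.

```python
def pareto_table(failure_counts):
    """Return (cause, count, cumulative_percent) rows, largest count first."""
    total = sum(failure_counts.values())
    if total == 0:
        raise ValueError("no failures to rank")
    rows = []
    running = 0
    ranked = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in ranked:
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows


# HYPOTHETICAL field failure counts by cause.
field_failures = {
    "connector": 42,
    "power supply": 27,
    "firmware": 15,
    "fan": 9,
    "display": 7,
}

for cause, count, cumulative in pareto_table(field_failures):
    print(f"{cause:<14}{count:>4}{cumulative:>8.1f}%")
```

The causes at the top of the table, which account for most of the cumulative percentage, are the natural candidates for improvement projects.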

Validation and verification

• Create a process for management review of reliability plan
implementation.
• Compare field reliability data to requirements and predictions.
• Create a system to validate the effectiveness of corrective actions.

Reliability improvements

• Document design and process changes and their anticipated impact
on product reliability.
• Implement design and process changes to address customer
complaints and field failures.
• Review field failures for vendor connections and implement vendor
improvements or exclude poorly performing vendors from the AVL.

Moving from Stage 2 to Stage 3
Once an organization has awareness of the need to address reliability,
it begins to look for tools to assist in addressing product reliability.
The organization needs to build experience using the range of available
tools.

The basic message is that there are many ways to address reliability.
Let’s explore the range of tools available to help us understand and
avoid failures.

The remainder of the chapter provides recommendations to move an
organization out of Awareness to the next stage of reliability maturity,
Enlightenment.

Management

Understanding and attitude

• Conduct informal training (e.g., lunch & learn) on basic reliability
topics and invite the management team to participate.
• Highlight and train members of management in their role in vendor
selection, design priorities, product testing, and failure analysis with
respect to product reliability. Encourage and coach management
team members to ask customers about the importance of product
reliability.
• Provide regular summary reports on product design progress
toward reliability goals and field reliability performance.

Status

• Invite key reliability practitioners to program and division decision
meetings.
• Promote a reliability practitioner to report directly to division
management.
• Recognize the reliability professional’s influence on and benefit to
product platform decisions.

Measured cost of unreliability

• Create means to track the costs of failure analysis and
re-engineering projects.
• Estimate costs of repairs, maintenance, replacement, and
associated activities.
• Create means to improve resolution (e.g., increase operating
hours, determine the root cause of failure, evaluate environmental
conditions, etc.) of product reliability field performance reports.
• Establish consistent cost calculations and reporting mechanisms
within the organization.

Product Requirements

Requirements and planning

• Create fully stated reliability requirements including function,
environment, duration, and probability of success.
• Gather and publish customer profiles including range and
distribution of environmental and use conditions.
• Apportion reliability requirements to product subsystems and
major components.
• Create a detailed reliability program plan including budgets for
resources, personnel, and capital equipment.
• Evaluate designs and suppliers for new materials or processes that
may increase reliability risk.

Training and development

• Create and provide regular classes for engineers on root-cause
analysis and corrective action methods.
• Create and provide regular seminars for managers on reliability
activities and on the use and value of those activities for
improvement of product reliability.

Engineering

Reliability analysis

• Lead FMEA studies with willing teams.
• Conduct field data reliability analysis to estimate reliability
performance.
• Review design changes to ascertain the broader impact on product
reliability.
• Use worst-case conditions rather than only nominal conditions.
• Use failure mechanism models to design and analyze test results.

Reliability testing

• Create a detailed reliability test plan, including stresses for specific
failure mechanisms, sample size calculations, and confidence
levels.
• Determine the failure mechanisms evaluated for each test proposed
and verify that all potential failure mechanisms are appropriately
exercised within the overall test program.
• Review vendor testing to determine whether it is adequately
connected to expected use and environmental conditions and
potential failure mechanisms.
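
One common way to connect sample size and confidence level is the zero-failure (success-run) relationship n = ln(1 − C) / ln(R). The sketch below applies it under the assumptions that every unit is tested for one full lifetime equivalent and that no failures are allowed. This is one of several possible planning methods and is not prescribed by the text.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units to test with zero failures to demonstrate `reliability`
    at the given `confidence` (binomial success-run formula)."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


for r, c in [(0.90, 0.90), (0.95, 0.90), (0.99, 0.95)]:
    n = success_run_sample_size(r, c)
    print(f"R = {r:.2f}, C = {c:.2f}: test {n} units with no failures")
```

The rapid growth in sample size as the reliability target rises is one reason test plans need to state both the target and the confidence level explicitly.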

Supply chain management

• Include reliability requirements in design specifications and
requests for quotes from vendors.
• Include assessment information in management of AVLs.
• Request and review field reliability performance from critical
component vendors.
• Evaluate vendor end of production or change notices on product
reliability.

Feedback Process

Failure data tracking and analysis

• Collect and analyze failure data to guide component selection.
• Revise reliability test plans based in part on field failure data (i.e.,
evaluate test coverage and value in preventing field failures).
• Confirm the root cause of failures and the adequacy of product
improvement to avoid the failure or to mitigate failure effects.
• Collect and analyze time-to-failure information rather than failure
counts or percentages.
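
Time-to-failure data supports life distribution fitting, which failure counts alone cannot. As one illustration, the sketch below fits a two-parameter Weibull distribution by median rank regression, using Benard's approximation for the ranks. It assumes complete data with no censored units, and the failure times are hypothetical.

```python
import math


def weibull_mrr(failure_times):
    """Fit Weibull shape (beta) and scale (eta) by median rank regression.

    Regresses ln(-ln(1 - F)) on ln(t); valid for complete data only.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0 or times[0] == times[-1]:
        raise ValueError("need at least two distinct positive failure times")
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        median_rank = (i - 0.3) / (n + 0.4)  # Benard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope of the fitted line
    eta = math.exp(mean_x - mean_y / beta)    # from intercept = -beta * ln(eta)
    return beta, eta


# HYPOTHETICAL failure times in operating hours.
hours_to_failure = [410, 620, 790, 950, 1130, 1400]
beta, eta = weibull_mrr(hours_to_failure)
print(f"Weibull shape (beta): {beta:.2f}")
print(f"Weibull scale (eta):  {eta:.0f} h")
```

A shape parameter above 1 suggests wear-out, while a value below 1 suggests early-life failures. A failure percentage by itself cannot make that distinction.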

Validation and verification

• Create a process to verify that supplier corrective actions have the
expected effects on product reliability.
• Compare stress screening and ongoing reliability testing to field
failures and adjust as needed.
• Compare field failure modes to expected failure modes, and modify
risk assessment practices to minimize the differences.

Reliability improvements

• Implement corrective actions to internally identified reliability
testing failures.
• Create means to track and report corrective action effectiveness.
• Create a lessons-learned process based on identified failure modes.

Moving from Stage 3 to Stage 4
Using a wide range of reliability engineering tools has created
experience; now the team should begin selectively using the most
valuable tools for specific situations.

The basic message is that there are many ways to proactively address
reliability. We need to tailor our approach to maximize the value of
each reliability activity.

The remainder of the chapter provides recommendations to move
an organization out of Enlightenment to the next stage of reliability
maturity, Wisdom.

Management

Understanding and attitude

• Provide the management team with ‘talking points’ for key reliability
program initiatives for use with customers and internal teams.
• Provide value statements related to achievement in reliability
improvements.
• Create a significant element of senior management’s bonus
structure based on product reliability performance.
• Discuss options for proactively addressing major reliability issues.
• Develop detailed reliability models that provide means to conduct
‘what if’ experiments for various reliability activities.

Status

• Invite key reliability practitioners to critical business and customer
meetings.
• Invite key managers to lead reliability programs and initiatives as
part of a steering committee.
• Invite reliability practitioners to discussions on early product
concept development and major vendor selection.
• Recognize and reward reliability improvement activities outside the
ranks of identified reliability professionals.
• Recognize the reliability professional’s contribution to prevention of
product failures.

Measured cost of unreliability

• Establish means to estimate the return on investment of individual
reliability tasks.
• Create means to calculate the cost to the customer for each product
failure.
• Calculate the cost of product ownership over the entire product life-
cycle.

Product Requirements

Requirements and planning

• Express reliability objectives as distributions rather than point
estimates, when applicable.
• Incorporate reliability plans within product development plans.
• Create decision points within the reliability plan to adjust activities
based on current information.
• Review supplier and vendor reliability programs to identify potential
risk areas.
• Create an overall reliability program strategy and implementation
plan.

Training and development

• Create tailored reliability courses for key reliability tasks including
when and how to determine the need to accomplish the task.
• Create and provide seminars and workshops to senior managers on
how reliability impacts the business.
• Encourage reliability practitioners to learn how to identify failure
modes and mechanisms related to the product and industry.
• Create a reliability training program for engineers and associated
managers focused on design for reliability and implementation of
critical reliability activities.

Engineering

Reliability analysis

• Use distributions rather than point estimates for reliability
predictions.
• Include confidence intervals or bounds on data analysis results.
• Use distributions for use and environmental conditions rather than
specification values.
• Use failure mechanism models to determine cost–benefit decisions
for product changes.

Reliability testing

• Conduct reliability testing only when needed to resolve a question or
provide information for a decision.
• Design accelerated testing that is focused on specific failure
mechanisms.
• Expand discovery testing to include more stresses related to use
conditions and to new vendors or materials under consideration.

Supply chain management

• Create critical-to-reliability criteria for supplier process control
and/or ongoing reliability evaluations.
• Review reliability testing and failure mechanisms for those tests
best performed by vendors (upstream or at point of least value
added).
• Require vendors to evaluate the reliability programs of their
suppliers.
• Evaluate technology maturity and stability of vendor processes and
components prior to vendor selection.

Feedback Process

Failure data tracking and analysis

• Conduct failure analysis to find the root cause and update design
guidelines and reliability testing to prevent future occurrences.
• Analyze failure data for systemic decision-making processes that
allowed the failure to occur.
• Create part batch, lot, or similar tracking systems.

Validation and verification

• Assess reliability activities and their effectiveness to determine
process improvements or best practices.
• Verify that risk assessments are a closed-loop process and updated
as new information becomes available.
• Compare field failure mechanisms with expected failure
mechanisms and adjust risk assessment practices and reliability
testing procedures to minimize the difference.

Reliability improvements

• Create a lessons-learned process based on identified failure
mechanisms.
• Explore means to improve reliability predictions, analysis, and
testing with more effective or efficient techniques or a combination
of techniques.
• Create means to document the value of reliability activities and
publish value determination guidelines.

Moving from Stage 4 to Stage 5
Reliability is important to the organization. The next stage is to
embed reliability thinking across the organization and at every level.
Considering reliability becomes a natural part of all decisions.

The basic message is that reliability engineering and consideration is
part of how the organization operates. We have a culture of reliability
and it is how we do business.

The remainder of the chapter provides recommendations to move an
organization out of Wisdom to the next stage of reliability maturity,
Certainty.

Management

Understanding and attitude

• Provide insights and mentoring concerning approaches to
systematically prevent product failures.
• Provide reliability reports on reliability predictions and associated
business impact to profit.
• Discuss investment areas for product reliability improvements that
impact product architecture, technology, and patent and product
portfolio.

Status

• Invite key reliability practitioners to provide input to business
strategic planning.
• Recognize the reliability professional’s contribution to customer
satisfaction and brand loyalty.

Measured cost of unreliability

• Calculate the influence of product reliability improvements on
increased sales and brand loyalty (customer satisfaction or net
promoter indices).
• Calculate value of brand related to product reliability perception or
performance.

Product Requirements

Requirements and planning

• Create reliability plans that include contingency plans for a range of
design, supply chain, and requirements disruptions.
• Create reliability strategic plans that are integrated with overall
business strategic plans.

Training and development

• Create means to learn about industry trends, new materials and
processes, and reliability modeling and analysis tools that may have
a meaningful impact on the business.
• Create a comprehensive reliability training program for everyone in
the organization with visible management support and involvement.

Engineering

Reliability analysis

• Include life-cycle costs in analysis for use in decision making.
• Create stress–life models for new materials, features, and
components when existing models are inadequate.
• Create complex simulations or Monte Carlo analysis systems to
create predictions and estimate the value of proposed changes.
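
As a minimal Monte Carlo sketch, suppose each element's MTBF is uncertain rather than a single vendor figure. The example samples each MTBF from a triangular distribution and propagates it through a five-element series system at 750 h. The distribution choice, its 2,500-h and 10,000-h limits, and the function name are assumptions made for illustration only.

```python
import math
import random
import statistics


def simulate_system_reliability(hours, n_elements, mtbf_low, mtbf_mode,
                                mtbf_high, trials=10_000, seed=1):
    """Return sorted samples of series-system reliability at `hours`,
    drawing each element's MTBF from a triangular distribution."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    samples = []
    for _ in range(trials):
        system = 1.0
        for _ in range(n_elements):
            mtbf = rng.triangular(mtbf_low, mtbf_high, mtbf_mode)
            system *= math.exp(-hours / mtbf)
        samples.append(system)
    samples.sort()
    return samples


# HYPOTHETICAL uncertainty: MTBF between 2,500 h and 10,000 h,
# most likely 5,000 h, for each of five elements in series.
samples = simulate_system_reliability(750, 5, 2_500, 5_000, 10_000)
p05 = samples[int(0.05 * len(samples))]
p50 = statistics.median(samples)
p95 = samples[int(0.95 * len(samples))]

print(f"5th percentile:  {p05:.3f}")
print(f"Median:          {p50:.3f}")
print(f"95th percentile: {p95:.3f}")
```

Reporting the spread rather than a single number shows how much of the prediction depends on the uncertain inputs, and rerunning the simulation with a proposed change gives a rough estimate of that change's value.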

Reliability testing

• Use failure mechanism models to design reliability testing, and use
test results to improve models.
• Characterize reliability of new vendor components or materials
prior to use within a product design.

Supply chain management

• Monitor for changes in product environment, use conditions,
reliability requirements, or regulatory requirements for their impact
on product reliability.
• Monitor critical-to-reliability parameters and process control
points across the supply chain to identify shifts.
• Create contingency plans for possible obsolescence or shortages of
parts.
• Conduct joint studies with vendors to explore processes, materials,
and technology impact on product reliability.

Feedback Process

Failure data tracking and analysis

• Create links between customer satisfaction and product reliability.
• Create a model for determining product reliability readiness for
release based on the development of a failure reporting, analysis,
and corrective action system and other business requirements.
• Create a prognostic data collection and analysis system within
products and manufacturing equipment and processes.

Validation and verification

• Validate the use of field failure mechanisms data and analysis to
update reliability models and design guidelines.
• Create a process to verify the effectiveness of reliability strategy and
policies.

Reliability improvements

• Evaluate new vendors, processes, and materials with the intent to
improve product reliability.
• Update design rules and guidelines based on product reliability
performance.

Next Steps

The last four chapters provide ideas on how to move from one stage of
maturity to the next. Now we need to ascertain the stage of maturity of
the organization.

How to Assess Your Reliability Program
The reliability that results is going to happen whether or not the team
designing the product or production line deliberately uses reliability
engineering tools.

The elements of a product or system will respond to the environment
and will either work or fail.

While working at Hewlett-Packard I had the opportunity to conduct
a reliability program assessment of about 50 product divisions. The
assessment took one day and involved eight interviews.

“How do you know so much about our program?” was a question one
quality manager asked after reading the assessment report.

It’s all a matter of understanding the reliability decision elements and
the organization’s processes.

The key to the insights is understanding to what extent various
reliability activities take place and how the team uses the resulting
information in decision making.

It’s not just what you do; it’s how those reliability-related activities
impact decision making that matters.

One hypothesis we had related to whether the number of reliability
tasks the team actively used would correlate to their warranty
expenses. That worked to a point.

The teams that did not understand basic tools and had no overt or
organized reliability engineering had high warranty expenses (as a
percent of revenue).

The teams that did a large number of tasks (FMEA, HALT, ALT,
predictions, etc.) did have lower warranty expenses.

The surprise was that the teams that had the lowest warranty expenses
also conducted very few reliability activities.

The difference was that the best performing teams understood the
range of available reliability engineering activities and only used the
tools that would provide value for a given circumstance.

Less mature organizations would attempt to conduct as many
reliability-related activities as possible, including a long list of
product tests, many of which provided little actual value.


It was the application of the right tool at the right time that made the
difference.

Maturity and Activity

Hiring a reliability engineer or running a lot of life tests does not
necessarily improve your product’s reliability performance.

It is not the organization or activities that comprise a reliability
program; rather, your reliability performance relates to the culture
concerning reliability.

Reliability occurs at the point of decision.

Therefore, during interviews the intent is to understand how decisions
are currently made. To what extent do reliability considerations
influence decisions, and what tools or methods are used to form
decisions?

For example, if we ask, “To what extent do you do HALT?” the answer
may be “We rarely use HALT.”


In one case, it may be that the engineer doesn’t know what HALT is and
isn’t sure whether or not the testing they conduct is similar to HALT.

They may simply be unfamiliar with that type of testing.

In another case, the engineers say that they know about HALT and
understand how and why it is used, but they have rarely used it because
they lacked appropriate situations in which HALT would be of value.

They understand that HALT is a useful tool for specific applications and
recently they have not needed to conduct HALT.

Some respond that they do HALT.

Again, there are two common responses. In one case, the team does
HALT all the time because it is required, independent of whether or not
it may be useful.

In the other case, they do HALT because it is the right tool for the
current situation.

One team didn’t know what HALT was and the other fully understood
and chose to not do HALT. The difference lies in the understanding and
application, or maturity.


Assessment Process

To understand an organization’s reliability maturity, use the
following assessment process.

1. Select survey topics.

Create a list of activities and tools common to reliability practices in
your industry. It may include items rarely used. It should include the
breadth of topics related to reliability in your field.

See the DFR Methods Survey for one possible list of topics.

Some topics are broad, such as ownership of and responsibility for
product reliability, or reactive versus proactive approaches of
management.

Some topics are very specific, such as individual tools like FMEA or
HALT.


2. Establish the interview format.

These can be one on one, in small groups, via phone, through an invited
survey with follow-up conversations, or by some other method. I have
found the one-to-one discussions the most useful as they permit
immediate follow-up and exploration of the rationale or motivation
behind specific behaviors or responses.

3. Conduct the interviews (collect information).

Arrange to interview or survey a cross section of people in the
organization. Select individuals with experience with the organization
and products typically designed and manufactured. Useful
interviewees include the following:

• design & development engineers (electrical, mechanical, and
software),
• design & development managers (electrical, mechanical, and
software),
• reliability or quality engineers and/or managers,
• procurement engineers (i.e., those who work with suppliers), and
• manufacturing engineers and/or managers (other similar titles
include design for manufacturing, sustaining, and/or production
engineering).

Select about eight individuals for interviews, depending on the specific
situation, size, complexity, etc. of the program.

In general, each interview question starts with the phrase, ‘to what
extent.’ For example, you might ask, “To what extent do you use HALT?”

Depending on the response, you may explore the motivations or
rationale behind the decision to conduct HALT and how the HALT
results are used within the organization.

4. Document the business environment.

Include notes on sales volume, cost, brand position, revenue, cost of
unreliability as percent of net revenue, etc.

Document any regulatory or customer-imposed restrictions or
requirements. Summarize the results to convey the atmosphere
around the reliability program.
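
One of the notes above, cost of unreliability as a percent of net revenue, is simple arithmetic. The following Python sketch shows one way to compute it. The function name, the split into warranty and other failure costs, and the dollar figures are assumptions made for this illustration; use whichever cost categories your organization actually tracks.

```python
def cost_of_unreliability_percent(warranty_cost, other_failure_costs, net_revenue):
    """Return the cost of unreliability as a percent of net revenue.

    The two cost inputs are an assumed breakdown for illustration; any
    set of failure-related costs can be summed the same way.
    """
    if net_revenue <= 0:
        raise ValueError("net_revenue must be positive")
    return 100.0 * (warranty_cost + other_failure_costs) / net_revenue


# Hypothetical figures, for illustration only.
percent = cost_of_unreliability_percent(
    warranty_cost=1_200_000,
    other_failure_costs=300_000,
    net_revenue=50_000_000,
)
print(f"Cost of unreliability: {percent:.1f}% of net revenue")
```

Recording the figure the same way for each division makes it easier to compare divisions later.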


5. Document the collected information.

A summary given back to participants asking for additional input or
corrections helps with the acceptance of the assessment results and
may help avoid a mistake in your understanding.

6. Analyze the data.

Do not analyze during the interviews; just let the interviewees do the
talking.

Review the notes and information provided and map these to the
maturity matrix. Look for consistent approaches to making reliability-
related decisions. Look for patterns of behavior and underlying
motivations or causes.

7. Report on assessment findings.

Document and explain what you heard and how it relates to the
overall organization’s maturity. The report may include the interview
summary, strengths, weaknesses, and recommendations for
improvement.


The assessment process should provide a view of the overall
organization’s approach to making decisions and to what extent and
how its reliability program influences those decisions.

With that basic understanding you can identify strengths to build upon,
spot weaknesses that need attention, and provide recommendations to
improve the maturity of the reliability program.

Let’s now turn to examples of specific questions that should be asked
in a program assessment.

Sample Survey Questions & Support Material
The following sample survey was part of an online survey for a
multi-division organization. While I prefer face-to-face interviews, an
online survey allows the collection of suitable data quickly.

Premise

The following survey explores your view of your organization’s
(product line or division) reliability program approach. The intent is to
identify strengths and weaknesses within the organization’s reliability
program.

Every organization does have a means to accomplish product reliability
performance, and every program may reflect differences related to
local change management, customer expectations or requirements,
local practices, and management priorities.

This overall assessment will assist in the development of training and
support to effect an overall improvement in reliability engineering and
performance of fielded products.


An early step in this program is to understand the current range of
reliability engineering practices. It is important to accurately reflect
your organization’s approach as it will guide the deployment of
resources to reinforce best practices and improve areas of weakness.

The survey is broken down into four areas (management, product
requirements, engineering, and feedback process) and provides an
overall snapshot of your organization’s reliability maturity or approach
to product reliability engineering practices.

For each set of statements select the one that best fits your
organization. Many of the segments have open-ended questions to
promote discussion or additional insights.
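
Once the responses are in, a simple tally per segment shows where respondents agree and where they see the organization differently. The Python sketch below is one minimal way to do that, not part of the original survey. The function name, the data layout (one dictionary per respondent), and the sample responses are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def summarize_responses(responses):
    """Summarize the selected statement numbers (1-5) for each segment.

    responses: list of dicts, one per respondent, mapping a segment name
    to the number of the statement that respondent selected.
    """
    summary = {}
    segments = sorted({name for response in responses for name in response})
    for segment in segments:
        picks = [r[segment] for r in responses if segment in r]
        summary[segment] = {
            "median": median(picks),
            "counts": dict(sorted(Counter(picks).items())),
        }
    return summary


# Hypothetical responses from three respondents.
responses = [
    {"Management Understanding and Attitude": 2, "Reliability Testing": 3},
    {"Management Understanding and Attitude": 3, "Reliability Testing": 3},
    {"Management Understanding and Attitude": 2, "Reliability Testing": 1},
]
for segment, stats in summarize_responses(responses).items():
    print(segment, stats)
```

A wide spread of selections within one segment is itself a finding worth exploring in follow-up conversations.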

Management

Management Understanding and Attitude

Which statement best reflects how your organization’s management
team approaches product reliability?


1. There is no comprehension of reliability as a management tool.
Management tends to blame reliability engineering for ‘reliability
problems.’

2. Management recognizes that reliability management may be of
value but is not willing to provide money or time to make it happen.

3. Management is still learning more about reliability management but
is becoming supportive and helpful.

4. Management is actively participating in reliability management,
having an understanding of the absolutes of reliability management
and recognizing its role in continuing emphasis.

5. Management considers reliability management an essential part of
the company system.

What reliability metrics are in use? How are they communicated and
used within the organization?

To what extent does management own reliability (i.e., pay attention
to and follow up on reliability topics)? Is this attention proactive (i.e.,
occurring prior to field issues) or reactive (i.e., occurring only in
response to reliability problems)?


Reliability Status

Which statement best reflects how your organization views product
reliability engineering?

1. Reliability is hidden in manufacturing or engineering departments.
Reliability testing is probably not part of the organization. Emphasis is
placed on initial product functionality.

2. A stronger reliability leader has been appointed, yet the main
emphasis is still on an audit of initial product functionality. Reliability
testing is still not performed.

3. The reliability manager reports to top management and has a role in
management of the division.

4. The reliability manager serves as an officer of the company,
reporting on status, being responsible for preventive action, and being
involved with consumer satisfaction and feedback.

5. The reliability manager serves on the board of directors. Prevention
of failure is the main concern. Reliability professionals are thought of
as leaders.


Rank order the following product design priorities, from 1 for top
priority to 4 for lowest priority. Assume a particular product meets the
minimum requirements in each area already.

Product features – feature set of products or use of leading
technologies

Time to market – time to ship the product

Cost – bill of materials cost or cost of goods sold

Product reliability – field performance meets or exceeds
customer duration (life) expectations
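
When several people complete the rank-order question, averaging the rank each priority receives gives a quick picture of where reliability sits relative to features, schedule, and cost. The Python sketch below assumes that averaging approach, which is a choice made for this illustration and not something the survey prescribes. The function name and the sample rankings are also hypothetical.

```python
from statistics import mean

PRIORITIES = ("Product features", "Time to market", "Cost", "Product reliability")


def mean_ranks(rankings):
    """Average the rank (1 = top priority, 4 = lowest) given to each priority.

    rankings: list of dicts, one per respondent, mapping each priority
    name to its rank. Returns (priority, mean rank) pairs, top priority first.
    """
    for ranking in rankings:
        if set(ranking) != set(PRIORITIES) or sorted(ranking.values()) != [1, 2, 3, 4]:
            raise ValueError("each respondent must rank all four priorities 1-4")
    averages = {p: mean(r[p] for r in rankings) for p in PRIORITIES}
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical rankings from three respondents.
rankings = [
    {"Product features": 2, "Time to market": 1, "Cost": 3, "Product reliability": 4},
    {"Product features": 1, "Time to market": 2, "Cost": 4, "Product reliability": 3},
    {"Product features": 3, "Time to market": 1, "Cost": 2, "Product reliability": 4},
]
for priority, average in mean_ranks(rankings):
    print(f"{priority}: {average:.2f}")
```

A mean rank is only a summary. If respondents disagree sharply about where reliability belongs, that disagreement deserves a conversation of its own.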

Product Requirements

Requirements and Planning

Which statement best reflects how your organization views product
reliability requirements and planning?

1. Discussions are informal or nonexistent.

2. Basic requirements based on customer requirements or standards
are considered. Plans have required activities.

3. Requirements include environment and use profiles with some
apportionment. Plans have more details with regular reviews.

4. Plans are tailored for each project and projected risks. Use is made
of distributions for environmental and use conditions.

5. Contingency planning occurs. Decisions are based on business or
market considerations. Reliability requirements and planning are part
of the strategic business plan.

How are product reliability objectives stated for the product
development team? Provide an example.

Does the product development life-cycle (stage gate review process)
include reliability activities or tasks? If so, give an example.

Training and Development

Which statement best reflects how your organization views product
reliability training and development?


1. Training is informally available to some, if requested.

2. Select individuals are trained in concepts and data analysis. Training
is available for design engineers.

3. Training for the entire engineering community is done for key
reliability-related processes. Managers receive training on reliability
and life-cycle impact.

4. Reliability and statistics courses are tailored for design and
manufacturing engineers. Senior managers are trained on reliability’s
impact on business.

5. New technologies and reliability tools are tracked and training
is adjusted to accommodate these. Reliability training is actively
supported by top management.

Which parts of the organization are expected to understand and use
reliability engineering tools and techniques? Select all that apply.

1. Design

2. Manufacturing


3. Supply chain (procurement)

4. Field service and/or customer support

Engineering

Reliability Analysis

Which statement best reflects how your organization views reliability
analysis during the product design and development phase?

1. Reliability analysis is nonexistent or solely based on manufacturing
issues.

2. Analysis consists of point estimates and reliance on handbook
parts-count methods. Basic identification and listing of failure modes
and their impact is done.

3. Formal use is made of FMEA. Field data analysis of similar products
is used to adjust predictions. Design changes lead to reevaluation of
product reliability.

4. Predictions are expressed as distributions and include confidence
limits. Environmental and use conditions are used for simulation and
testing.

5. Life-cycle cost is considered during design. Stress and damage
models are created and used. Extensive risk analysis is performed for
new technologies.

Give an example of an effective reliability risk analysis tool currently in
use. Please briefly describe.

Reliability Testing

Which statement best reflects how your organization accomplishes
reliability testing during the product design and development phase?

1. Reliability testing is primarily functional.

2. A generic test plan exists with reliability testing only to meet
customer or standards specifications.

3. A detailed reliability test plan with sample size and confidence
limits is in place. Results are used for design changes and vendor
evaluations.

4. Accelerated tests and supporting models are used. Testing to failure
or destruct limits is conducted.

5. Test results are used to update component stress and damage
models. New technologies are characterized.

Is product reliability testing an integral part of the product
development process?

Does product reliability testing include discovery types such as
HALT or margin testing to uncover design weaknesses or establish
robustness?
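
Statement 3 above mentions a test plan with sample size and confidence limits. As one illustration of what that can mean in practice, the Python sketch below uses the common zero-failure (success-run) relationship n = ln(1 - C) / ln(R), where R is the reliability to demonstrate and C is the confidence level. This formula is not taken from the survey; it is a standard binomial result swapped in as an example, and the function name is hypothetical. It assumes pass/fail outcomes, independent units, and no failures allowed during the test.

```python
import math


def success_run_sample_size(reliability, confidence):
    """Units to test with zero failures to demonstrate `reliability`
    at the given `confidence` (both as fractions between 0 and 1).

    Assumes a zero-failure binomial (success-run) test.
    """
    if not (0 < reliability < 1 and 0 < confidence < 1):
        raise ValueError("reliability and confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


# Demonstrating 90% reliability with 90% confidence.
print(success_run_sample_size(0.90, 0.90))
# Demonstrating 95% reliability with 90% confidence.
print(success_run_sample_size(0.95, 0.90))
```

A team that can explain why it chose its sample size, whatever method it used, is showing the kind of understanding the higher-numbered statements describe.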

Supply Chain Management

Which statement best reflects how your organization views supply
chain management as related to reliability?

1. Supplier selection is based on function and price.

2. An approved vendor list is maintained. Audits are performed based
on issues or with critical parts. Qualification is primarily based on
vendor datasheets.

3. Assessments and audit results are used to update the AVL. Field
data and failure analysis related to specific vendors are used.

4. Vendor selection includes an analysis of each vendor’s reliability
data. Suppliers conduct assessments and audits of their suppliers.

5. Changes in environment, use profile, or design trigger vendor
reliability assessment. Component parameters and reliability are
monitored for stability.

Are specific reliability requirements communicated to key suppliers?

Are specific reliability tests accomplished by select vendors and
monitored by your organization?


Feedback Process

Failure Data Tracking and Analysis

Which statement best reflects how your organization responds to and
addresses reliability-related failures during the entire product
life-cycle?

1. Failures during function testing may be addressed.

2. Pareto analysis of field returns and internal testing are performed.
Failure analysis relies on vendor support.

3. Root-cause analysis is used to update the AVL and prediction
models. A summary of analysis results is disseminated.

4. Focus is on failure mechanisms. Failure distribution models are
updated based on failure data.

5. The relationship between customer satisfaction and product failures
is understood. Use is made of prognostic methods to forestall failure.


Is there a useful defect tracking system in use during product design
and development? Is the impact on product reliability included in the
prioritization?

Is a failure analysis process used for each product failure and
associated analysis?
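
Statement 2 above mentions Pareto analysis of field returns. For readers unfamiliar with the idea, the Python sketch below counts how often each failure cause appears and lists the causes from most to least frequent with a running cumulative percent. The function name and the sample return records are hypothetical and only illustrate the technique.

```python
from collections import Counter


def pareto(failure_causes):
    """Return (cause, count, cumulative percent) rows, most frequent first."""
    counts = Counter(failure_causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100.0 * running / total, 1)))
    return rows


# Hypothetical field-return records, one cause per returned unit.
returns = ["connector", "fan", "connector", "capacitor", "connector", "fan"]
for cause, count, cumulative in pareto(returns):
    print(f"{cause:<10} {count:>3} {cumulative:>6.1f}%")
```

The ranking shows where failures concentrate. Understanding why they occur is the work described in the higher-numbered statements.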

Validation and Verification

Which statement best reflects how your organization conducts product
validation and verification as related to reliability?

1. Product validation and verification are informal and based on
individual instances rather than any process.

2. There is basic verification that plans are followed. Field failure data
are regularly reported.

3. Supplier agreements around reliability are monitored. Failure
modes are regularly monitored.

4. Internal reviews of reliability processes and tools take place. Failure
mechanisms are regularly monitored and used to update models and
test methods.

5. Reliability predictions match observed field reliability.

Which of the following tools are used for product reliability validation
and verification? Select all that apply.

1. Parts-count prediction methods

2. Testing to pre-established standards or requirements

3. Accelerated life testing for specific failure mechanisms

4. Physics of failure modeling and analysis

5. Field returns data analysis

Are field returns analyzed and results reported across the
organization?


Reliability Improvements

Which statement best reflects your organization’s approach to
reliability improvement?

1. The process is nonexistent or informal.

2. Design and process change processes are followed. The corrective
action process includes internal and vendor engagement.

3. The effectiveness of corrective actions is tracked over time.
Identified failure modes are addressed in other products. Improvement
opportunities are identified as environment and use profiles change.

4. Identified failure mechanisms are addressed in all products.
Advanced modeling techniques are explored and adopted. A formal
and effective lessons-learned process exists.

5. New technologies are evaluated and adopted to improve reliability.
Design rules are updated based on field failure analysis.

Are vendor material or process changes evaluated for impact on
product reliability prior to using ‘new’ components?

How are internal design or process changes evaluated with respect to
impact on product reliability prior to implementation?

Following up on the Survey
Once the results of the survey have been compiled, the next phase
entails a site visit to the company by the evaluation team.

For this phase, on a mutually accepted date, an evaluation team
visits the company. Company personnel participating in this on-site
evaluation meeting should include the reliability manager and
engineers who are involved in activities such as defining reliability
requirements, reliability predictions, derating, manufacturing yields,
testing, qualification, stress analysis, failure analysis, failure tracking,
warranties, parts selection, and supplier assessment, as well as any
others who provided answers to the questionnaire.

These personnel should bring to the meeting ‘objective evidence’ in
support of their responses to the questionnaire. The evidence may
consist of data, reports, policy drafts, or current documents.

The evaluation team offers an overview of reliability capability to
provide an understanding of the rationale and the process.

After the presentation, the company provides an overview of the
business and operations at its facility, followed by its vision of
reliability.


This includes, but should not be limited to, reliability objectives for
the various product categories and a description of its reliability
organization and practices.

Specifically, the presentation should include information on the
following items:

• reliability tasks performed for products,

• a list of test and failure analysis equipment,

• reliability test plan and process guidelines and/or standards,

• a list of reliability tests and some examples,

• failure analysis methods and examples,

• supplier assessment guidelines,

• part selection guidelines,

• reliability input during product development,


• failure tracking strategy and examples, and

• warranty determination.

The evaluation team then assesses responses to the questionnaire and
the supporting evidence, asking follow-up questions as necessary. At
the conclusion of the meeting, the company is provided with an informal
summary of the findings, including recommendations for corrective
actions.

Documenting the Assessment

The third and final phase involves documentation of the assessment.

The company is provided with a draft report summarizing the
evaluation team’s observations and recommendations for reliability
improvement.


The company is typically given an opportunity to review the draft report
and provide comments.

A final report is then issued to the company and to the organization
that requested the assessment. It highlights the areas of strength
and weakness, with recommendations for improvements to approach
best-in-class standards.

The report also includes the maturity level of the company along with
an explanation of the significance of that level.

Book Conclusions and Summary
Reliability programs can improve. A good starting place is your
understanding of the current culture around making decisions related
to reliability.

Even a simple scan of the reliability maturity matrix may provide
insight into the stage of maturity, which provides a basis for
improvement.

A more extensive assessment, with interviews or via an online survey,
provides additional insights about the organization’s strengths and
weaknesses.

Interviewing even eight people from around the organization starts the
change process by bringing awareness to the current situation.

You may find support and potential obstacles, and you will learn more
about how the organization actually creates the reliability performance
found in the products.

The section on recommended actions is just a starting point. Every
organization and situation is different and may require very different
approaches.


Setting reliability goals and analyzing field failures might be common
across any organization, yet customer contracts, regulatory
requirements, and other external constraints may alter an
organization’s path to improving reliability performance.

One of the keys to making change happen is to know where you are
going. The maturity matrix provides a glimpse of what is possible.
Change does take time.

Along the way be sure to encourage those making improvements,
illustrate the value of improved methods, and celebrate the successes.

A product’s potential reliability performance is created at the point of
decision.

These decisions occur every day across the organization. Improving
the reliability maturity of an organization enables every decision to
improve reliability performance.


Reliability Maturity Matrix

The stages are in columns and each contains a description of an
organization for the 11 categories.

Scan across each row to find the stage that generally describes your
organization. Circle it. The various categories may have different
stages of maturity.

Generally an organization has a single stage of maturity that best
describes its reliability program.
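
If you record the circled stage for each category, finding the stage that best describes the organization becomes a simple tally. The Python sketch below picks the most frequently circled stage and lists the categories that differ from it. The function name, the rule of choosing the lower stage on a tie, and the sample scores are assumptions made for this illustration; the matrix itself does not prescribe a scoring rule.

```python
from collections import Counter


def overall_stage(circled):
    """Return the most common stage and the categories that differ from it.

    circled: dict mapping a category name to the stage (1-5) circled for it.
    """
    counts = Counter(circled.values())
    top = max(counts.values())
    # Assumption: on a tie, report the lower stage as the more
    # conservative reading of the organization.
    stage = min(s for s, n in counts.items() if n == top)
    outliers = {c: s for c, s in circled.items() if s != stage}
    return stage, outliers


# Hypothetical results for five of the categories.
circled = {
    "Requirements & Planning": 2,
    "Training & Development": 2,
    "Reliability Analysis": 3,
    "Reliability Testing": 2,
    "Supply Chain Management": 1,
}
stage, outliers = overall_stage(circled)
print("Overall stage:", stage)
print("Categories at a different stage:", outliers)
```

Categories above the overall stage are strengths to build upon, and those below it are candidates for attention.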

If the matrix is not showing properly on your screen or you would like to
print out the page for local use, visit

http://www.fmsreliability.com/accendo/ebooks/reliability-maturity/

Reliability Maturity Matrix

| Area | Category | Stage 1: Uncertainty | Stage 2: Awakening | Stage 3: Enlightenment | Stage 4: Wisdom | Stage 5: Certainty |
|---|---|---|---|---|---|---|
| Requirements | Requirements & Planning | Informal or nonexistent | Basic customer req. met; plans have required activities | Requirements include environment & use profiles; plans more detailed | Plans customized; distributions used for environmental & use conditions | Contingency planning occurs; decisions based on business & market |
| Requirements | Training & Development | Informally available | Some training in concepts & data analysis | Reliability training for engineers; manager training on reliability & lifecycle impact | Reliability & statistics courses for engineers; senior managers trained on impact on business | New technologies & reliability tools tracked; reliability training supported by management |
| Engineering | Reliability Analysis | Nonexistent or based on manufacturing issues | Use of point estimates & handbook parts count; basic ID of failure modes & impact | Formal use of FMEA; field data from similar products analyzed; design changes cause reevaluation | Predictions expressed as distributions; environmental & use conditions used for simulation & testing | Lifecycle cost considered in design; stress & damage models used; extensive risk analysis for new technologies |
| Engineering | Reliability Testing | Primarily functional | Generic test plans; testing only to meet customer or std. specs | Detailed reliability test plans; results used for design changes & vendor evaluation | Accelerated tests & models used; testing done to failure or destruct limits | Test results used to update component models; new technologies characterized |
| Engineering | Supply Chain Management | Selection based on function & price | AVL maintained; audits on issues or key parts; vendor datasheets used | AVL updated by assessments & audit results; field data & failure analysis related to vendors | Vendor reliability data used for vendor selection; suppliers conduct external assessments & audit | Changes trigger vendor reliability assessment; component parameters & reliability monitored |
| Feedback Process | Failure Data Tracking & Analysis | Only looks at function failures | Field returns analysis & internal testing; FA reliant on vendor | AVL & prediction models updated by root-cause analysis; results shared | Focus on failure mechanisms; failure distribution models updated via failure data | Customer satisfaction vs. product failures understood; prognostic methods used |
| Feedback Process | Validation & Verification | Informal, without process | Basic verification of plans followed; field data regularly reported | Supplier reliability agreements & failure modes regularly monitored | Internal reviews of reliability processes & tools; failure mechanisms monitored | Reliability predictions match observed field reliability |
| Feedback Process | Reliability Improvement | Nonexistent or informal | Design & process change processes followed, corrective action taken | Effectiveness of corrective actions tracked; failure modes addressed in other products; improvements identified | Failure mechanisms addressed in all products; modeling techniques & lessons-learned process adopted | New technologies evaluated & adopted; designs updated per field failure analysis |
| Management | Understanding & Attitude | Has no grasp | Recognizes but takes no action | Becoming supportive & helpful | Actively participating | Considers essential to company |
| Management | Status | No status | Conduct of specific and routine product testing & failure analysis tasks | Reliability manager reports to senior management & has influence in managing division | Reliability manager is an officer, reporting on actions & involved with consumer affairs | Reliability manager is a board member; prevention is key concern |
| Management | Cost of Unreliability | Not done | Direct warranty expenses only | Warranty, corrective action materials, & engineering costs monitored | Customer & lifecycle unreliability costs identified & tracked | Lifecycle cost reduction done via product reliability improvements |
Glossary of Terms
ALT — Accelerated life testing is the evaluation of the
time-to-failure behavior for a specific failure mechanism or system
using higher than expected stress(es). The intent is to understand the
reliability performance under normal stress conditions, generally
with the use of an acceleration model.

AVL — The approved vendor list records the suppliers that have met
some set of criteria. Reliability performance may be one criterion.
Using vetted suppliers reduces the risk of vendor-introduced failure
mechanisms.

Derating — Derating is a process of designing or selecting components
that have sufficient ability to withstand the various stresses
experienced during operation. Generally the operating conditions
are well below the maximum rated stress level.

FMEA — A failure mode and effect analysis is a systematic method to
identify and prevent product failures.

HALT — The highly accelerated life test is a method to discover failure
modes and mechanisms. The process generally uses multiple
stresses with increasing intensity to stimulate failures.


MTBF — Mean time between failures is the inverse of the failure rate
(the mean number of failures per unit of operating time). It is
commonly calculated by dividing the total hours of operation of one
or more systems by the number of failures that occur during that time
period.
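
The calculation described in the MTBF entry can be sketched in a few lines of Python. The function name and the example numbers are hypothetical. The sketch produces a simple point estimate only and says nothing about how failures are distributed over time.

```python
def mtbf_hours(total_operating_hours, failures):
    """Point estimate of MTBF: total operating hours divided by failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed for a point estimate")
    return total_operating_hours / failures


# Hypothetical example: 10 systems each run 1,000 hours with 4 failures in total.
total_hours = 10 * 1_000
estimate = mtbf_hours(total_hours, failures=4)
print(f"MTBF estimate: {estimate:.0f} hours")
print(f"Failure rate: {1 / estimate:.4f} failures per hour")
```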

References
Crosby, Philip B. 1979. Quality Is Free: The Art of Making Quality
Certain. New York: Signet.

IEEE Std 1624-2008. 2008. IEEE Standard for Organizational
Reliability Capability. New York: IEEE.

Petroski, Henry. 2006. Success Through Failure: The Paradox of
Design. Princeton: Princeton University Press.

Tiku, S., M. Azarian, and M. Pecht. 2007. “Using a Reliability
Capability Maturity Model to Benchmark Electronics Companies.”
International Journal of Quality & Reliability Management 24:5,
547-563.

Reliability Maturity
Understand and Improve Your Reliability Program
Fred Schenkelberg
Fred Schenkelberg is an international authority on reliability
engineering. He is the reliability expert at FMS Reliability, a
reliability engineering and management consulting firm he founded
in 2004. Fred left Hewlett Packard (HP)’s Reliability Team where he
helped create a culture of reliability across the corporation to assist
other organizations. His passion is working with teams to improve
product reliability, customer satisfaction, and efficiencies in product
development; and to reduce product risk and warranty costs. Fred’s areas of expertise are:
reliability program development, accelerated life test design and analysis, reliability statistics,
risk assessment, test planning, and training. He has a Bachelor of Science in Physics from
the United States Military Academy and a Master of Science in Statistics from Stanford
University.

About this book

Assess your program and determine the next steps to improve your program.
• Understand what needs to change and why
• Discover proactive methods to get ahead of reliability issues
• Create a culture of reliability in your organization
• Improve your influence
This book details:
• The five stages of reliability maturity
• Assessment methods to determine your organization’s maturity
• Specific recommendations to improve your program

Design : Product : Management & Leadership : Quality Control

ebook ISBN: 978-1-938122-04-0
paperback ISBN: 978-1-938122-05-7
