Role of IT in Event Data Collection for Reliability Analysis

Copyright Meridium, Inc. 2004
All rights reserved. Printed in the U.S.A.

All rights including reproduction by photographic or electronic process and translation into other
languages of this material are fully reserved under copyright laws. Reproduction or use of this material in
whole or in part in any manner without written permission from Meridium, Inc. is strictly prohibited.

Meridium is a registered trademark of Meridium, Inc. All trade names referenced are the service mark,
trademark, or registered trademark of the respective manufacturer.

Introduction

Many companies today are striving to become more competitive by optimizing costs and increasing
productivity through the improvement of work processes. This corporate streamlining is changing the way
that people work, how they communicate, how decisions are made and how the daily activities of
workers are scheduled. Improvements in information technology over the years have transformed both
worker productivity and machinery up-time. Information technology is critical to this maintenance
optimization revolution because it gives decision makers a common framework for fact-based analysis
aligned with corporate goals and strategies.

Another key aspect of maintenance optimization is uniformity in the content and methodology of decision
making. This uniformity can only occur when information technologies are used efficiently and effectively
for condition monitoring, reliability analysis, and prognostic evaluation of data. With increased software
capability and information availability, new and more efficient ways of querying data for analysis become
increasingly important. Lack of predictability negatively affects company performance because equipment
failures directly affect production costs. Production costs are, in many industries, the dominant cost
category that controls a company's competitive position in the marketplace. Management's ability to
control these costs relies on accurate data that is properly interpreted and acted upon.

Many companies have found that the barrier to this analysis is accessing the data already collected, and
combining it with data collected from other parts of the organization. Capacity utilization improvements
come from a deeper understanding of the causes and impact of unreliability. Maintenance optimization is
an evaluation process that examines current functions, tasks, and activities to achieve the proper
investment balance between production goals, cost optimization and safety/risk. In doing so,
manufacturers must become experts in predicting and preventing failures. This can only be achieved
through a fundamental understanding of the predominant failure modes of the equipment. By
understanding the failure mode and the probability of failure for a particular sub-component, decision
makers can make the best judgment regarding the appropriate long-term and short-term corrective actions.
The term optimization implies a single point or goal of maximum plant production capacity at minimum
cost. The goal of maintenance optimization is to achieve the highest level of reliability for the least
investment in parts and labor. By leveraging the investment in information technology, companies gain
visibility into their bad actors, and predictable production begins to take shape.

To attain continuous improvement status, information systems need to provide sufficient analysis
capability. However, before this analysis can take place, there has to be sufficient access to the data
necessary to understand equipment reliability and develop initial asset management strategies.

Business Challenge

Collecting event data is a minimum requirement to measure the effectiveness of your current asset
strategies. Without this information, those on the front line tend to allocate resources that focus on the
problem of the day and as a result, do not have a systematic approach to removing failures from the
production lines of their manufacturing environments. Having a complete set of event data for all
equipment gives a much clearer view of how human and capital resources are allocated, which helps
operators break out of the reactive work cycle. What emerges is a systematic approach to
eliminating failure that, over time, dramatically increases productivity and profitability.

Event data can come from a variety of sources such as a maintenance management system, predictive
and inspection systems, as well as production systems. For this discussion, we will focus our attention
primarily on collecting event data related to equipment that resides in a maintenance management
system.

It is important to understand the reasoning behind the data collection effort before getting into the details
of how it is actually accomplished. The collection of event data has a double benefit. The primary benefit
of comprehensive event data is to alert process owners as to whether their asset strategies are effective.
Once we identify ineffective strategies, we can use the same event data to drill down and determine what
might be the cause(s) of the ineffective strategies.

Collecting Reliability Event Data to Predict and Prevent Failures

When an event occurs on a piece of equipment, it is critical to record what type of event actually occurred.
For instance, was it a failure, repair or a PM? What was the condition of the equipment at the time of the
event? Once the work is completed, we need to record the technical finding such as the failed item,
failure mode, cause and several other data elements that will be discussed in further detail. Some of the
most critical information recorded for any event is the date and time stamps related to the event
and the costs associated with that event (e.g. labor, material, contractor, production losses).


Different types of reliability event data:

Work events that occur on equipment
Type of work performed
Conditions found at the time of work
Technical findings after work is completed
Dates/time associated with the work
Cost associated with performing the work

Below is a list of the data and supplemental descriptions that are recommended to collect on a given
event. This data will be used as the basis for compiling a balanced scorecard, as well as the information
required to find the underlying causes. A company's balanced scorecard is composed of standardized,
enterprise-wide performance measurements related to production assets. A balanced scorecard provides
a holistic view of key performance indicators (KPIs) spanning multiple plants and allows management to
make strategic, fact-based decisions with greater confidence.

Identification
Event ID
Event Type
CMMS ID
Functional Location
Functional Location Hierarchy
    Level 1 (Site)
    Level 2 (Area)
    Level 3 (Unit)
    Level …
    Level n (System)
Equipment ID
Equipment Name
Equipment Category (Rotating)
Equipment Class (Pump)
Equipment Type (Centrifugal)

History
Functional Loss
Functional Failure (ISO Failure Mode)
Effect
Maintainable Item
Condition
Cause
Maintenance Action
Narrative

Dates
Event Date
Mechanically Unavailable Date/Time
Mechanically Available Date/Time
Mechanical Downtime
Maintenance Start Date/Time
Maintenance End Date/Time
Time to Repair

Consequence
Maintenance Cost
Production Cost
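As an illustration, the four groups of fields above could be held in a single record type. The following is a minimal Python sketch, assuming Python 3.7 or later; the class name `ReliabilityEvent` and every field name are hypothetical, not a Meridium or CMMS schema. Mechanical Downtime is derived from the Mechanically Unavailable and Mechanically Available time stamps, and Time to Repair is assumed here to be the elapsed time between Maintenance Start and Maintenance End.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ReliabilityEvent:
    """Hypothetical event record mirroring the Identification, History,
    Dates and Consequence groups (field names are illustrative only)."""
    # Identification
    event_id: str
    event_type: str                  # e.g. "Failure", "Repair", "PM"
    functional_location: str
    equipment_id: str
    equipment_class: str             # e.g. "Pump"
    equipment_type: str              # e.g. "Centrifugal"
    cmms_id: Optional[str] = None
    hierarchy: tuple = ()            # (site, area, unit, ..., system)
    # History (coded values from a standardized list)
    functional_loss: Optional[str] = None     # "Complete", "Partial" or "Potential"
    failure_mode: Optional[str] = None
    maintainable_item: Optional[str] = None   # e.g. "Bearing"
    condition: Optional[str] = None
    cause: Optional[str] = None
    maintenance_action: Optional[str] = None
    narrative: str = ""
    # Dates
    event_date: Optional[datetime] = None
    mech_unavailable: Optional[datetime] = None
    mech_available: Optional[datetime] = None
    maint_start: Optional[datetime] = None
    maint_end: Optional[datetime] = None
    # Consequence: maintenance cost broken out by category (Labor, Material, ...)
    maintenance_cost: dict = field(default_factory=dict)
    production_cost: float = 0.0

    @staticmethod
    def _hours(start, end):
        """Elapsed hours between two time stamps, or None if either is missing."""
        if start is None or end is None:
            return None
        return (end - start).total_seconds() / 3600.0

    @property
    def mechanical_downtime_hours(self):
        return self._hours(self.mech_unavailable, self.mech_available)

    @property
    def time_to_repair_hours(self):
        # Assumption: repair time = Maintenance End minus Maintenance Start
        return self._hours(self.maint_start, self.maint_end)

    @property
    def total_cost(self):
        return sum(self.maintenance_cost.values()) + self.production_cost


# Hypothetical example event
ev = ReliabilityEvent(
    event_id="EV-0001", event_type="Failure",
    functional_location="01-G-0001", equipment_id="EQ-12345",
    equipment_class="Pump", equipment_type="Centrifugal",
    maintainable_item="Bearing", maintenance_action="RPL",
    mech_unavailable=datetime(2004, 3, 1, 8, 0),
    mech_available=datetime(2004, 3, 2, 14, 30),
    maint_start=datetime(2004, 3, 1, 9, 0),
    maint_end=datetime(2004, 3, 1, 17, 0),
    maintenance_cost={"Labor": 1200.0, "Material": 800.0},
    production_cost=5000.0,
)
print(ev.mechanical_downtime_hours)  # 30.5
print(ev.time_to_repair_hours)       # 8.0
print(ev.total_cost)                 # 7000.0
```

Keeping the derived values as properties rather than stored fields means downtime and repair time cannot drift out of step with the time stamps they are calculated from.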

Event ID - This is the unique identifier for each failure event.

CMMS ID - The event's identifier in the computerized maintenance management system (CMMS). This is
useful if you are using a CMMS as the base data collection system for failure events.

Functional Location - The functional location is typically a "smart" ID that represents what function
takes place at a given location (e.g. Pump 01-G-0001 must move liquid X from point A to point B).

Functional Location Hierarchy - The functional hierarchy used to roll up metrics at various levels:
Level 1 (Site)
Level 2 (Area)
Level 3 (Unit)
Level …
Level n (System)

Equipment ID - The Equipment ID is usually a randomly generated ID that identifies the asset that
is in service at the functional location. Equipment ID and Functional Location are kept separate
because assets can move from place to place, while functional locations stay fixed.

Equipment Name - Name or description of the equipment, for identification purposes.

Equipment Category (e.g. Rotating) - Indicates the category of equipment the work was
performed on, generally by discipline (Rotating, Fixed, Electrical, Instrument).

Equipment Class (e.g. Pump) - Indicates the class of equipment the work was performed on.
Failure codes can depend on this value.

Equipment Type (e.g. Centrifugal) - Indicates the type of equipment the work was performed on.
Failure codes can depend on this value.

Functional Loss - This indicates whether the equipment experienced a functional loss as part of
this event. A functional loss can be defined as any of the following three types: (1) Complete
Loss of Function, (2) Partial Loss of Function, or (3) Potential Loss of Function.

Functional Failure (ISO Failure Mode) - Essentially, the symptoms of a failure, if one has occurred.
Any physical asset is installed to fulfill a number of functions. The functional failure describes
which function the asset is no longer able to fulfill.

Effect - The effect of the event on production, safety, environment, or quality.

Maintainable Item - The actual component identified as causing the asset to lose its
ability to serve (e.g. bearing).

Condition - The type of damage found on the maintainable item. In some cases, this
also indicates the failure mechanism.

Cause - The general cause of the condition. This is not the root cause; root cause failure
analysis (RCFA) is recommended for assessing root causes.

Maintenance Action - Corrective action performed on the damaged item.

Narrative - Long-text description of the work performed and suggestions for improvement.

Event Date - This is the date that the event was first observed and documented.

Mechanically Unavailable Date/Time - This is the date/time that the equipment was actually taken
out of service either due to a failure or to the repair work.

Mechanically Available Date/Time - This is the date/time that the equipment was available for
service after the repair work had been completed.

Mechanical Downtime - Difference between the Mechanically Unavailable Date/Time and the
Mechanically Available Date/Time, in hours.

Maintenance Start Date/Time - This is the date/time that maintenance actually started work on the
equipment.

Maintenance End Date/Time - This is the date/time that maintenance actually finished work on the
equipment.

Time to Repair - This is the total maintenance time to repair the equipment.

Maintenance Cost - This is the total maintenance expenditure to rectify the failure. This could be
company or contractor cost. This cost could be broken out into categories such as Material,
Labor, Contractor, etc.

Production Cost - This is the amount of business loss associated with not having the assets in
service. This cost includes lost opportunity, which occurs when an asset fails to perform its intended
function and there is no spare asset or capability to make up the loss.
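Because every event carries its Functional Location Hierarchy, additive metrics such as downtime or cost can be summed at any level of that hierarchy. A minimal sketch, assuming each event's hierarchy is stored as a tuple ordered from Level 1 (Site) down to Level n (System); the sample events, their downtime values and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: (hierarchy path from Site down to System, downtime in hours)
events = [
    (("Site A", "Area 1", "Unit 10", "Cooling Water"), 12.0),
    (("Site A", "Area 1", "Unit 10", "Feed"), 4.5),
    (("Site A", "Area 2", "Unit 20", "Feed"), 8.0),
    (("Site B", "Area 1", "Unit 30", "Feed"), 3.0),
]


def roll_up(events, level):
    """Sum a metric over all events sharing the same path down to `level`
    (1 = Site, 2 = Area, 3 = Unit, ...)."""
    totals = defaultdict(float)
    for path, value in events:
        totals[path[:level]] += value
    return dict(totals)


print(roll_up(events, 1))  # {('Site A',): 24.5, ('Site B',): 3.0}
print(roll_up(events, 2))  # {('Site A', 'Area 1'): 16.5, ('Site A', 'Area 2'): 8.0, ('Site B', 'Area 1'): 3.0}
```

Summing only works for additive quantities; ratios such as MTBF should be recomputed from the rolled-up operating time and failure counts rather than added together.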

Utilizing Effective Event Recording Codes

Having a work process to collect event information is only the first step in gathering accurate event
history. Without a standardized list of codes for event recording, the recorded data will be almost
impossible to use for analysis. Resources for event recording codes range from company-specific
codes to international industry standards such as ISO 14224. This standard was developed for the oil
and gas industry and was based on work done by the Offshore Reliability Data (OREDA) group.

This standard focuses on equipment as well as failure and maintenance data. It describes details related
to equipment classes, types and boundaries. With respect to event recording, this standard defines
codes, time stamps and remarks.

ISO 14224 covers a subset of equipment classes within the oil and gas industry, which are provided in the
table below.

Combustion Engine
Compressor
Control Logic Unit
Electric Generator
Electric Motor
Fire and Gas Detector
Gas Turbine
Heat Exchanger
Process Sensor
Pump
Turboexpander
Valve
Vessel

Within these classes of equipment, there are specific codes that can be utilized to record equipment
events:

Method of detection
Functional loss
Failure mode
Maintainable item
Failure cause
Maintenance activity

While these codes and equipment classes are an excellent start, there are additional equipment classes
and code categories that are useful in fully documenting an equipment event. Therefore, additional
equipment classes are offered below as supplements to the ISO 14224 standard.

Agitator
Boiler
Fan-Blower
Fired Heater
Gas Turbine
General Equipment
Instrumentation
Meter
NPV (Tank)
Piping
Power Distribution
Relief Device
Steam Turbine

Similarly, additional code categories are offered to supplement the code categories within ISO 14224:

Condition
Effect

Below are example Activity codes derived from ISO 14224:

Code Description
ADJ Adjust
CHK Check
CMB Combination
INS Inspection
MOD Modify
OTH Other
OVH Overhaul
REP Repair
RFT Refit
RPL Replace
SVC Service
TST Test
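A standardized code list is most useful when it is enforced at data entry. A minimal sketch using the activity codes from the table above; the `normalize_activity` function is hypothetical and accepts either a code or its description, rejecting anything else so that free text does not leak into the coded field.

```python
# Activity codes and descriptions, as listed in the table above
ACTIVITY_CODES = {
    "ADJ": "Adjust", "CHK": "Check", "CMB": "Combination", "INS": "Inspection",
    "MOD": "Modify", "OTH": "Other", "OVH": "Overhaul", "REP": "Repair",
    "RFT": "Refit", "RPL": "Replace", "SVC": "Service", "TST": "Test",
}
# Reverse lookup so users can enter either the code or its description
_BY_DESCRIPTION = {desc.upper(): code for code, desc in ACTIVITY_CODES.items()}


def normalize_activity(raw: str) -> str:
    """Return the standard activity code for `raw`, or raise ValueError."""
    value = raw.strip().upper()
    if value in ACTIVITY_CODES:
        return value
    if value in _BY_DESCRIPTION:
        return _BY_DESCRIPTION[value]
    raise ValueError(f"Unknown maintenance activity: {raw!r}")


print(normalize_activity(" rpl "))     # RPL
print(normalize_activity("Overhaul"))  # OVH
```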

Uses of Reliability Analysis for Maintenance Optimization

Reliability analysis conducted on the equipment failure data results in calculated values that are used to
characterize plant equipment reliability. These values are used in many ways to improve and optimize
asset performance:


Mean Time Between Failures (MTBF) - MTBF analysis gives users information about the typical
life of the machinery in a population, which can be compared with manufacturers' expected values,
other plants, or even benchmark values from other companies. Reliability growth analysis allows
users to model changes in MTBF over time. Growth modeling also allows the prediction of future
failures, so users can set intervals for failure prevention and intervention.
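The paragraph above does not name a specific growth model. As one widely used option, the sketch below computes the cumulative MTBF and fits a Crow-AMSAA (power-law) reliability growth model by maximum likelihood for an observation period that ends at a fixed time. The failure times are hypothetical cumulative operating hours. A shape parameter beta below 1 indicates that failures are spacing out (reliability growth), while beta above 1 indicates deterioration.

```python
import math


def cumulative_mtbf(failure_times, observed_time):
    """Total observed operating time divided by the number of failures."""
    return observed_time / len(failure_times)


def crow_amsaa(failure_times, observed_time):
    """Crow-AMSAA MLE for a time-terminated observation period.

    failure_times: cumulative operating times at each failure (0 < t <= observed_time)
    Returns (beta, lam, instantaneous MTBF at observed_time).
    """
    n = len(failure_times)
    beta = n / sum(math.log(observed_time / t) for t in failure_times)
    lam = n / observed_time ** beta
    # Instantaneous failure intensity is lam * beta * T**(beta - 1)
    inst_mtbf = 1.0 / (lam * beta * observed_time ** (beta - 1))
    return beta, lam, inst_mtbf


failures = [100, 300, 700, 1500, 3100]  # hypothetical operating hours at each failure
T = 4000                                # hypothetical end of the observation period
print(cumulative_mtbf(failures, T))     # 800.0
beta, lam, inst = crow_amsaa(failures, T)
print(f"beta={beta:.2f}, current MTBF={inst:.0f} h")
```

The instantaneous MTBF reflects the current failure rate, so it can be compared with the cumulative value to see whether reliability is improving.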

Weibull Analysis - Weibull parameters give clues to the failure type (wear-in, wear-out,
end-of-life, random failures) and also indicate mixed-mode populations, so that analysts can
isolate different causes of failure. By isolating the individual causes, individual solutions
can be implemented.
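One common way to estimate the Weibull parameters from a set of failure times is median-rank regression. A minimal sketch, assuming complete data with no suspensions (units still running) and using Bernard's approximation for the median ranks; dedicated reliability tools also handle censored data and may use maximum likelihood instead. The function names are hypothetical, and the 0.1 tolerance used to label a shape value as "random" is an arbitrary illustration.

```python
import math


def weibull_mrr(times):
    """Estimate Weibull shape (beta) and scale (eta) by median-rank regression
    on complete failure times (rank regression on Y)."""
    t = sorted(times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)                # Bernard's median-rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))  # linearized Weibull CDF
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                        # slope of ln(-ln(1-F)) = beta*ln(t) - beta*ln(eta)
    eta = math.exp(x_bar - y_bar / beta)    # recovered from the intercept -beta*ln(eta)
    return beta, eta


def failure_pattern(beta, tol=0.1):
    """Rough reading of the shape parameter (tolerance is illustrative)."""
    if beta < 1 - tol:
        return "infant mortality (wear-in)"
    if beta > 1 + tol:
        return "wear-out"
    return "random"


hours = [410, 780, 950, 1200, 1350, 1600, 1900, 2300]  # hypothetical times to failure
beta, eta = weibull_mrr(hours)
print(f"beta={beta:.2f}, eta={eta:.0f} h ->", failure_pattern(beta))
```

A single straight-line fit assumes one failure mode; a visibly bent Weibull plot is the usual hint of the mixed-mode populations mentioned above.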

Root Cause Analysis - Reliability analysis of individual failure modes gives evidence to support
the identification of root cause. Parts that have distinctive wear-out failure modes have a
different root cause than parts that exhibit infant mortality. As the root causes of failure are
identified using reliability methods and as problems are corrected, the value of MTBF increases
over time.

Identification of Vibration-Related Failures - Failures caused by excessive vibration are
identified through the use of lognormal distribution analysis. The lognormal distribution is a good
fit for stress-induced failures, where the fault mechanism worsens as the severity of vibration
increases.
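For a lognormal model, the maximum-likelihood estimates are the mean and the population standard deviation of the logarithms of the failure times. A minimal sketch with hypothetical data, using only the Python standard library (3.8 or later for `statistics.fmean` and `statistics.NormalDist`); the function names are hypothetical, and fitting the distribution does not by itself confirm that vibration is the failure mechanism.

```python
import math
import statistics


def lognormal_mle(times):
    """MLE of lognormal parameters: mean and standard deviation of ln(t).
    pstdev divides by n, which is the maximum-likelihood estimator."""
    logs = [math.log(t) for t in times]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_reliability(t, mu, sigma):
    """Probability of surviving past t: R(t) = 1 - Phi((ln t - mu) / sigma)."""
    return 1.0 - statistics.NormalDist().cdf((math.log(t) - mu) / sigma)


hours = [300, 520, 610, 900, 1150, 1700, 2600]  # hypothetical times to failure
mu, sigma = lognormal_mle(hours)
print(f"median life = {math.exp(mu):.0f} h, sigma = {sigma:.2f}")
print(f"R(500 h) = {lognormal_reliability(500, mu, sigma):.2f}")
```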

Identification of Machine Design Problems - Queries of failure modes by equipment type
lead to the identification of commonly failed components among a population of similar
equipment.
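Such a query is essentially a count of coded maintainable items grouped by equipment type. A minimal sketch over hypothetical coded events; the `top_failed_items` helper is hypothetical, and in practice the same query would typically run against the event database (for example as a SQL GROUP BY).

```python
from collections import Counter

# Hypothetical coded events: (equipment class, equipment type, maintainable item)
events = [
    ("Pump", "Centrifugal", "Mechanical Seal"),
    ("Pump", "Centrifugal", "Bearing"),
    ("Pump", "Centrifugal", "Mechanical Seal"),
    ("Pump", "Reciprocating", "Valve"),
    ("Pump", "Centrifugal", "Mechanical Seal"),
    ("Pump", "Centrifugal", "Bearing"),
]


def top_failed_items(events, equipment_type, k=3):
    """Most frequently failed maintainable items for one equipment type."""
    counts = Counter(item for _cls, etype, item in events if etype == equipment_type)
    return counts.most_common(k)


print(top_failed_items(events, "Centrifugal"))  # [('Mechanical Seal', 3), ('Bearing', 2)]
```

This only works when the maintainable item is recorded from a standardized code list; free-text entries such as "seal" and "mech seal" would be counted separately.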

Identification of System Design Problems - Sometimes the wrong piece of equipment is used
in the design of a plant system, and frequent failures of this equipment occur as a result. Failures of
similar systems can be subjected to the same analysis procedures that are conducted at the
asset level. Problem systems are identified by their low MTBF values compared with other
similar systems.
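A simple screen for problem systems compares each system's MTBF with that of its peers. A minimal sketch, assuming MTBF values have already been calculated for a group of similar systems; the `flag_low_mtbf` helper is hypothetical, and the threshold of 50% of the peer median is an arbitrary illustration, not an industry rule.

```python
import statistics


def flag_low_mtbf(mtbf_by_system, ratio=0.5):
    """Return systems whose MTBF is below `ratio` times the peer median."""
    median = statistics.median(mtbf_by_system.values())
    return sorted(name for name, m in mtbf_by_system.items() if m < ratio * median)


# Hypothetical MTBF values (hours) for four similar feed systems
peers = {"Feed A": 4000, "Feed B": 3600, "Feed C": 1200, "Feed D": 3900}
print(flag_low_mtbf(peers))  # ['Feed C']
```

Using the median rather than the mean keeps a single very poor system from dragging down the reference value it is compared against.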

Identification of Equipment Material Problems - In some cases, the reliability analysis points
to a deficiency in materials or in material selection. These problems often show up as an early
wear-out failure mode, which is easily identified with a Weibull analysis.

Identification of Construction Problems - Problems sometimes occur during start-up (after a
repair period, turnaround or outage) and are often related to the repair activities. These problems
occur as a result of inadequate or improper construction techniques and material failures. An
example of this type of problem is an improperly poured foundation that prevents proper operation
of a machine or system. These problems sometimes show up in the Weibull analysis as infant
mortality failures, with low values for MTBF.

Identification of Unsatisfactory Maintenance Procedures - Like construction problems,
inadequate or unsatisfactory maintenance procedures are identified and separated by comparing
similar components between systems maintained by different crews. The level of training,
adherence to standard procedure and attention to detail all play a role in the quality of repairs
provided by the maintenance crew.

Identification of Improper Operating Procedures - Wide temperature swings and inadequate
level control lead to reduced equipment life. Failures caused by inadequate operating
procedures manifest themselves as premature wear-out modes, easily identified through
Weibull analysis.

Inadequate Preventive Maintenance Activities - Maintenance-preventable failures are
identified by sorting work order backlogs and analyzing spare parts usage. While usage
of spare parts does not ensure their correct installation, inadequate PM activities show up in a
reliability analysis as uncharacteristically low MTBF values for equipment of this type, as
compared with manufacturers' or industry standards.

Inadequate Inspection Routines - Unexpected equipment failures cause serious environmental
and safety issues. Cracks, weld failures and seal failures cause components to fail unexpectedly,
resulting in ruptured pressure vessels, leakage and fugitive emissions. Understanding the
reliability behavior of equipment prone to these kinds of faults allows users to schedule
inspections at appropriate intervals.
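If the equipment's failure behavior has been fitted with a Weibull model, the operating time at which reliability falls to a chosen target follows directly from R(t) = exp(-(t/eta)^beta). A minimal sketch, assuming the Weibull parameters beta and eta are already known; the helper name and the 90% target are illustrative choices, and a real inspection program would also account for how quickly a detected flaw progresses to failure.

```python
import math


def interval_for_target_reliability(beta, eta, target_r):
    """Operating time at which Weibull reliability R(t) = exp(-(t/eta)**beta)
    drops to target_r (the equation solved for t)."""
    if not 0.0 < target_r < 1.0:
        raise ValueError("target_r must be strictly between 0 and 1")
    return eta * (-math.log(target_r)) ** (1.0 / beta)


# Hypothetical wear-out behavior: beta = 2.5, eta = 20,000 h; keep R >= 0.90
t = interval_for_target_reliability(2.5, 20000, 0.90)
print(f"inspect within {t:.0f} operating hours")
```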

PM Optimization - Weibull analysis is used to estimate the optimum time for preventive
maintenance procedures based upon a ratio of cost functions associated with planned repairs
and unplanned failures. Future failure probability can also be estimated from the reliability data.
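One standard formulation of this idea is the age-replacement model: replace an item preventively at age T for a planned cost Cp, or on failure for an unplanned cost Cu, and choose T to minimize the expected cost per operating hour, [Cp R(T) + Cu (1 - R(T))] divided by the integral of R(t) from 0 to T. The sketch below assumes a Weibull life distribution with known parameters and uses a simple trapezoidal integral and grid search; the function names and numbers are hypothetical, and it is an illustration rather than the specific method of any product. For a Weibull model, preventive replacement pays off only when beta > 1 (wear-out) and Cu exceeds Cp.

```python
import math


def weibull_r(t, beta, eta):
    """Weibull reliability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T, beta, eta, cp, cu, steps=1000):
    """Expected cost per operating hour when replacing at age T (cost cp)
    or at failure, whichever comes first (cost cu)."""
    h = T / steps
    # Trapezoidal rule for the expected cycle length, the integral of R from 0 to T
    inner = sum(weibull_r(i * h, beta, eta) for i in range(1, steps))
    cycle_length = h * (0.5 * (1.0 + weibull_r(T, beta, eta)) + inner)
    r_T = weibull_r(T, beta, eta)
    return (cp * r_T + cu * (1.0 - r_T)) / cycle_length


def optimal_pm_age(beta, eta, cp, cu, t_max=None, grid=400):
    """Grid search over (0, t_max] for the age T with the lowest cost rate."""
    t_max = t_max if t_max is not None else 3.0 * eta
    candidates = [t_max * k / grid for k in range(1, grid + 1)]
    return min(candidates, key=lambda T: cost_rate(T, beta, eta, cp, cu))


# Hypothetical: wear-out (beta = 3), eta = 1000 h, a failure costs 10x a planned repair
T_opt = optimal_pm_age(3.0, 1000.0, cp=100.0, cu=1000.0)
print(f"replace at about {T_opt:.0f} h")
```

With beta = 1 (random failures) the cost rate keeps falling as T grows, so the search simply returns the upper end of the grid, which is the model's way of saying that time-based replacement does not help.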

By combining design, construction, engineering, operation, maintenance and inspection data into a single
asset management system and applying statistical analysis tools to failure data, problems rooted in both
technical and procedural issues can be addressed. The reliability of individual plant components can
only be improved once current levels of reliability are identified and tracked. An Asset Performance
Management system, like Meridium, makes this task manageable.

Conclusion
A key element of a successful asset performance management process is the collection of the event data
required for analysis. Without event data, it is impossible to determine where your problems reside,
which strategies are effective or ineffective, and where you need to focus your resources for the largest
improvements. Beyond measuring performance, event data provides the baseline for detailed reliability
analysis. Reliability analysis techniques are very powerful when coupled with accurate and complete
event data and can drive proactive behavior within the organization. The combination of quality event
data, comprehensive analysis and disciplined follow-through can be the catalyst for meeting your
corporation's strategic goals.

Corporate Offices
10 South Jefferson Street
Roanoke, Virginia 24011
540.344.9205
540.345.7083 fax

Regional Offices
11200 Richmond Avenue
Suite 670
Houston, Texas 77082
281.920.9616
281.920.9190 fax

www.meridium.com
info@meridium.com
