
Predictive Maintenance by using R Statistical Language for Predictive Analytics

Posted on 26 August 2013.


Introduction
Predictive Analytics consists of data processing techniques that focus on predicting
future outcomes based on analysis of previously collected data.
Organizations are increasingly adopting predictive analytics, and applying it more
broadly. Many are now using dozens or even thousands of predictive analytic models. These models are
increasingly used in real-time decision making and in operational, production systems. Models are used to
improve customer treatment by selecting the next best action to develop a customer, to make loan or credit
pricing decisions that reflect the future risk of a transaction, to predict the likelihood of equipment failure to drive
proactive maintenance decisions, or to detect potentially fraudulent transactions so they can be routed out of the
system before they hit the bottom line. Examples like these deliver high ROI from analytics.
Predictive maintenance and quality
Predictive maintenance is just what its name implies: maintaining components or assets, large or small, according
to fact-based expectations for when they will fail or require service.
These facts can include:
Real-time device status: How is the part performing right now?
Historical device data: How has the part performed in the past?
Data for similar devices: How have other, similar parts performed?
Maintenance records: When was the part last serviced or replaced?
Maintenance schedules: What does the manufacturer recommend?
Of course, all of this Big Data is meaningless without analysis. There are hidden patterns lurking
within these facts and figures. Decoding these patterns is what powers predictive maintenance
and separates it from more traditional, reactionary approaches to equipment repair and
replacement.
Traditional approaches to maintenance
Predictive maintenance differs considerably from the traditional approaches to determining when to service or
replace equipment. For years, companies have kept their production lines running through a combination of these
maintenance methods:
Reactive: Service or replace equipment after it fails
Preventive: Service or replace equipment according to the manufacturer's suggested schedule, or the amount
of time it has been in service, or based on operational observations
Condition-Based: Service or replace equipment based on monitoring performed to assess its current
condition
The problem with these old-school approaches is their high cost. Waiting until a component fails means lost
production time and revenue. In-person inspections are expensive and can lead to replacing parts unnecessarily,
based only on the inspector's best guess. Following the manufacturer's recommended maintenance schedule saves
on inspection costs but often results in replacing parts that are still functioning well and could continue to do so.
Tmilinovic's Blog
One solution to decrease operational cost and to increase manufacturing system availability (Figure 1) is to
manage all maintenance activities continuously and to keep degradation under control, moving toward predictive maintenance.
Figure 1 : Decreasing of failure rate through predictive maintenance
Device events are supplied to the solution either in real time or as a batch and are transformed into the format
required by the solution. The information in the events is recorded in the analytic data store along with aggregated
key performance indicators (KPIs) and profiles. The KPIs are accumulated over time and show trends. The
profiles indicate the current state of the device and can include statistical calculations of variation. For example,
events containing the temperature and operating load of a transformer can be aggregated as a KPI of the
average temperature and load per day. The operating load can also be aggregated as a profile to record the most
recent load and the variability of the load over time.
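The aggregation described above can be sketched in a few lines of base R. This is only an illustration: the `events` data frame, its column names, and its values are hypothetical and not taken from any particular solution.

```r
# Hypothetical device events; column names and values are illustrative.
events <- data.frame(
  day         = as.Date(c("2013-08-01", "2013-08-01", "2013-08-02", "2013-08-02")),
  temperature = c(61, 65, 70, 74),
  load        = c(0.50, 0.60, 0.80, 0.90)
)

# KPI: average temperature and average load per day.
kpi <- aggregate(cbind(temperature, load) ~ day, data = events, FUN = mean)

# Profile: the most recent load and the variability of the load over time.
profile <- list(
  last_load = tail(events$load, 1),
  load_sd   = sd(events$load)
)

print(kpi)
print(profile)
```

The KPI table has one row per day, while the profile holds a single current-state summary per device.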
The information in the analytic data store is used to perform predictive scoring, a process that uses a
mathematical model to put a numerical value on the likelihood that a device or component failure will occur. We
then use a predefined set of rules to determine the appropriate actions to take in response to various scores. For
example, if a score indicates that the probability of a transformer failure is less than 0.7 (70%), the rules may call
for no immediate action. If the score rises above 0.8 (80%), the rules may trigger a request to have a physical
inspection performed.
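A rule set of this kind can be sketched as a small R function. The function name `action_for_score`, the default cut-offs of 0.7 and 0.8, and the intermediate "monitor" band are all assumptions made for illustration; a real rule set would come from the maintenance team.

```r
# Map a failure probability to an action.
# The two cut-offs are parameters; the defaults and the "monitor" band
# between them are illustrative assumptions.
action_for_score <- function(p_fail, no_action_below = 0.7, inspect_above = 0.8) {
  stopifnot(is.numeric(p_fail), all(p_fail >= 0 & p_fail <= 1))
  ifelse(p_fail < no_action_below, "no immediate action",
         ifelse(p_fail > inspect_above, "request physical inspection", "monitor"))
}

action_for_score(c(0.35, 0.75, 0.92))
# "no immediate action" "monitor" "request physical inspection"
```

Keeping the thresholds as arguments makes it easy to tune the rules without touching the scoring model.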
Scoring is a key part of predictive maintenance and involves the use of predictive models that use historical data
to determine the probability of certain future outcomes. For example, a model could be created based on historical
data regarding transformer temperature, current load, and occurrences of failure.
R Statistical language for Predictive Analytics
Captured data is continuously scored using predictive analytics software. Predictive analytic models mine the
data and correlate past failures using multivariate analysis. The models can mine all the variables and conditions
that contributed to past failures in order to predict future failures. Incoming data are then run through the model
and asset health scores are generated on a real-time basis.
The processing cycle typically involves two phases of processing:
1. Training phase: Learn a model from training data
2. Predicting phase: Deploy the model to production and use it to predict the unknown or future outcome
R is an open source language/environment that is licensed under the GNU GPL 2.
Predictive Analytics is tightly integrated with the algorithms and statistical libraries available in R. Oracle has its
own version of R, called Oracle R Enterprise, for better customization of analytics using Oracle databases. SAS
Institute added connectivity to R from its SAS/IML and JMP products some time back. IBM's acquired analytics
software SPSS was one of the first software packages to work with R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series
analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
Using R for predictive analytics is a low-cost and flexible solution, but does require a basic knowledge of statistics
and mathematics. R is a very powerful language for a number of reasons. Chief among them is vector
processing: the ability to perform operations on entire arrays of numbers without explicitly writing iteration
loops. Another important feature is that statisticians, engineers and scientists can improve the software's code or
write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs
and mining techniques to dig deeper into databases.
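As a brief illustration of vector processing, the sketch below converts a set of temperature readings with a single expression and compares it with the equivalent explicit loop. The readings are made-up values.

```r
# Hypothetical transformer temperatures in degrees Celsius.
temp_c <- c(61.0, 65.5, 70.2, 74.8)

# Vectorized: one expression operates on every element at once.
temp_f <- temp_c * 9 / 5 + 32

# Equivalent explicit loop, shown only for comparison.
temp_f_loop <- numeric(length(temp_c))
for (i in seq_along(temp_c)) {
  temp_f_loop[i] <- temp_c[i] * 9 / 5 + 32
}

# Comparisons are vectorized too: count the readings above 70 degrees.
n_hot <- sum(temp_c > 70)
```

Both approaches give the same result; the vectorized form is shorter and is usually faster in R.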
Choose the right model
An important phase to consider is the actual analysis phase. When choosing a certain type of modeling, it is key to
consider whether the main task is to provide a result that is as significant as possible or one that also needs to be
presented to business users or engineers. In many cases, decision trees have proven to be a very good approach for
classification analysis cases. In particular, the option to build the trees manually and, in so doing, being able to
include domain know-how, is very powerful and well received by many customers. Also make sure that you create
a hold-out sample of your data to test the models on their stability and predictive power on unknown data.
Otherwise you might end up creating a result that works for the current data but will not work on other data in the
future. This phenomenon is called overfitting.
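A hold-out check can be sketched as follows, assuming simulated asset data and a deliberately deep classification tree from the rpart package. The variable names, the coefficients used to simulate failures, and the 70/30 split are illustrative assumptions.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)  # make the simulated data and the split reproducible

# Simulated asset data; names and coefficients are illustrative.
n <- 500
assets <- data.frame(temperature = rnorm(n, 70, 10), load = runif(n))
p_fail <- plogis(-8 + 0.1 * assets$temperature + 2 * assets$load)
assets$status <- factor(ifelse(rbinom(n, 1, p_fail) == 1, "fail", "ok"))

# Hold out 30% of the rows; the model never sees them during training.
test_idx <- sample(n, size = round(0.3 * n))
train <- assets[-test_idx, ]
test  <- assets[test_idx, ]

# A deliberately deep tree (no complexity penalty) to provoke overfitting.
fit <- rpart(status ~ temperature + load, data = train, method = "class",
             control = rpart.control(cp = 0, minsplit = 2))

# Share of correctly classified rows in a data set.
accuracy <- function(model, d) {
  mean(predict(model, newdata = d, type = "class") == d$status)
}
acc_train <- accuracy(fit, train)
acc_test  <- accuracy(fit, test)
```

A training accuracy that is clearly higher than the hold-out accuracy is the typical sign of overfitting; pruning the tree or raising `cp` would be the usual remedy.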
Four predictive models are used:
Health Score (binary logit model)
Lifespan Analysis (Cox Regression model)
Random Forests CART (Classification and Regression Trees)
Time series models
Health Score (binary logit model)
The Health Score model is based on the logistic regression model and measures the likelihood that an asset or process will fail. The
model uses historical defect data, operational information, and environmental data to determine the current operational health of an
asset, and continuously monitors the asset to predict potential faults in real time. The resulting health score value, typically referred
to simply as the Health Score, can also be used to predict the future health of the asset.
The Health Score is presented as a number between 0 and 1. The higher the number, the healthier the asset. The
overall Health Score for an entire manufacturing site is the average of the scores of the individual assets at that
site. If the input data model structure is modified, the health score model must be retrained on the new data.
A well-established statistical method for predicting binomial outcomes is required to predict the health score
value, and the solution uses a binomial logistic algorithm for this purpose. In binomial or binary logistic
regression, the outcome can have only two possible types of values (e.g. "Yes" or "No", "Success" or "Failure").
Multinomial logistic refers to cases where the outcome can have three or more possible types of values (e.g.,
"good" vs. "very good" vs. "best"). Generally the outcome is coded as "0" and "1" in binary logistic regression. This
kind of algorithm is limited to models where the target field is a flag or binary field. The algorithm provides
enhanced statistical output when compared to a multinomial algorithm and is less susceptible to problems when
the number of table cells (unique combinations of predictor values) is large relative to the number of records.
R provides comprehensive support for multiple linear regression and for generalized linear models.
To fit a logistic regression model, the glm() function is used in R. It is similar to lm(), the "linear model" function, but
glm() includes additional parameters. The format is
glm(Y~X1+X2+X3, family=binomial(link="logit"), data=mydata)
Here, Y is the dependent variable and X1, X2 and X3 are the independent variables.
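The call above can be tried on simulated data. In the sketch below the contents of `mydata` and the coefficients used to generate `Y` are assumptions made purely for illustration; only `glm()`, `coef()`, and `predict()` are standard R functions.

```r
set.seed(1)

# Simulated data standing in for mydata; the true coefficients are made up.
n <- 300
mydata <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
mydata$Y <- rbinom(n, 1, plogis(-0.5 + 1.2 * mydata$X1 - 0.8 * mydata$X2))

# Fit the binary logit model.
fit <- glm(Y ~ X1 + X2 + X3, family = binomial(link = "logit"), data = mydata)

# Estimated coefficients on the log-odds scale (intercept, X1, X2, X3).
print(coef(fit))

# Predicted probability that Y = 1 for a new, hypothetical observation.
new_case <- data.frame(X1 = 0.5, X2 = -1, X3 = 0)
p_hat <- predict(fit, newdata = new_case, type = "response")
```

With `type = "response"` the prediction is returned as a probability between 0 and 1 rather than as log-odds.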
Lifespan Analysis (Cox Regression model)
The Lifespan Analysis model estimates a device's remaining lifespan when functioning in a real-world scenario. Depending on the
device, lifespan can be measured in hours, miles, stress cycles, or any other metric. Data on the functional condition of the device is
collected from laboratory experiments.
The Lifespan Analysis model analyzes time-to-failure event data. Lifespan analysis is an offline, back-end process
and can be performed at regular intervals or on demand.
The model is based on the Cox regression model. In many cases where the time to a certain event (such as a
failure) is to be predicted, the Cox Regression technique is particularly well-suited. This technique was originally
developed to predict the life expectancy of cancer patients but it can be used just as well for technical analysis. Cox
Regression can also take potential influence factors into account and fine-tune its failure estimates accordingly.
The shape of the survival function and the regression coefficients for the predictors are estimated from observed
subjects; the model can then be applied to new cases that have measurements for the predictor variables.
For Cox Regression analysis we can use the R package named survival
(http://cran.r-project.org/web/packages/survival/).
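A minimal sketch of a Cox regression with the survival package is shown below. The device data are simulated: the `load` covariate, the assumed effect of load on the hazard, and the 150-hour observation window are all illustrative assumptions.

```r
library(survival)

set.seed(7)

# Simulated devices: a higher operating load is assumed to shorten life.
n <- 200
devices <- data.frame(load = runif(n, 0.2, 1.0))
time_to_failure <- rexp(n, rate = 0.01 * exp(1.5 * devices$load))

# Observation stops after 150 hours; devices still running are censored.
censor_time <- 150
devices$hours  <- pmin(time_to_failure, censor_time)
devices$failed <- as.integer(time_to_failure <= censor_time)

# Fit the Cox proportional hazards model.
fit <- coxph(Surv(hours, failed) ~ load, data = devices)

# Hazard ratio per unit of load (values above 1 mean a higher failure risk).
hazard_ratio <- exp(coef(fit))

# Estimated survival curves for two hypothetical new devices.
curves <- survfit(fit, newdata = data.frame(load = c(0.3, 0.9)))
```

The fitted curves can be plotted with `plot(curves)` to compare how quickly lightly and heavily loaded devices are expected to fail.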
Random Forests CART (Classification and Regression Trees)
CART models offer an intuitive overview of a multivariate data set and are suitable for dealing with complex processes and nonlinear
relationships. They are also able to recognize the parameters that are most important to a given regression problem.
However, they suffer from high prediction variance. Therefore, for prediction purposes we use a method that
utilizes an ensemble of CART models called Random Forests. The aggregation of a large number of different
single models usually offers improved prediction accuracy. Aggregating the results of single tree models reduces
variance and produces more stable models. Furthermore, adding more trees does not cause the method to overfit,
owing to the law of large numbers. The regression tree model is constructed by using binary recursive partitioning
routines as implemented in the R package rpart (http://cran.r-project.org/web/packages/rpart/index.html)
and plotted using routines from the R package partykit
(http://cran.r-project.org/web/packages/partykit/index.html).
The methodology allows a transition from time-based to condition-based maintenance, reduces problem
complexity, and offers high predictive performance. As the Random Forest approach is free of parametric or
distributional assumptions, the method can be applied to a wide range of predictive maintenance problems. This
leads to a reduction of tool downtime, maintenance and manpower costs and improves competitiveness in the
semiconductor industry.
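The variance-reduction idea can be illustrated with plain bagging of rpart regression trees. Note that this is a simpler relative of Random Forests (it omits the random selection of predictors at each split), and the simulated data and the choice of 50 trees are assumptions for illustration only. In practice a dedicated package such as randomForest would normally be used.

```r
library(rpart)

set.seed(123)

# Simulated nonlinear regression problem; names and values are illustrative.
n <- 300
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- sin(2 * pi * d$x1) + 0.5 * d$x2 + rnorm(n, sd = 0.3)
train <- d[1:200, ]
test  <- d[201:300, ]

# A single regression tree built by binary recursive partitioning.
single <- rpart(y ~ x1 + x2, data = train, method = "anova")
pred_single <- predict(single, newdata = test)

# Bagging: fit one tree per bootstrap resample, then average the predictions.
n_trees <- 50
preds <- sapply(seq_len(n_trees), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  predict(rpart(y ~ x1 + x2, data = boot, method = "anova"), newdata = test)
})
pred_bagged <- rowMeans(preds)

# Root mean squared error on the hold-out rows.
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
rmse_single <- rmse(pred_single, test$y)
rmse_bagged <- rmse(pred_bagged, test$y)
```

Comparing `rmse_single` with `rmse_bagged` shows how much the averaged ensemble stabilizes the predictions on this data set.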
Time series models
Time series models are used for predicting or forecasting the future behavior of variables. These models account for the fact that data
points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted
for. As a result, standard regression techniques cannot be applied directly to time series data, and methodology has been developed to
decompose the trend, seasonal and cyclical components of the series. Modeling the dynamic path of a variable can improve forecasts
since the predictable component of the series can be projected into the future.
Time series models estimate difference equations containing stochastic components. Two commonly used forms
of these models are autoregressive (AR) models and moving average (MA) models. The Box-Jenkins methodology
(1976), developed by George Box and G.M. Jenkins, combines the AR and MA models to produce the ARMA
(autoregressive moving average) model, which is the cornerstone of stationary time series analysis. ARIMA
(autoregressive integrated moving average) models, on the other hand, are used to describe non-stationary time
series.
In recent years time series models have become more sophisticated and attempt to model conditional
heteroskedasticity with models such as ARCH (autoregressive conditional heteroskedasticity) and GARCH
(generalized autoregressive conditional heteroskedasticity), which are frequently used for financial time series. In
addition, time series models are also used to understand interrelationships among economic variables
represented by systems of equations using VAR (vector autoregression) and structural VAR models.
We are using the R forecast package (http://cran.r-project.org/web/packages/forecast/).
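As a minimal sketch of the AR modeling step, the example below uses `arima()` and `predict()` from base R's stats package as a stand-in, so that it runs without extra packages; the forecast package offers a higher-level interface on top of the same kind of model. The simulated series, its AR coefficient of 0.7, and the six-step horizon are illustrative assumptions.

```r
set.seed(2013)

# Simulated sensor readings following a stationary AR(1) process around 50.
y <- 50 + arima.sim(model = list(ar = 0.7), n = 120)

# Fit an AR(1) model, i.e. ARIMA(1, 0, 0) with a mean term.
fit <- arima(y, order = c(1, 0, 0))

# Forecast the next six points with approximate 95% prediction intervals.
fc    <- predict(fit, n.ahead = 6)
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se

print(coef(fit))  # estimated AR coefficient and mean ("intercept")
```

The prediction intervals widen with the horizon, reflecting the growing uncertainty of multi-step forecasts.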