
Expert Systems with Applications 40 (2013) 1925–1934

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Oil and gas pipeline failure prediction system using long range ultrasonic
transducers and Euclidean-Support Vector Machines classification approach
Lam Hong Lee a,*, Rajprasad Rajkumar a,1, Lai Hung Lo b,2, Chin Heng Wan b,2, Dino Isa a,3
a Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia
b Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, 31900 Kampar, Perak, Malaysia

a r t i c l e   i n f o

Keywords:
Oil and gas pipeline defects
Long range ultrasonic transducer
Support Vector Machines
Euclidean distance function
Kernel function
Soft margin parameter

a b s t r a c t
This paper presents an intelligent failure prediction system for oil and gas pipelines using long range ultrasonic transducers and a Euclidean-Support Vector Machines classification approach. Over the past decade, incidents of oil and gas pipeline leaks and failures around the world have become more frequent and have caused loss of life, property and irreversible environmental damage. This situation is due to the lack of a foolproof method of inspecting the condition of oil and gas pipelines. The onset of corrosion and other defects goes undetected, causing unplanned shutdowns and disruption of energy supplies to consumers. Existing pipeline failure prediction systems which use non-destructive testing (NDT) methods are accurate, but they are deployed at pre-determined intervals which can be several months apart. Hence, a foolproof and reliable inspection method is required to continuously monitor the condition of oil and gas pipelines, in order to provide sufficient information and time for oil and gas operators to plan and organize shutdowns before failures occur. Permanently installed long range ultrasonic transducers (LRUTs) offer a solution to this problem by providing an inspection platform that continuously monitors critical pipeline sections. Data are acquired in real-time and processed to make decisions based on the condition of the pipe. The continuous nature of the data requires automatic decision-making software rather than manual inspection by operators. The Support Vector Machines (SVM) classification approach has been increasingly used in a multitude of domains, including LRUT, and has shown better performance than other classification algorithms. SVM is, however, heavily dependent on the choice of kernel function as well as fine tuning of the kernel and soft margin parameters. Hence it is unsuitable for continuous monitoring of pipeline data, where constant modification of kernels and parameters is unrealistic. This paper proposes a novel classification technique, namely Euclidean-Support Vector Machines (Euclidean-SVM), to make a decision on the integrity of the pipeline in a continuous monitoring environment. The results show that the classification accuracy of the Euclidean-SVM approach is not dependent on the choice of the kernel function and parameters when classifying data from pipes with simulated defects. Irrespective of the kernel function and parameters chosen, the classification accuracy of the Euclidean-SVM is comparable to, and in some cases higher than, that of conventional SVM. Hence, the Euclidean-SVM approach is ideally suited for classifying data from oil and gas pipelines which are continuously monitored using LRUT.
© 2012 Elsevier Ltd. All rights reserved.

1. Introduction
This paper presents a novel oil and gas pipeline failure prediction system utilizing a non-destructive testing (NDT) method
based on long range ultrasonic transducers (LRUTs), in conjunction

* Corresponding author. Fax: +603 89248017.
E-mail addresses: leelamhong@gmail.com (L.H. Lee), Rajprasad.Rajkumar@
nottingham.edu.my (R. Rajkumar), laihung@hotmail.my (L.H. Lo), wanchinheng@
yahoo.com (C.H. Wan), Dino.Isa@nottingham.edu.my (D. Isa).
1 Tel.: +603 89248377; fax: +603 89248017.
2 Tel.: +605 4688888; fax: +605 4661672.
3 Tel.: +603 89248116; fax: +603 89248017.
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2012.10.006

with an advanced signal processing technique and a new classification framework, the Euclidean-Support Vector Machines (Euclidean-SVM) approach. This system provides continuous monitoring of pipelines using an NDT method, and also makes decisions free of human error and misinterpretation using an artificial intelligence classification approach. In recent years, oil and gas pipeline condition monitoring and failure prediction systems have become of great importance, due to the incidents of oil and gas pipeline leaks and failures which have happened around the world. These incidents are becoming more frequent and have caused loss of life, property and irreversible environmental damage. The major cause of these incidents is the lack of a foolproof method of inspecting the condition of oil and gas pipelines. Corrosion has been reported as


one of the major problems in oil and gas pipelines, resulting in catastrophic pollution and wastage of raw materials (Lozev, Smith, & Grimmett, 2003). These undetected pipeline defects cause unplanned shutdowns and disruptions of energy supply to consumers. Hence, frequent leaks of gas and oil from ruptured pipes call for better and more efficient methods to monitor the condition and predict the failures of oil and gas pipelines.
For several decades, techniques such as pigging (Lebsack, 2002) have been used for pipeline inspection at predetermined intervals. The pigging technique uses devices called smart pigs, which travel within the pipeline to record critical information such as corrosion levels, cracks, and structural defects using numerous types of sensors. Smart pigs are able to provide pinpoint information on the location of defects using techniques such as magnetic flux leakage and ultrasonic detection (Bickerstaff, Vaughn, Stoker, Hassard, & Garrett, 2002). However, the implementation of a pigging system for pipeline inspection can be very costly, and the pipeline condition is measured only at the instant the pig is deployed, without continuous measurements over time. Recently, other NDT techniques have also been introduced to monitor the condition of pipelines in order to reduce the cost of using pigging systems for pipeline inspection. However, these NDT methods have also been implemented at predetermined intervals, where operators need to be physically present to perform measurements and data collection and to make judgments on the integrity of the pipeline. These processes may take up to several months to generate a result regarding the condition of the pipeline. During this period, the condition of the pipeline can go unmonitored, and failures and leaks may occur, as the defects which lead to these failures may appear suddenly.
In order to overcome the problems mentioned above, a foolproof and reliable inspection method is required to continuously monitor the condition of oil and gas pipelines, in order to provide sufficient information in real time for oil and gas operators to plan and organize shutdowns of the pipeline before failures occur. A permanently installed NDT system is needed for real-time pipeline condition monitoring and failure prediction, providing an inspection platform that continuously monitors critical pipeline sections such as insulated pipes, risers, pipes on hill slopes, pipe bends, pipes under road crossings and offshore pipes. This system would ensure that pipes are continuously monitored and hence prevent the occurrence of leaks and failures. LRUT, which utilizes guided waves to inspect long distances from a single location (Demma, Cawley, Lowe, Roosenbrand, & Pavlakovic, 2004), was specifically designed for the inspection of Corrosion Under Insulation (CUI). Compared to other NDT techniques, LRUT is reported to be more efficient and cost-saving, since it is also able to detect both internal and external corrosion. This gives LRUT many advantages over the other NDT techniques which have seen widespread use in many other applications. With recent developments in oil and gas pipelines based on permanent mounting systems using special compounds, a real-time continuous monitoring system is destined to be the future trend of NDT systems. Data from a permanently installed LRUT system will be continuous and hence impractical to analyze by human operators. In our proposed system, data are acquired in real-time and processed to make decisions based on the condition of the pipe. The continuous nature of the data requires automatic decision-making software rather than manual inspection by human operators. Hence, automatic, intelligence-based software must be deployed in the system in order to process the continuous streams of data and make decisions on the integrity of the monitored pipeline.
The Support Vector Machines (SVM) approach has been increasingly used in a multitude of domains, including LRUT, and has shown better performance than other classification algorithms (Diederich, Kindermann, Leopold, & Paass, 2003; Isa, Lee, Kallimani, & Rajkumar, 2008; Isa & Rajkumar, 2009; Joachims, 1998; Joachims, 1999; Joachims, 2002; Lee, Rajkumar, & Isa, 2012; Lee, Wan, Rajkumar, & Isa, 2012; Wan, Lee, Rajkumar, & Isa, 2012). It can be used as a discriminative classifier and has been shown to be more accurate than most other classification models (Chakrabarti, Roy, & Soundalgekar, 2003; Isa et al., 2008; Yang & Liu, 1999). The good generalization characteristic of SVM is due to the implementation of the Structural Risk Minimization (SRM) principle, which entails finding an optimal separating hyper-plane, thus yielding a highly accurate classifier in most applications. Previous work has shown that SVM provides excellent generalization performance for the LRUT pipeline failure prediction system, and the combination of the discrete wavelet transform and SVM led to high accuracies in predicting failures in pipelines (Isa & Rajkumar, 2009). However, these results were generated without the continuous scenario in mind. The good classification performance of SVM is only guaranteed when the classification model is implemented with an appropriate combination of kernel function and parameters. One of the critical problems of the SVM classification approach is the selection of an appropriate combination of kernel function and parameters in order to obtain high classification accuracy. There is no generally optimal combination of kernel function and parameters that can guarantee maximal classification performance for all types of data. Hence, for an online and continuous scenario, the conventional SVM is unsuitable for continuous acquisition and processing of pipeline data, where frequent tuning of kernel and parameter values is unrealistic and impractical.
In recent years, many research works have been carried out with the same goal: seeking solutions to the problem of obtaining an optimal combination of kernel function and parameters for SVM. Typically, convoluted computations such as grid search (Hsu & Lin, 2002; Staelin, 2003) and evolutionary algorithms (Avci, 2009; Briggs & Oates, 2005; Diosan, Rogozan, & Pecuchet, 2012; Dong, Xia, & Tu, 2007; Friedrichs & Igel, 2004; Quang, Zhang, & Li, 2002; Zhang, Shan, Duan, & Zhang, 2009) have been proposed for optimizing the combination of kernel function and parameters of SVM models. This is done by conducting an iterative cross-validation process to predict the best performing combination of kernel function and parameters for the trained SVM classifier, using a validation set. This method leads to a computationally intensive and time-consuming training process, and hence degrades the efficiency of the classifier. To date, there is no ultimate solution offering an all-round, optimal combination of kernel function and parameters which suits most SVM classification tasks. In our previous work (Isa & Rajkumar, 2009), we found that one of the weaknesses of the pipeline failure prediction system using the conventional SVM approach is the necessity of identifying the optimal combination of kernel function and parameters in order to obtain high accuracy in predicting the presence of defects in oil and gas pipelines. Furthermore, in cases where the training samples are limited, such as the pipeline failure prediction system proposed in this paper, there is a critical real-world problem in preparing sufficient training and validation sets to train the classifier and to conduct the kernel function and parameter optimization process. As the usage of the LRUT technique for pipeline failure prediction is still at an early stage of investigation, research and development, obtaining well-organized and analyzed sample data to construct sufficient training and validation sets for the SVM model is a great obstacle.
In this paper, we propose the Euclidean-SVM classification framework to be used in conjunction with the NDT method based on the LRUT technique to perform continuous condition monitoring and failure prediction for oil and gas pipelines. The Euclidean-SVM approach replaces the optimal separating hyper-plane of the conventional SVM with a Euclidean distance measurement as the classification decision-making function. The Euclidean-SVM approach uses SVM in the training phase to identify the set of support vectors (SVs) for each category, and uses the Euclidean distance formula in the classification phase to compute the average distances between the testing data point and each of the sets of SVs from the different categories. The classification decision is made based on the category which has the lowest average distance between its set of SVs and the new data point, irrespective of the efficacy of the hyper-plane formed by applying a particular kernel function and parameters. The conventional SVM classification model requires the implementation of the appropriate combination of kernel function and parameters to make correct decisions. In our proposed Euclidean-SVM classification approach, the impact of the kernel function and parameters on the accuracy of the classifier is minimized. As a result, the Euclidean-SVM approach contributes a kernel-function- and parameter-independent classification framework for the real-time pipeline condition monitoring and failure prediction system, obviates the need to prepare a validation dataset for the kernel function and parameter optimization process, and reduces the convoluted computations in the training phase.
2. Euclidean-Support Vector Machines classification approach
Support Vector Machines (SVM) is increasingly being used for classification problems due to its promising empirical performance and excellent generalization ability. The good generalization characteristic of SVM is due to the implementation of the Structural Risk Minimization (SRM) principle, which entails finding an optimal separating hyper-plane, thus yielding a highly accurate classifier in most applications. Eq. (1) represents the equation of a hyper-plane which can be used to partition data points in SVM.

w · x + b = 0        (1)
Fig. 1 illustrates a linearly separable case, where the data points of one category (represented by 'o') and the data points of another category are separated by the linear optimal separating hyper-plane (the solid straight line).
There are actually an infinite number of hyper-planes that are able to partition the data points into two categories (as illustrated by the dashed lines in Fig. 1). According to the SVM methodology, there is just one optimal separating hyper-plane. This optimal separating hyper-plane lies half-way within the maximal margin, where the margin is defined as the sum of the distances of
Fig. 1. Optimal separating hyper-plane.

the hyper-plane to the support vectors. In the case illustrated in Fig. 1, the margin is d1 + d2.
The optimal separating hyper-plane is determined only by the closest data points of each category. These points are called Support Vectors (SVs). As only the SVs determine the optimal separating hyper-plane, there is a certain way to represent them for a given set of training points. It has been shown in Haykin (1999) that the maximal margin can be found by minimizing (1/2)||w||², as shown in Eq. (2).

min { (1/2) ||w||² }        (2)

Therefore, the optimal separating hyper-plane can be configured by minimizing Eq. (2) under the constraint of Eq. (3), that the training data points are correctly separated.

y_i (w · x_i + b) ≥ 1, ∀i        (3)

A more detailed discussion of SVM has been presented in our previous works (Isa & Rajkumar, 2009; Isa et al., 2008; Lee, Rajkumar et al., 2012; Lee, Wan et al., 2012).
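As an illustrative aside (not part of the original paper), Eqs. (1)–(3) can be exercised on a toy problem. The sketch below uses scikit-learn's SVC, which is our choice of implementation; a large C is used only to approximate the hard-margin constraint of Eqs. (2)–(3).

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy categories
X = np.array([[0., 0.], [0., 1.], [1., 0.], [3., 3.], [3., 4.], [4., 3.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e5)  # large C approximates the hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
# Eq. (1): the hyper-plane is w.x + b = 0; points are classified by sign(w.x + b)
predictions = np.sign(X @ w + b)
# Only the closest points of each category are retained as support vectors
print(clf.support_vectors_)
```

On this separable data the sign of w·x + b recovers every training label, and only a handful of the six points survive as support vectors.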
We propose and implement a new classification framework for the pipeline failure prediction system by introducing the Euclidean distance to replace the optimal separating hyper-plane of SVM as the classification decision-making function, in order to avoid the kernel function and parameter optimization process and to reduce the convoluted computations in the training phase. In the Euclidean-SVM classification approach, the SVM training algorithm is utilized to reduce the training data points by identifying and retaining only the SVs and eliminating the rest of the training data points. In the classification phase, the Euclidean distance function is used to make the classification decision based on the average distance between the testing data point and each group of SVs from the different categories. The use of the optimal separating hyper-plane as the decision surface is discarded, as the construction of the optimal separating hyper-plane is highly dependent on the kernel function and parameters. In fact, the construction of the linear separating hyper-plane in the high-dimensional feature space is based on the implementation of a kernel function and parameters, where the kernel function is incorporated to map the data points into a high-dimensional feature space so that the data points (specifically the SVs) become separable by a linear separating hyper-plane. The kernel functions and parameters therefore have a high impact on the construction of the separating hyper-plane, and hence affect the classification accuracy of the SVM classifier.
During the training phase of our proposed Euclidean-SVM classification framework, the conventional SVM training algorithm is used to map all the training data points into the vector space and to identify the set of SVs for each of the categories. The construction of the optimal separating hyper-plane is still a necessity in order to identify the SVs, since the optimal separating hyper-plane lies half-way within the maximal margin, where the margin is defined as the sum of the distances of the hyper-plane to the SVs. Fig. 2 illustrates the construction of the optimal separating hyper-plane in the vector space which separates the training data points of two different categories, after implementing the conventional SVM training algorithm.
As illustrated in Fig. 2, there are two categories of training data points, represented by spheres and squares respectively. The optimal separating hyper-plane is constructed by maximizing the margin d1 + d2. However, this optimal separating hyper-plane is discarded in the classification phase, as it does not act as the classification decision-making surface. Our proposal in this paper is to replace the optimal separating hyper-plane by introducing the Euclidean distance function to make the decision for the classification task. After the SVs for each of the categories have been identified, they remain in the original vector space and the rest of


D = sqrt( Σ_{i=1}^{n} (p_i − q_i)² )        (4)

Fig. 2. Vector space of the conventional SVM classifier with optimal separating hyper-plane.

As illustrated in Fig. 3, D1 and D2 represent the Euclidean distances between the new data point and the SVs of category 'Sphere', while D3, D4 and D5 represent the Euclidean distances between the new data point and the SVs of category 'Square'. After obtaining the Euclidean distances between the new data point and each of the SVs from the different categories, the average distance of the new data point to the set of SVs of each category is computed. This is done by adding up the Euclidean distances of the new data point to the SVs from the same category and dividing the sum by the total number of SVs for that particular category, as illustrated by Eq. (5). Based on the example illustrated in Fig. 3, the average distance of the new data point to the SVs of category 'Sphere' is (D1 + D2)/2, and the average distance of the new data point to the SVs of category 'Square' is (D3 + D4 + D5)/3.

Fig. 3. Vector space of the Euclidean-SVM classifier with the Euclidean distance function as the classification decision-making algorithm.

the training data points are eliminated. During the classification phase, a new unlabeled data point is mapped into the same vector space as the sets of SVs, and the average distances between the new data point and each set of SVs from the different categories are computed using the Euclidean distance function. Fig. 3 illustrates the vector space of the Euclidean-SVM classifier during the classification phase.
The triangle in Fig. 3 represents the new unlabeled data point to be classified. The distances between the new input data point and each of the SVs are computed. The Euclidean distance function is used to calculate the distance between two points, a new vector P and a support vector Q. Eq. (4) illustrates the Euclidean distance formula implemented in the Euclidean-SVM classification framework, where p_i and q_i are the i-th coordinates of P and Q respectively in an n-dimensional space.

D_avg = (1/N) Σ_{I=1}^{N} sqrt( Σ_{i=1}^{n} (p_i − q_i)² )        (5)

After computing the average distance of the new data point to the set of SVs of each category, the classification decision is made based on the category which has the lowest average distance between its set of SVs and the new data point. In other words, the new input data point is labeled with the category which has the lowest average distance between its SVs and the new data point itself. Table 1 illustrates the algorithm of the Euclidean-SVM classification approach.
With the combination of the SVM training algorithm and the Euclidean distance function to make the classification decision, the impact of the kernel function and parameters on the classification accuracy can be minimized. This is due to the fact that the optimal separating hyper-plane, whose construction is highly dependent on the kernel function and parameters, is replaced by the Euclidean distance function. Since the Euclidean distance function is able to perform its classification decision-making task sufficiently as long as both the training data points (support vectors) and the new data points to be classified are mapped into the same vector space, the transformation of the existing vector space into a higher-dimensional feature space using a kernel function is not needed during the classification phase, and hence does not have a great impact on the classification performance. In other words, the problem of selecting the right combination of kernel function and parameters for the classifier does not exist if the optimal separating hyper-plane is replaced by the Euclidean distance function. As shown by the experimental results obtained in this paper, the classification performance of the Euclidean-SVM is comparable to that of the conventional SVM, without needing the selection and implementation of an appropriate combination of kernel function and parameters.

Table 1
Algorithm of the Euclidean-SVM classification approach.

Training stage:
1. Map all the training data points into the vector space of an SVM.
2. Determine and capture the set of support vectors for each of the categories using the SVM training algorithm, and eliminate the rest of the training data points which are not identified as support vectors.
3. Map all the support vectors into the original vector space.

Testing stage:
1. Map the new unlabeled data point into the same original vector space as all the support vectors.
2. Adopt the Euclidean distance function to compute the average distances between the new data point and each of the sets of support vectors from the different categories.
3. Determine the category which has the lowest average distance between its set of support vectors and the newly inserted data point.
4. Generate the classification result for the new data point based on the identified category.
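The procedure of Table 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: scikit-learn's SVC is our assumed way of obtaining the support vectors, and `euclidean_svm_predict` is a hypothetical helper name.

```python
import numpy as np
from sklearn.svm import SVC

def euclidean_svm_predict(X_train, y_train, x_new, kernel="linear", C=1.0):
    # Training stage: fit a conventional SVM only to identify the support vectors.
    clf = SVC(kernel=kernel, C=C)
    clf.fit(X_train, y_train)
    sv = clf.support_vectors_          # support vectors, kept in the original space
    sv_labels = y_train[clf.support_]  # category of each support vector
    # Testing stage: average Euclidean distance (Eqs. (4)-(5)) from x_new
    # to each category's set of support vectors; pick the smallest.
    best_label, best_avg = None, np.inf
    for label in np.unique(sv_labels):
        group = sv[sv_labels == label]
        avg = np.mean(np.linalg.norm(group - x_new, axis=1))
        if avg < best_avg:
            best_label, best_avg = label, avg
    return best_label
```

Note that the kernel only influences which training points are retained as support vectors; the decision itself is made by the distance comparison in the original space, which is the point of the approach.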


3. LRUT with Euclidean-SVM pipeline failure prediction framework
Many systems are commercially available that can perform LRUT using guided waves. These systems use sophisticated equipment and software to efficiently generate and analyze ultrasonic guided waves. Our prototype pipeline failure prediction system is designed and developed to simulate oil and gas pipeline conditions and failures. The LRUT system used in this paper is designed, simulated and constructed using standard laboratory equipment and components, and the Euclidean-SVM classification framework is implemented in conjunction with the LRUT system to perform continuous condition monitoring and failure prediction for the pipeline. Fig. 4 shows the block diagram of the Euclidean-SVM pipeline failure prediction system proposed in this paper. The piezoelectric transducers are highly specialized and are capable of specifically exciting torsional guided waves through the manipulation of their orientation. Tone burst signals are used to excite the transducers, and the low-bandwidth nature of these signals makes the generation of the torsional mode much easier.
Five-cycle tone burst signals are created using an Agilent 33220A arbitrary waveform generator. The waveforms are created on a computer using the waveform editor software and uploaded into the waveform generator's non-volatile memory. The burst frequency is chosen as 10 Hz, which is the recommended maximum rating specified by the manufacturer of the transducers. The transducers also require a high-voltage excitation signal in order to create waves of sufficient amplitude to propagate long distances. The tone burst signals are therefore amplified to a voltage of 200 V peak-to-peak using a power amplifier. Sixteen piezoelectric transducers, arranged axially in a ring, are used for the LRUT system.
The developed LRUT system was tested on a 1.5 m section of carbon steel pipe, 140 mm in diameter and 5 mm in wall thickness. The frequency of the tone burst signals required to excite the transducers for this pipe was experimentally determined to be 20 kHz, as this gives the highest back-wall echo signal strength. Ideally, the best form of corrosion simulation would be to gradually corrode a section of the pipe over a long period of time and take periodic measurements. However, such an experiment would require considerable effort, first in fabricating a corrosion simulation rig and secondly in simulating corrosion at sufficiently high rates. Hence, in order to prove the concepts in this paper, a standard corrosion defect, a full circumferential defect, will be machined at
Fig. 4. Block diagram of the Euclidean-SVM pipeline failure prediction system.


Fig. 5. Full circumferential corrosion defect (i) Position (ii) Actual Picture.

Table 2
Arrangement of data points for simulation of a continuous signal.

Depth of defect (mm)    Assigned defect level    Start sample    End sample
0                       0                        1               500
1                       1                        501             1000
2                       2                        1001            1500
3                       3                        1501            2000
4                       4                        2001            2500
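The sample arrangement of Table 2 amounts to a simple repeated-label sequence; a minimal sketch (numpy is our choice, not part of the original setup):

```python
import numpy as np

# 500 guided-wave measurements per depth, depths 0-4 mm machined in sequence
defect_levels = np.repeat(np.arange(5), 500)  # labels for samples 1..2500
# Samples 1-500 carry level 0, samples 501-1000 level 1, and so on;
# plotting defect_levels against sample number gives the staircase shape
# described in the text.
```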

different depths. Several measurements will be taken at each depth, and all signals will be arranged sequentially in order to simulate a natural corrosion process.
A full circumferential corrosion defect with a 3 mm axial length was created using a lathe machine. Depths of 1 mm, 2 mm, 3 mm and 4 mm were created with the LRUT system in place, and measurements were taken after each depth was machined. Fig. 5 shows the location of the defects on the pipe section. The inflatable collar used in our LRUT experimental rig is not designed for permanent installation, hence a minor modification was performed: an airtight valve and pressure gauge assembly was added to the air inlet section, and an air pressure of 30 psi was continuously maintained. To enable easy interpretation, the transducer collar is placed at one end of the pipe to limit backward wave propagation. Ideally, more than one ring of sensors would be needed to limit backward waves and produce only unidirectional wave propagation.
Data for continuous corrosion progression over time are difficult to obtain. Hence, data at different corrosion defect depths are arranged sequentially to simulate a continuous time signal. 500 guided wave measurements were taken on the original pipe and at each defect depth. Table 2 shows how the data points are arranged. Every data point is assigned, or labeled, with a defect level. For example, data points 1 to 500 are labeled with defect level 0. If the defect levels are plotted against the sample numbers from 1 to 2500, the plot looks like a staircase.
The results from the LRUT experimental rig will ultimately be used to ascertain whether the Euclidean-SVM classifier can sufficiently detect the presence of defects and help in failure prediction. By using the Euclidean-SVM classifier to monitor pipeline condition and detect pipeline failure, the accuracy of the failure prediction system has been shown to be less sensitive to the implementation of kernel functions and parameters, as compared to the previous system which uses the conventional SVM. This is demonstrated by our experimental results presented in Section 4. In other words, the necessity of selecting the right combination of kernel function and parameters is no longer a critical problem for the classifier in order to guarantee a highly accurate failure prediction rate. In conclusion, by implementing the Euclidean-SVM approach to construct a classification framework for pipeline failure prediction, we obtain a pipeline failure prediction system whose accuracy is comparable with, or in some cases better than, that of the conventional pipeline failure prediction system using the SVM approach, while being immune from the problem of determining the appropriate kernel function and parameters for the classifier.

4. Experiments and evaluations

The proposed Euclidean-SVM pipeline failure prediction system has been tested and evaluated using six datasets. One of them was created by acquiring and grouping the original data collected from our LRUT simulation rig according to the five different levels of corrosion defect depth illustrated in Table 2. This dataset was then distorted with different levels of Signal-to-Noise Ratio (SNR) to generate five more datasets. The six datasets are listed in Table 3.
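The paper does not specify how the noisy datasets were generated; one standard way to distort a clean signal to a target SNR is to add white Gaussian noise scaled from the signal power (a sketch under that assumption):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    rng = np.random.default_rng(rng)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))  # SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

Applying this with 5, 2, 0.01 dB and so on to each guided-wave measurement would produce distorted copies of the original dataset.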
Both the conventional SVM classification framework and the
Euclidean-SVM classification framework were evaluated
independently in the experiments, with different combinations of kernel function and values of the soft margin parameter C. In
our experiments, we implemented the tested classification
approaches with four common SVM kernel functions, namely the
linear kernel, polynomial kernel, radial basis function (RBF) kernel
and sigmoid kernel. As for the soft margin parameter C, the values
1, 10, 100, 1000 and 10000 were applied to both of
the tested classifiers. By running these two
classification frameworks separately with different kernel functions and different values of parameter C, we are able to evaluate
the performance of each approach and to determine the improvement of the Euclidean-SVM approach (if any) over the
conventional SVM model in terms of classification accuracy. We are also able to evaluate the impact of the choice of kernel function and parameter C on the
conventional SVM approach, as well as on the Euclidean-SVM
approach.
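The resulting evaluation protocol amounts to a grid over classifiers, kernels and C values, which can be enumerated as in the following sketch (Python; `train_and_score` is a hypothetical placeholder for the MATLAB/LIBSVM training-and-testing routine actually used, not part of the original setup):

```python
from itertools import product

CLASSIFIERS = ("conventional SVM", "Euclidean-SVM")
KERNELS = ("linear", "polynomial", "rbf", "sigmoid")
C_VALUES = (1, 10, 100, 1000, 10000)

def train_and_score(classifier, kernel, c):
    # Hypothetical placeholder: train on the LRUT training data and
    # return test-set accuracy. In the paper this is done in MATLAB
    # with the LIBSVM library.
    raise NotImplementedError

grid = list(product(CLASSIFIERS, KERNELS, C_VALUES))
assert len(grid) == 40  # 2 classifiers x 4 kernels x 5 values of C
```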

Table 3
List of datasets used in the experiments.

1. Original dataset (without noise)
2. Dataset with -5 dB SNR
3. Dataset with -2 dB SNR
4. Dataset with 0.01 dB SNR
5. Dataset with 1 dB SNR
6. Dataset with 10 dB SNR

Table 4
Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different types of kernel function and different values for parameter C on the original dataset (without added noise). Entries are classification accuracies (%); the last column is the variance of accuracies across the tested values of C.

Classification approach (kernel)   C = 1    C = 10   C = 100  C = 1000  C = 10000   Variance across C
SVM (Linear)                       99.30    99.30    99.30    99.30     99.30       0
SVM (Polynomial)                   99.25    99.25    99.25    99.25     99.25       0
SVM (RBF)                          22.10    23.70    23.27    23.70     23.70       0.4802
SVM (Sigmoid)                      28.75    28.85    29.60    29.50     30.60       0.5493
Euclidean-SVM (Linear)             92.30    92.30    92.30    92.30     92.30       0
Euclidean-SVM (Polynomial)         97.30    97.30    97.30    97.30     97.30       0
Euclidean-SVM (RBF)                98.95    98.95    98.95    98.95     98.95       0
Euclidean-SVM (Sigmoid)            99.10    99.10    99.10    99.10     99.10       0

Table 5
Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different types of kernel function and different values for parameter C on the dataset with -5 dB SNR. Entries are classification accuracies (%); the last column is the variance of accuracies across the tested values of C.

Classification approach (kernel)   C = 1    C = 10   C = 100  C = 1000  C = 10000   Variance across C
SVM (Linear)                       44.00    44.50    44.35    44.30     44.55       0.0467
SVM (Polynomial)                   48.05    48.05    48.05    48.05     48.05       0
SVM (RBF)                          20.00    20.00    20.00    20.00     20.00       0
SVM (Sigmoid)                      35.10    37.50    36.25    36.15     37.00       0.8362
Euclidean-SVM (Linear)             50.70    50.55    50.65    50.65     50.65       0.0030
Euclidean-SVM (Polynomial)         51.65    51.65    51.65    51.65     51.65       0
Euclidean-SVM (RBF)                51.55    51.55    51.55    51.55     51.55       0
Euclidean-SVM (Sigmoid)            51.40    50.60    50.75    51.15     50.70       0.1157

The conventional SVM classification approach is implemented
using MATLAB version 7.11.0.584 (R2010b) together with the LIBSVM library version 2.91. As for the Euclidean-SVM classification
approach, the same versions of MATLAB and the LIBSVM library were
used to identify the SVs for each category using the SVM training algorithm. An additional module was developed as a MATLAB script
which computes the Euclidean distance of the
new input data point to the set of SVs for each category. At the end
of this module, the classification decision is made by identifying the
category which has the lowest average distance between the new
input data point and its set of SVs.
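The decision module described above can be sketched as follows (an illustrative Python re-implementation of the MATLAB function, with hypothetical toy support vectors; this is not the authors' code):

```python
import math

def euclidean_svm_decide(point, svs_by_level):
    """Euclidean-SVM decision rule: predict the defect level whose
    support vectors (found beforehand by ordinary SVM training) have
    the lowest average Euclidean distance to `point`."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def avg_distance(svs):
        return sum(distance(point, sv) for sv in svs) / len(svs)

    return min(svs_by_level, key=lambda level: avg_distance(svs_by_level[level]))

# Toy usage with hypothetical 2-D support vectors for two defect levels.
svs = {0: [[0.0, 0.0], [1.0, 0.0]], 1: [[5.0, 5.0], [6.0, 5.0]]}
assert euclidean_svm_decide([0.5, 0.2], svs) == 0
assert euclidean_svm_decide([5.5, 4.8], svs) == 1
```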
In our experiments, we also compute the variance of accuracies
across the tested values of the soft margin parameter C, in order to
analyze the dependency of the classification performance on
parameter C. The variance of accuracies across the tested values
of parameter C is computed using the formula below.

s² = Σ_{i=1}^{n} (X_i − X̄)² / (n − 1)

where X_i is the classification accuracy at the ith tested value of C, X̄ is the mean of the accuracies and n is the number of tested C values.

Variance measures the spread of the data contained in a dataset. Note that
the variance (s²) is simply the square of the standard deviation, which
is normally represented by Eq. (6). The variance of accuracies across
the tested values of parameter C is computed using the MATLAB
command shown below:

Variance = var(x);
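As a sanity check, the non-zero variance entries reported for Table 4 can be reproduced from the corresponding accuracy rows; the sketch below uses Python in place of MATLAB's var, which likewise normalizes by n − 1:

```python
def sample_variance(xs):
    """Sample variance with the (n - 1) denominator, matching MATLAB's var."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Accuracy rows from Table 4 (C = 1, 10, 100, 1000, 10000) for the two
# kernels whose variance across C is non-zero.
svm_rbf = [22.10, 23.70, 23.27, 23.70, 23.70]
svm_sigmoid = [28.75, 28.85, 29.60, 29.50, 30.60]

assert abs(sample_variance(svm_rbf) - 0.4802) < 1e-3
assert abs(sample_variance(svm_sigmoid) - 0.5493) < 1e-3
```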
4.1. Experiment on the original dataset (without noise)
As illustrated in Table 4, the implementation of different types
of kernel function greatly affects the performance of the conventional SVM classifier on the original dataset. The SVM classifier
with the linear kernel and the SVM classifier with the polynomial kernel
recorded high accuracies of 99.30% and
99.25% respectively. However, the conventional SVM classifier performs badly with the RBF kernel and the sigmoid kernel.
In this experiment, the SVM classifier with the RBF kernel achieved
its best accuracy of 23.70%, while the highest accuracy of the SVM
classifier with the sigmoid kernel was recorded at 30.60%. The
great difference in the accuracy of the SVM classifier across
kernel functions shows that the conventional SVM
classifier requires an appropriate kernel
function in order to obtain high classification performance and to
guarantee good generalization ability. An inappropriate choice of kernel function in the conventional SVM leads to
low classifier performance. The experimental results here
show that the accuracy of the conventional SVM classifier is highly
dependent on the choice of kernel function.
The experiments on the Euclidean-SVM classifier recorded
a more consistent performance across the different kernel functions, as compared to the conventional SVM. The lowest classification accuracy for the Euclidean-SVM, 92.30%, is recorded
when the classifier uses the linear kernel function. The Euclidean-SVM classifier with the sigmoid kernel achieved the
highest accuracy of 99.10% in this experiment. Hence, it can be
seen that the choice of kernel function has minimal impact on the performance of the Euclidean-SVM classifier.
In this experiment, the performance of both the conventional
SVM classifier and the Euclidean-SVM classifier is almost immune to the soft margin parameter C. According to Table 4,
the variances of classification accuracies across values of parameter C for the SVM classifier with the RBF kernel and the SVM classifier
with the sigmoid kernel were recorded at 0.4802 and 0.5493
respectively. The soft margin parameter has no impact on the classification performance of the Euclidean-SVM classifiers. These results again show that the
Euclidean-SVM approach has a lower dependency on the choice of kernel function and soft margin parameter, as compared to
the conventional SVM approach.


Table 6
Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different types of kernel function and different values for parameter C on the dataset with -2 dB SNR. Entries are classification accuracies (%); the last column is the variance of accuracies across the tested values of C.

Classification approach (kernel)   C = 1    C = 10   C = 100  C = 1000  C = 10000   Variance across C
SVM (Linear)                       59.40    59.50    59.45    59.50     59.50       0.0020
SVM (Polynomial)                   60.10    60.10    60.10    60.10     60.10       0
SVM (RBF)                          20.00    20.00    20.00    20.00     20.00       0
SVM (Sigmoid)                      37.70    47.55    36.60    36.95     36.90       22.2667
Euclidean-SVM (Linear)             62.35    61.90    61.55    61.55     62.05       0.1170
Euclidean-SVM (Polynomial)         66.45    66.45    66.45    66.45     66.45       0
Euclidean-SVM (RBF)                66.95    66.95    66.95    66.95     66.95       0
Euclidean-SVM (Sigmoid)            65.55    66.05    65.95    66.00     66.10       0.0482

Table 7
Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different types of kernel function and different values for parameter C on the dataset with 0.01 dB SNR. Entries are classification accuracies (%); the last column is the variance of accuracies across the tested values of C.

Classification approach (kernel)   C = 1    C = 10   C = 100  C = 1000  C = 10000   Variance across C
SVM (Linear)                       65.00    63.75    63.65    63.50     63.50       0.4033
SVM (Polynomial)                   68.90    68.90    68.90    68.90     68.90       0
SVM (RBF)                          20.00    20.00    20.00    20.00     20.00       0
SVM (Sigmoid)                      29.00    27.65    27.55    28.55     27.65       0.4295
Euclidean-SVM (Linear)             71.85    72.85    72.60    72.55     72.55       0.1395
Euclidean-SVM (Polynomial)         73.55    73.55    73.55    73.55     73.55       0
Euclidean-SVM (RBF)                74.65    75.65    74.65    74.65     74.65       0
Euclidean-SVM (Sigmoid)            74.70    74.65    75.10    75.15     75.15       0.0637

4.2. Experiment on the dataset with -5 dB SNR

Table 5 shows the experimental results of the conventional SVM
classifier and the Euclidean-SVM classifier with different combinations of kernel function type and parameter C value on the dataset with -5 dB SNR. In general,
both the conventional SVM and the Euclidean-SVM performed
poorly in this experiment. This is due to the low
SNR: this dataset is highly distorted, which
degrades the quality of the original data.
As we can observe from Table 5, both the kernel type and parameter C have very low impact on the classification performance of the
Euclidean-SVM approach. With all the tested combinations of kernel function and parameter C, the Euclidean-SVM approach still
manages to obtain classification accuracies within the range
from 50.55% to 51.65%. On the other hand, for the conventional
SVM approach, even though varying the value of parameter C does not
have a great impact on the classification performance, the kernel function
is still the main factor in determining whether the classifier achieves high
accuracy. The best accuracy of the conventional SVM approach,
48.05%, was achieved by the classifier with the polynomial kernel,
while the lowest accuracy of 20% was recorded
by the SVM classifier with the RBF kernel. Based on the results obtained from this experiment with a distorted dataset, the Euclidean-SVM approach outperformed the conventional SVM
approach, in terms of classification accuracy as well as performance consistency across different kernel
functions and values of the soft margin parameter.
4.3. Experiment on the dataset with -2 dB SNR

The experimental results illustrated in Table 6 show that the
performance of the conventional SVM classification approach is
highly dependent on the type of kernel function implemented in
the classifier. When the right kernel functions, in this case the linear
kernel and the polynomial kernel, are implemented, an
accuracy of approximately 60% can be achieved. On the other hand,
when inappropriate kernel functions (the RBF kernel and the sigmoid
kernel) are used, the classification accuracy drops drastically, to
47.55% (SVM classifier with the sigmoid kernel and C = 10), and even
lower, to 20%, when the RBF kernel is implemented.
In this experiment, parameter C has a great impact on the SVM classifier with the sigmoid kernel, for which the variance of accuracies
across the tested values of parameter C was recorded at
22.2667.
According to the results in Table 6, most of the Euclidean-SVM
classifiers, across the different types of kernel function and
values of parameter C, achieved accuracies of approximately 66%. The experimental results show that, in
handling the classification task on the dataset with -2 dB SNR,
the Euclidean-SVM approach achieved better classification
accuracy and performance consistency than the
conventional SVM approach.
4.4. Experiment on the dataset with 0.01 dB SNR

Table 7 illustrates the experimental results of the tested classifiers on the dataset with 0.01 dB SNR. Again in this experiment, the
Euclidean-SVM approach outperforms the conventional SVM approach, in terms of classification accuracy and consistency across
the different kernel types and values of the soft margin parameter. The Euclidean-SVM classifiers generally
achieved good classification performance, with accuracies recorded in the range from 71.85% to 75.65% over all the tested combinations of kernel type and parameter C value. Both
the kernel function and the soft margin parameter have low impact
on the classification accuracy of the Euclidean-SVM approach.
On the other hand, the conventional SVM approach only
achieves good classification performance with the
polynomial kernel, for which the accuracies were recorded at
68.90%. The accuracies of the SVM classifiers with the linear kernel
were recorded in the range from 63.50% to 65.00% across the different
values of parameter C. However, the SVM classifiers with the RBF kernel and the sigmoid kernel achieved poor


Table 8
Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different types of kernel function and different values for parameter C on the dataset with 1 dB SNR. Entries are classification accuracies (%); the last column is the variance of accuracies across the tested values of C.

Classification approach (kernel)   C = 1    C = 10   C = 100  C = 1000  C = 10000   Variance across C
SVM (Linear)                       75.05    72.85    72.65    72.80     72.80       1.0407
SVM (Polynomial)                   72.95    72.95    72.95    72.95     72.95       0
SVM (RBF)                          20.00    20.00    20.00    20.00     20.00       0
SVM (Sigmoid)                      28.70    29.75    29.80    29.75     29.80       0.2318
Euclidean-SVM (Linear)             77.95    75.45    75.30    75.30     75.30       1.3693
Euclidean-SVM (Polynomial)         75.60    75.60    75.60    75.60     75.60       0
Euclidean-SVM (RBF)                75.40    75.40    75.40    75.40     75.40       0
Euclidean-SVM (Sigmoid)            77.10    76.65    76.65    76.65     76.65       0.0405

Table 9
Classification accuracies of the SVM classifier and the Euclidean-SVM classifier with different types of kernel function and different values for parameter C on the dataset with 10 dB SNR. Entries are classification accuracies (%); the last column is the variance of accuracies across the tested values of C.

Classification approach (kernel)   C = 1    C = 10   C = 100  C = 1000  C = 10000   Variance across C
SVM (Linear)                       98.25    98.25    98.25    98.25     98.25       0
SVM (Polynomial)                   98.55    98.55    98.55    98.55     98.55       0
SVM (RBF)                          20.80    22.80    22.80    22.80     22.80       0.8000
SVM (Sigmoid)                      23.10    24.10    23.90    24.00     23.75       0.1570
Euclidean-SVM (Linear)             95.60    95.60    95.60    95.60     95.60       0
Euclidean-SVM (Polynomial)         97.55    97.55    97.55    97.55     97.55       0
Euclidean-SVM (RBF)                96.25    96.25    96.25    96.25     96.25       0
Euclidean-SVM (Sigmoid)            98.00    98.40    98.40    98.40     98.40       0.0320

performance, with accuracies recorded in the range from 20.00% to
29.00%.
4.5. Experiment on the dataset with 1 dB SNR

The experimental results as illustrated in Table 8 again show
that the kernel function has a high impact on the classification
performance of the conventional SVM approach, while the Euclidean-SVM approach does not suffer from this problem. In this
experiment, the SVM classifier with the linear kernel and the SVM
classifier with the polynomial kernel achieved accuracies
in the range from 72.65% to 75.05%. However, the SVM classifier with the RBF kernel and the SVM classifier with the sigmoid kernel
suffered from the inappropriate choice of kernel function, with relatively low accuracies recorded
in the range from 20.00% to 29.80%.
The Euclidean-SVM classifiers generally achieved relatively good classification performance as compared to the conventional SVM classifiers in this experiment, with accuracies
recorded in the range from 75.30% to 77.95% over all the tested
combinations of kernel type and parameter C value.
The Euclidean-SVM approach again outperformed the conventional SVM approach in this experiment on the dataset with 1 dB
SNR, in terms of the consistency of the classification performance
across the different kernel functions.
4.6. Experiment on the dataset with 10 dB SNR

Table 9 again illustrates the inconsistency of the classification accuracy of the conventional SVM approach across different kernel functions. Based on Tables 4-9, we can observe that the conventional
SVM approach achieved high classification accuracies, of approximately 98%, with
the linear kernel and the polynomial kernel. However, an inappropriate choice of kernel severely degraded the performance of the conventional SVM classifiers. In this experiment, for the SVM
classifiers with the RBF kernel and the SVM classifiers with the sigmoid
kernel, the accuracies were only recorded in the range from 20.80%
to 24.10%.
On the other hand, the choice of kernel function and the value of parameter C do not have a high impact
on the Euclidean-SVM classification approach, which achieved classification accuracies between 95.60% and 98.40% across the different kernels
and values of parameter C. The results of this experiment
further justify that the Euclidean-SVM has a lower dependency on the kernel function and the value of
parameter C, as compared to the conventional SVM.
4.7. Discussion on experimental results

Based on the results obtained from the series of experiments on
the datasets with different SNR levels, it can be observed that
the Euclidean-SVM classification approach has a lower dependency
on the kernel type and the value of the soft margin parameter than the conventional SVM classification approach. In most
cases, the Euclidean-SVM approach outperforms the conventional
SVM approach in terms of classification accuracy, as well as performance consistency across different combinations of kernel and
parameter C. By performing the classification tasks using the
Euclidean-SVM approach, high accuracies can be obtained without
transforming the original vector space into a
high-dimensional feature space using kernel functions. This is
because the Euclidean-SVM approach uses the Euclidean
distance as the decision-making function of the classification
framework. As the Euclidean-SVM approach does not use an optimal
separating hyperplane as the decision surface, the implementation of kernel functions to transform the original input space into a high-dimensional feature space has only minimal impact on the performance of the Euclidean-SVM classification framework. The Euclidean distance function used in the Euclidean-SVM approach can
perform effective classification decision making as long as
all the training data points (the SVs) and the input data points
are mapped into the same vector space. The Euclidean-SVM


approach also has less dependency on the soft margin parameter,
as compared to the conventional SVM approach. In most of the
experiments carried out in this paper, the variances of classification accuracies across different values of the soft margin parameter
for the Euclidean-SVM approach are lower than those of the conventional SVM approach.
5. Conclusion

An intelligent oil and gas pipeline failure prediction system
using LRUT and the Euclidean-SVM classification approach has
been proposed in this paper. Unlike conventional NDT pipeline
failure prediction systems, which are deployed at pre-determined
intervals, this system provides continuous monitoring of pipeline
conditions and makes decisions automatically, without human errors and misinterpretations. As compared to our previous work,
which uses the conventional SVM approach, the pipeline failure
prediction system which uses LRUT in conjunction with the Euclidean-SVM achieves good classification performance without
requiring an appropriate combination of
kernel function and parameters. The experimental results show
that the Euclidean-SVM approach has less dependency on the type
of kernel function and the value of the soft margin parameter than the conventional SVM. This characteristic of the Euclidean-SVM approach reiterates that this classification framework is
suitable to be used as the automatic and intelligent decision-making module of the proposed oil and gas pipeline condition monitoring and failure prediction system. By using the
Euclidean-SVM classification framework in conjunction with the
LRUT-based NDT method to perform real-time and
continuous condition monitoring and failure prediction for oil
and gas pipelines, the processes of kernel function and soft margin
parameter optimization can be avoided. These processes have
been reported as impractical and unrealistic in real-world implementations of the system, due to the iterative and convoluted computations and the requirement of a validation set. As future
work, we will further investigate alternative distance and
similarity measurement functions to replace the Euclidean distance function, which may provide more accurate distance
or similarity measurements between the SVs and the input data point,
and hence lead to a more effective and efficient SVM-based classification framework for the oil and gas pipeline condition monitoring and
failure prediction system.
