Abstract: Problem statement: Over the last two decades, Fault Diagnosis (FD) has been of major importance for enhancing the quality of manufacturing and lessening the cost of product testing. A quick and accurate FD system helps to avoid product quality problems and facilitates precautionary maintenance. FD may be considered a pattern recognition problem, and methods for improving the accuracy and efficiency of pattern recognition have been gaining more and more attention. Many recently developed computational tools and algorithms could be used. Approach: This study evaluates the performance of three popular and effective data mining models in diagnosing seven commonly occurring faults of steel plates, namely Pastry, Z_Scratch, K_Scatch, Stains, Dirtiness, Bumps and Other_Faults. The models include the C5.0 decision tree (C5.0 DT) with boosting, the Multilayer Perceptron Neural Network (MLPNN) with pruning and Logistic Regression (LR) with forward stepwise selection. The steel plates fault dataset investigated in this study is taken from the University of California at Irvine (UCI) machine learning repository. Results: Given a training set of such patterns, each individual model learned how to classify a new case in the domain. The diagnostic performance of the proposed models is reported using statistical accuracy, specificity and sensitivity. The C5.0 decision tree with the boosting algorithm achieved a remarkable performance, with 97.25 and 98.09% accuracy on the training and test subsets; C5.0 outperformed the other two models. Conclusion: Experimental results showed that data mining algorithms in general, and decision trees in particular, have a great impact on the problem of steel plates fault diagnosis.
Key words: Fault Diagnosis (FD), University of California at Irvine (UCI), Logistic Regression (LR), Multilayer Perceptron Neural Network (MLPNN), Artificial Neural Networks (ANNs)
• Or else, select the splitting attribute x that is the most important in separating the training samples into different classes. This attribute x becomes a decision node
• A branch is created for each individual value of x and the samples are partitioned accordingly
• The process is iterated recursively until a specified stopping criterion is achieved

Different DT models use different splitting algorithms that maximize the purity of the resulting classes of data samples. Popular DT models include ID3, C4.5 (Quinlan, 1986; 1993), CART (Breiman, 1984), QUEST (Loh and Shih, 1997), CHAID (Berry and Linoff, 1997) and C5.0. Common splitting algorithms include entropy-based information gain (used in ID3, C4.5 and C5.0), the Gini index (used in CART) and the Chi-squared test (used in CHAID). This study uses the C5.0 DT algorithm, an improved version of the C4.5 and ID3 algorithms. It is a commercial product designed by RuleQuest Research Pty Ltd to analyze huge datasets and is implemented in the SPSS Clementine data mining workbench. C5.0 uses information gain as a measure of purity, which is based on the notion of entropy, as in Eq. 1 and 2. If the training subset consists of n samples (x1, y1), …, (xn, yn), where xi ∈ Rp holds the independent attributes of sample i and yi is one of the predefined classes Y = {c1, c2, …, ck}, then the entropy, entropy(X), of the set X relative to this k-wise classification is defined as:

entropy(X) = − ∑_{i=1}^{k} p_i log2(p_i)    (1)

where pi is the ratio of X belonging to class ci.

Information gain, gain(X, A), is simply the expected reduction in entropy caused by partitioning the set of samples X based on an attribute A:

gain(X, A) = entropy(X) − ∑_{v ∈ values(A)} (|X_v| / |X|) entropy(X_v)    (2)

where values(A) is the set of all possible values of attribute A and Xv is the subset of X for which attribute A has the value v, i.e., Xv = {x ∈ X | A(x) = v}.

Boosting, winnowing and pruning are three methods used in the C5.0 tree construction; they aim to build a tree of the right size (Berry and Linoff, 1997). They increase the generalization and reduce the overfitting of the DT model. Boosting is a method for combining classifiers; it works by building multiple models in a sequence. The first model is built in the usual way. Then, the second model is built in such a way that it focuses on the samples that were misclassified by the first model. The third model is built to focus on the second model's errors, and so on (Nisbet et al., 2009). When a new sample is to be classified, each model votes for its predicted class and the votes are counted to determine the final class.

The winnowing method investigates the usefulness of the predictive attributes before starting to build the model (Lau et al., 2010). This ability to pick and choose among the predictive attributes is an important advantage of tree-based modeling techniques. Winnowing preselects a subset of the attributes that will be used to construct the tree; irrelevant attributes are excluded from the tree-building process. In the case of the current steel plate dataset, only 13 attributes have been selected to build the tree.

Pruning is the last method used here to increase the performance of the C5.0 DT model. It consists of two steps, pre-pruning and post-pruning (Eslamloueyan, 2011). The pre-pruning step allows only nodes with a minimum number of samples (node size). The post-pruning step reduces the tree size based on the estimated classification errors.

Multilayer perceptron neural network: Artificial Neural Networks (ANNs) are normally known as biologically motivated and highly sophisticated analytical techniques, capable of modelling extremely complex non-linear functions. Formally defined, ANNs are analytic techniques modelled after the processes of learning in the cognitive system and the neurological functions of the brain, capable of predicting new patterns (on specific attributes) from other patterns (on the same or other attributes) after executing a process of so-called learning from existing data (Haykin, 2009). The Multilayer Perceptron Neural Network (MLPNN) with back-propagation is the most popular ANN architecture. MLPNN is known to be a powerful function approximator for prediction and classification problems. The MLPNN structure is organized into layers of neurons: input, hidden and output layers. There is at least one hidden layer, where the actual computations of the network are processed. Each neuron in a hidden layer sums its input attributes xi after multiplying them by the strengths of the respective connection weights wij and computes its output yj by applying an Activation Function (AF) to this sum. The AF may range from a simple threshold function to a sigmoidal, hyperbolic tangent or radial basis function, Eq. 3:

y_j = f(∑_i w_ij x_i)    (3)

where f is the activation function.
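As a concrete illustration, the entropy of Eq. 1, the information gain of Eq. 2 and the neuron output of Eq. 3 can be sketched in plain Python. The function names and the tanh activation are illustrative choices, not part of the original study:

```python
import math

def entropy(labels):
    # Eq. 1: entropy(X) = -sum_i p_i * log2(p_i), summed over the k classes
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(samples, labels, attribute):
    # Eq. 2: gain(X, A) = entropy(X) - sum_v |X_v|/|X| * entropy(X_v)
    n = len(samples)
    partitions = {}
    for x, y in zip(samples, labels):
        partitions.setdefault(x[attribute], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def neuron_output(x, w, f=math.tanh):
    # Eq. 3: y_j = f(sum_i w_ij * x_i), here with a tanh activation
    return f(sum(wi * xi for wi, xi in zip(w, x)))
```

An attribute that splits a two-class set perfectly yields a gain equal to the full entropy of the set, which is the behaviour the C5.0 splitting criterion rewards.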
J. Computer Sci., 8 (4): 506-514, 2012
Logistic regression: Logistic Regression (LR) is a generalization of linear regression (Maalouf, 2011). It is a nonlinear regression technique for the prediction of a dichotomous (binary) class attribute in terms of the predictive ones. The class attribute represents the status of the consumer (creditworthy, y = 1, or not):

P(y = 1) = 1 / (1 + e^(−(B_0 + ∑_{i=1}^{n} B_i x_i)))    (5)

Sensitivity refers to the rate of correctly classified positives and is equal to TP divided by the sum of TP and FN. Sensitivity may be referred to as the True Positive Rate:

Sensitivity = TP / (TP + FN)    (8)
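A minimal Python sketch of Eq. 5 and Eq. 8; the helper names are hypothetical, chosen only for this illustration:

```python
import math

def logistic_probability(x, b0, b):
    # Eq. 5: P(y = 1) = 1 / (1 + exp(-(B0 + sum_i Bi * xi)))
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity(tp, fn):
    # Eq. 8: true positive rate, TP / (TP + FN)
    return tp / (tp + fn)

# With all coefficients zero the model is maximally uncertain:
# logistic_probability([1.0, 2.0], 0.0, [0.0, 0.0]) -> 0.5
```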
Sensitivity in Eq. 8 measures the proportion of actual positives which are correctly identified as such, while specificity in Eq. 9 measures the proportion of negatives which are correctly identified. Finally, accuracy in Eq. 7 is the proportion of true results, either true positive or true negative, in a population; it measures the degree of veracity of a diagnostic test on a condition.

Fig. 2 demonstrates the component nodes of the proposed stream. This stream is implemented in the SPSS Clementine data mining workbench using an Intel Core 2 Duo CPU at 2.1 GHz. Clementine uses a client/server architecture to distribute requests for resource-intensive operations to powerful server software, resulting in faster performance on larger datasets. The software offers many modeling techniques, such as prediction, classification, segmentation and association detection algorithms.

Fig. 2: Stream of fault diagnosis using three classification models: DT, MLPNN and LR

The Faults dataset node is connected directly to an Excel file that contains the source data. The dataset includes 1941 instances, each labeled with one of the prescribed 7 fault types: Pastry, "Z Scratch", "K Scatch", Stains, Dirtiness, Bumps and "Other Faults". Each instance of the dataset owns 27 independent variables and one fault type. The dataset is explored for incorrect, inconsistent or missing data. The predictive attributes are of mixed numeric types, flag or range, while the target class is of type nominal. The Type node specifies the field metadata and properties that are important for modeling and other operations in Clementine. These properties include specifying a usage type, setting options for handling missing values, as well as setting the role of an attribute for modeling purposes: input or output. The first 27 attributes in Table 1 are defined as input (predictive) attributes and the fault type is defined as the target class. The Partition node is used to generate a partition field that splits the dataset into separate subsets for training and testing the models, by the ratio of 70:30% respectively. The training subset is used to estimate the model parameters, while the test one is used to independently assess the individual model. These models are applied again to the entire dataset and to any new data.

The Logistic node is trained using the forward method, where the model is built by moving forward step by step. With this method, the initial model is the simplest one: only the constant, and then further terms, can be added to the model as in Eq. 6. At each step, attributes not yet in the model are tested based on how much they would improve the model, and the best of those attributes is added. When no more terms can be added, or the best candidate term does not produce a large enough improvement in the model, the final model is generated. Two minutes and twenty-four seconds were required to build this model for the steel plate faults dataset. The LR system came out as the second best classifier, with classification accuracies of 73.26% on the training and 72.59% on the test samples.

The C5.0 node is a boosted decision tree model with the C5.0 algorithm, trained using the pruning and winnowing methods to increase the model accuracy. The number of trials of the boosting algorithm is 10, the minimum number of samples per node is set to 2 and the system uses equal misclassification costs. High speed is a notable feature of the C5.0 DT model; it clearly uses a special technique, although this has not been described in the open literature. The classification accuracies without the boosting algorithm are 90.57% on the training subset and 90.57% on the test one, while these accuracies with boosting are 97.25 and 98.09%. Boosting with 10 trials enhances the accuracies of the tree for both training and test samples. The time required to build a single C5.0 tree is below one second, while the boosted tree requires 11 seconds with ten trials. This model is the best one among the probability estimation classifiers.

The MLPNN classifier node is trained using the pruning method (Thimm et al., 1996). It begins with a large network and removes the weakest neurons in the hidden and input layers as training proceeds. The stopping criterion is set based on time; the network is given two minutes for training. Using the steel plate faults dataset, the MLPNN with the pruning method achieved 74.79 and 79.14% classification accuracies for the training and test datasets. The resulting structure consists of four layers, one input, two hidden layers and the output one, with 6, 20, 15 and 7 neurons respectively. The Filter, Analysis and Evaluation nodes are used to select and rename the classifier outputs in order to compute the performance statistical measures and to graph the evaluation charts.

The steel plate faults dataset is divided for training the models and testing them by the ratio of 70:30% respectively.
Table 2: Confusion matrices of individual models and their ensemble for test subset (997 data samples)
Output results
------------------------------------------------------------------------------------------------------------------------
Classifiers Desired results Pastry Z_Scratch K_Scatch Stains Dirtiness Bumps Other_Faults
C5.0 Pastry 79 0 0 0 0 0 3
Z_Scratch 0 89 0 0 0 0 5
K_Scatch 0 0 191 0 0 0 1
Stains 0 0 0 47 0 0 0
Dirtiness 0 0 0 0 22 0 1
Bumps 1 0 0 1 1 203 4
Other_Faults 1 0 0 0 0 1 347
MLPNN Pastry 34 4 0 0 2 18 24
Z_Scratch 0 84 0 0 0 1 9
K_Scatch 0 0 181 0 0 1 10
Stains 0 0 0 43 0 2 2
Dirtiness 0 0 0 0 16 4 3
Bumps 3 4 1 0 4 156 42
Other_Faults 12 14 2 0 2 44 275
LOG Pastry 53 11 0 3 4 4 7
Z_Scratch 0 85 1 0 1 2 5
K_Scatch 1 1 173 7 1 4 5
Stains 0 0 0 46 0 1 0
Dirtiness 1 0 0 0 19 2 1
Bumps 13 26 1 12 8 112 38
Other_Faults 38 40 6 43 24 58 140
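The per-class measures reported for Table 2 can be reproduced directly from the C5.0 confusion matrix above. The following sketch (one-vs-rest, with hypothetical helper names) computes the overall test accuracy and the class-wise sensitivity and specificity:

```python
# Confusion matrix of the C5.0 model on the 997 test samples (Table 2);
# rows = desired class, columns = predicted class.
classes = ["Pastry", "Z_Scratch", "K_Scatch", "Stains",
           "Dirtiness", "Bumps", "Other_Faults"]
cm = [
    [79, 0, 0, 0, 0, 0, 3],
    [0, 89, 0, 0, 0, 0, 5],
    [0, 0, 191, 0, 0, 0, 1],
    [0, 0, 0, 47, 0, 0, 0],
    [0, 0, 0, 0, 22, 0, 1],
    [1, 0, 0, 1, 1, 203, 4],
    [1, 0, 0, 0, 0, 1, 347],
]
total = sum(map(sum, cm))                       # 997 samples
correct = sum(cm[i][i] for i in range(len(cm))) # diagonal = 978
accuracy = correct / total

def class_sensitivity(i):
    # One-vs-rest true positive rate for class i: TP / (TP + FN)
    return cm[i][i] / sum(cm[i])

def class_specificity(i):
    # One-vs-rest true negative rate for class i: TN / (TN + FP)
    fp = sum(cm[r][i] for r in range(len(cm))) - cm[i][i]
    tn = total - sum(cm[i]) - fp
    return tn / (tn + fp)
```

For this matrix the overall accuracy works out to 978/997 ≈ 98.09%, matching the test-subset figure reported for the boosted C5.0 model.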
The training set is used to estimate the model parameters, while the test set is used to independently assess the individual model. These models are applied again to the entire dataset and to any new data. The time required to build each model with the dataset is variable, ranging from a few seconds up to two minutes for the neural network. In the C5.0 DT model, boosting can significantly improve the accuracy of the model, but it also requires longer training. It works by building multiple models in a sequence. Cases are classified by applying the whole set of models to them and using a voting procedure to combine the separate predictions into one overall prediction. The predictions of all models are compared to the original classes to identify the values of true positives, true negatives, false positives and false negatives. These values have been computed to construct the confusion matrix shown in Table 2. The values of the statistical measures (sensitivity, specificity and total classification accuracy) of the three models were computed and presented in Tables 3 and 4. Sensitivity and specificity approximate the probability of the positive and negative labels being true; they assess the usefulness of the algorithm on a single model. Using the results shown in Table 2, it can be seen that in terms of sensitivity, specificity and classification accuracy, the C5.0 DT model achieved the best classification of the test samples.

The gain chart provides a visual summary of the usefulness of the information provided by the classification models for predicting categorical dependent attributes. The chart summarizes the utility that can be expected from the respective predictive models, as compared to the baseline information only. The gain charts for the C5.0 decision tree, logistic regression with the forward method and the neural network models trained in SPSS Clementine, for training and test subsets, are shown in Fig. 3. Higher lines in the gain charts indicate better models, especially on the left side of the chart.
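The gain curve behind such a chart can be computed without plotting: rank the test cases by descending predicted score, then accumulate the fraction of all positives captured at each depth. A minimal sketch, with an illustrative function name:

```python
def cumulative_gains(scores, labels):
    # Rank cases by descending model score, then record the running
    # fraction of all positive cases captured at each cutoff depth.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    captured, gains = 0, []
    for i in order:
        captured += labels[i]
        gains.append(captured / total_pos)
    return gains

# A model that ranks both positives first reaches 100% gain
# after scoring only half of the population:
print(cumulative_gains([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
# -> [0.5, 1.0, 1.0, 1.0]
```

The higher this curve rises on the left, the better the model, which is exactly the reading rule stated for Fig. 3.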
DISCUSSION
Datasets could consist of a large volume of heterogeneous data fields, which usually complicates the use of data mining techniques. The main criticism of data mining techniques is that they do not follow all of the requirements of classical statistics: they use training and testing data sets drawn from the same dataset. In classical statistics, it can be argued that the testing set used in this case is not truly independent and, for that reason, the results may be biased.

Despite these criticisms, data mining can be an important tool in the fault diagnosis problem. By identifying patterns within large sums of data, data mining can, and should, be used to gain more insight into the faults and generate knowledge that can potentially fuel further research in many areas of manufacturing. The high degree of diagnostic accuracy of the models evaluated here is just one example of the value of data mining.

CONCLUSION

The objective of this study was to demonstrate the application of classification techniques to the problem of steel plates fault diagnosis and to describe the strengths and weaknesses of the methods described. Three different classification models have been evaluated. These models are derived from different families, namely the Multilayer Perceptron Neural Network (MLPNN), the C5.0 Decision Tree (DT) and Logistic Regression (LR). These models are optimized using different methods. The pruning method was used to find the optimal structure of the MLPNN model. The C5.0 DT was built using the boosting algorithm with 10 trials. Finally, the LR model was constructed using the stepwise forward method to gradually build the system. The performance of these models was investigated using known sets of proven steel plates fault features obtained from the University of California at Irvine (UCI) machine learning repository. Experimental results have shown that the C5.0 decision tree gives better performance than the other models. Furthermore, the boosting algorithm enhances the performance of the C5.0 DT algorithm. Although data mining techniques are capable of extracting patterns and relationships hidden deep in large datasets, without cooperation and feedback from experts and professionals their results are useless. The patterns found via data mining techniques should be evaluated by professionals who have years of experience in predicting steel plates faults.

REFERENCES

Berry, M.J.A. and G. Linoff, 1997. Data Mining Techniques: For Marketing, Sales and Customer Support. 1st Edn., Wiley, New York, ISBN-10: 0471179809, pp: 454.
Bewick, V., L. Cheek and J. Ball, 2005. Statistics review 14: Logistic regression. Crit. Care, 9: 112-118. DOI: 10.1186/cc3045
Breiman, L., 1984. Classification and Regression Trees. 1st Edn., Wadsworth International Group, Belmont, ISBN-10: 0534980538, pp: 358.
Buscema, M., S. Terzi and W. Tastle, 2010. A new meta-classifier. Proceedings of the North American Fuzzy Information Processing Society, Jul. 12-14, IEEE Xplore Press, Toronto, pp: 1-7. DOI: 10.1109/NAFIPS.2010.5548298
Chaudhuri, B.B. and U. Bhattacharya, 2000. Efficient training and improved performance of multilayer perceptron in pattern classification. Neurocomputing, 34: 11-27. DOI: 10.1016/S0925-2312(00)00305-2
Dong, L., D. Xiao, Y. Liang and Y. Liu, 2008. Rough set and fuzzy wavelet neural network integrated with least square weighted fusion algorithm based fault diagnosis research for power transformers. Elec. Power Syst. Res., 78: 129-136. DOI: 10.1016/J.EPSR.2006.12.013
Eslamloueyan, R., 2011. Designing a hierarchical neural network based on fuzzy clustering for fault diagnosis of the Tennessee-Eastman process. Applied Soft Comput., 11: 1407-1415. DOI: 10.1016/J.ASOC.2010.04.012
Faiz, J. and M. Ojaghi, 2009. Different indexes for eccentricity faults diagnosis in three-phase squirrel-cage induction motors: A review. Mechatronics, 19: 2-13. DOI: 10.1016/j.mechatronics.2008.07.004
Frank, A. and A. Asuncion, 2010. UCI machine learning repository. University of California, Irvine.
Haykin, S.S., 1994. Neural Networks: A Comprehensive Foundation. 1st Edn., Macmillan, New York, ISBN-10: 0023527617, pp: 696.
Haykin, S.S., 2009. Neural Networks and Learning Machines. 3rd Edn., Prentice Hall, New York, ISBN-10: 0131471392, pp: 906.
Hung, C.P. and M.H. Wang, 2004. Diagnosis of incipient faults in power transformers using CMAC neural network approach. Elec. Power Syst. Res., 71: 235-244. DOI: 10.1016/j.epsr.2004.01.019
Lau, C.K., Y.S. Heng, M.A. Hussain and M.I.M. Nor, 2010. Fault diagnosis of the polypropylene production process (UNIPOL PP) using ANFIS. ISA Trans., 49: 559-566. DOI: 10.1016/j.isatra.2010.06.007
Leger, R.P., W.J. Garland and W.F.S. Poehlman, 1998. Fault detection and diagnosis using statistical control charts and artificial neural networks. Artif. Int. Eng., 12: 35-47. DOI: 10.1016/S0954-1810(96)00039-8
Lo, C.H., Y.K. Wong, A.B. Rad and K.M. Chow, 2002. Fusion of qualitative bond graph and genetic algorithms: A fault diagnosis application. ISA Trans., 41: 445-456. DOI: 10.1016/S0019-0578(07)60101-3
Loh, W.Y. and Y.S. Shih, 1997. Split selection methods for classification trees. Stat. Sinica, 7: 815-840.
Ma, J. and J. Jiang, 2011. Applications of fault detection and diagnosis methods in nuclear power plants: A review. Prog. Nucl. Energy, 53: 255-266. DOI: 10.1016/j.pnucene.2010.12.001
Maalouf, M., 2011. Logistic regression in data analysis: An overview. Int. J. Data Anal. Tech. Strategies, 3: 281-299. DOI: 10.1504/IJDATS.2011.041335
Maurya, M.R., R. Rengaswamy and V. Venkatasubramanian, 2007. Fault diagnosis using dynamic trend analysis: A review and recent developments. Eng. Appli. Artif. Intell., 20: 133-146. DOI: 10.1016/j.engappai.2006.06.020
Nisbet, R., J. Elder and G. Miner, 2009. Handbook of Statistical Analysis and Data Mining Applications. 1st Edn., Academic Press, Burlington, ISBN-10: 9786612168314, pp: 824.
Principe, J.C., N.R. Euliano and W.C. Lefebvre, 2000. Neural and Adaptive Systems: Fundamentals Through Simulations. Wiley, New York, ISBN-10: 0471351679, pp: 656.
Quinlan, J.R., 1986. Induction of decision trees. Machine Learn., 1: 81-106. DOI: 10.1007/BF00116251
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, ISBN-10: 1558602380, pp: 302.
Seng-Yi, S.D.L. and W.L. Chang, 2011. Application of multiclass support vector machines for fault diagnosis of field air defense gun. Expert Syst. Appli., 38: 6007-6013. DOI: 10.1016/j.eswa.2010.11.020
Simani, S. and C. Fantuzzi, 2000. Fault diagnosis in power plant using neural networks. Inform. Sci., 127: 125-136. DOI: 10.1016/S0020-0255(00)00034-7
Thimm, G. and E. Fiesler, 1996. Neural network pruning and pruning parameters. Proceedings of the 1st Workshop on Soft Computing, (WSC'96), Nagoya, Japan, pp: 1-2.
Venkatasubramanian, V., R. Rengaswamy, K. Yin and S.N. Kavuri, 2003b. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comp. Chem. Eng., 27: 293-311. DOI: 10.1016/S0098-1354(02)00160-6
Venkatasubramanian, V., R. Rengaswamy, S.N. Kavuri and K. Yin, 2003a. A review of process fault detection and diagnosis: Part III: Process history based methods. Comp. Chem. Eng., 27: 327-346. DOI: 10.1016/S0098-1354(02)00162-X
Zhang, Y. and J. Jiang, 2008. Bibliographical review on reconfigurable fault-tolerant control systems. Annual Rev. Control, 32: 229-252. DOI: 10.1016/J.ARCONTROL.2008.03.008