
A Comparative Analysis Study of Data Mining Algorithm in Customer Relationship Management

Yang Liu

Abstract With the rapid development of database technology and the wide
application of database management systems, an increasing amount of data is
accumulated in the databases and data warehouses of enterprises. As the most
important resource of an enterprise, customers leave a great deal of information
in its operations. Enterprises manage customer relationships and analyze customer
data by means of information technology in order to identify customers' needs
more accurately and to support sound decisions on further marketing and
development. This paper analyzes classification algorithms, the most
representative class of data mining algorithms at present, including decision
tree classification, Bayesian classification, neural network classification and
rough set classification. It summarizes the properties and characteristics of
each algorithm by comparing them on the following criteria: accuracy (the ability
of a classification model to correctly predict the class labels of unseen data),
speed (the computational cost of building and using the model, i.e., time
complexity), robustness (the model's ability to predict correctly on data with
noise and missing values) and scalability (the ability to build the model
effectively for data sets of different sizes).

Keywords Customer relationship management · Data mining · Information technology

Y. Liu (✉)
Inner Mongolia Normal University, Hohhot, China
e-mail: ciecliuy@imnu.edu.cn

Springer Science+Business Media Singapore 2017 217


X. Li and X. Xu (eds.), Proceedings of the Fourth International Forum
on Decision Sciences, Uncertainty and Operations Research,
DOI 10.1007/978-981-10-2920-2_19
218 Y. Liu

1 The Definition and Connotation of Customer Relationship Management

1.1 The Definition of Customer Relationship Management

There is no unified definition of customer relationship management (CRM) in
academic circles. The definition varies with the starting point, as shown below:
(1) CRM is an integrated approach.
Emma Chablo defines CRM from the angle of system integration. Chablo maintains
that CRM is an integrated approach covering all the fields of operation that
have contact with customers, such as marketing, sales and customer service,
supported by an effective combination of people, processes and technology.
(2) To satisfy the customers with various methods.
Claudia Imhoff et al. also define CRM from the angle of system integration. They
argue that CRM aims to coordinate the firm's strategies, organizational
structure, culture, customer information and technology, thus managing customer
contacts effectively and achieving customers' long-term satisfaction, which in
turn brings stable profits to the enterprise.
(3) CRM is a kind of approach and thought.
Graham considers that CRM is a kind of attitude, inclination and value with
which enterprises deal with their operational business and customer
relationships.
(4) CRM is a technology.
SAS (Statistical Analysis System) Institute, a well-known statistical software
company and developer of a CRM solution platform, defines CRM from the angle of
technology: CRM is a technological process by which enterprises capture and make
use of customer information to the greatest possible extent, thereby cultivating
and strengthening customer loyalty and retaining customers for life.
In my view, the definition of CRM given by Yang Yongheng et al. in The
Connotation, Triggering Factors and Growth Dimension of CRM better fits current
practice in enterprises and academia:
CRM is one of the overall strategies of an enterprise. By adopting advanced data
technology and other information technology, it acquires customer data, analyzes
customers' needs and behavioral preferences, develops the management of
customers and cultivates their long-term loyalty, thus striking a balance between
the maximization of customer value and the maximization of the enterprise's
profits.
A Comparative Analysis Study of Data Mining Algorithm 219

1.2 The Connotation of CRM

No matter how the manner of implementing CRM changes, its goal remains the same,
and that goal gives the implementation of CRM its connotation. In summary, CRM
comes down to three points: the maximization of customer value, the maximization
of relationship value (customer equity), and the information systems and
technology of CRM.
(1) The Maximization of Customer Value
Customer value is customers' perceived preference for and evaluation of product
attributes, attribute performance and the consequences of use (whether they
promote or hinder the customers' goals and intentions), as shown in Fig. 1.
It is universally acknowledged that a product or service creates value for
customers if the benefits that customers get from it exceed the cost they have
paid. When customers purchase products or services, they choose the enterprise
that can create greater customer value for them. Therefore, creating greater
customer value is an important foundation for maintaining a good relationship
with customers.
(2) The Maximization of Customer Equity
Rust, Zeithaml et al. argue that the customer equity of an enterprise is the
total of the discounted lifetime values of all its customers. That is to say, for
the enterprise, the equity of each customer is his or her discounted lifetime
value, i.e., the discounted value that the customer brings to the enterprise over
the whole life cycle of the relationship.
Enterprises provide maximized customer value to customers and thereby maintain
their relationships with customers in order to obtain maximized profits from
them. Therefore, enterprises also need to carefully measure the value that
customers bring over the whole life cycle of the relationship, i.e., to calculate
customer lifetime value.

Fig. 1 Hierarchical model of customer value



(3) Information Technology and Data Mining Algorithm
In the 1980s and 1990s, most firms won a competitive edge in decision-making by
adopting business intelligence tools such as Sales Force Automation (SFA),
Customer Service Systems (CSS), call centers integrating marketing and service,
and On-Line Analytical Processing (OLAP). However, the development of the
processing power and storage capacity of computers has brought exponential
growth in the amount of information, so new methods and means of CRM have come
into being. In particular, to save cost and identify valuable customers
accurately, data mining algorithms are increasingly applied in CRM systems and
are gradually becoming the core of CRM.
CRM integrates customer information and resources by restructuring the business
processes of the enterprise with the help of advanced information technology and
management ideas, and by sharing customer information and resources inside the
enterprise, in order to provide individual one-to-one service for customers. It
improves customer value, satisfaction, profitability and loyalty, and retains and
attracts more customers, thus eventually maximizing the enterprise's profits.
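
The customer equity described in point (2) above can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the function names, the yearly margins and the discount rate are hypothetical and are not taken from the paper, and margins are assumed to arrive at the end of each year of the relationship.

```python
def lifetime_value(yearly_margins, discount_rate):
    """Discounted lifetime value of one customer.

    yearly_margins[t] is the (assumed) net margin earned from the customer
    at the end of year t + 1 of the relationship.
    """
    return sum(
        margin / (1.0 + discount_rate) ** (year + 1)
        for year, margin in enumerate(yearly_margins)
    )


def customer_equity(margins_by_customer, discount_rate):
    """Customer equity: the sum of the discounted lifetime values of all customers."""
    return sum(
        lifetime_value(margins, discount_rate)
        for margins in margins_by_customer.values()
    )


# Hypothetical illustration data (not from the paper).
margins_by_customer = {
    "customer_a": [100.0, 100.0],  # two years of relationship
    "customer_b": [50.0],          # one year of relationship
}
equity = customer_equity(margins_by_customer, discount_rate=0.10)
print(round(equity, 2))
```

A higher discount rate lowers the weight of margins that arrive late in the relationship, which is why the length of the relationship life cycle matters in this measure.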

2 The Classification Algorithms in Data Mining

Data mining is the process of extracting useful information and knowledge from
large amounts of incomplete, noisy, fuzzy and random data. This information and
knowledge is hidden in the data, previously unknown to people, yet potentially
useful.
The application of computers makes it possible for contemporary enterprises to
gather a large amount of customer data. Data mining processes such masses of data
by drawing on techniques from several disciplines, such as databases, statistics,
artificial intelligence and machine learning. Data mining will play an
increasingly significant role in the future management of enterprises.

2.1 Decision Tree Classification

A decision tree is a structure used to generate classification rules. As shown in
Fig. 2, every internal node in the tree represents a test on an attribute, every
branch represents an outcome of the test, and every leaf node represents a
category or a distribution over categories; each path from the root to a leaf
corresponds to a rule. The topmost node of the tree is called the root node.
The decision tree in Fig. 2 concerns computer customers: it indicates whether a
customer will buy a computer or not.

Fig. 2 The decision tree for computer customers

First, all the attributes of the training data set are tested, and the attribute
with the maximum information gain is used as the root node of the decision tree.
One branch is created for each value of this attribute, and the training examples
in each branch are processed recursively in the same way. Nodes and branches are
created in this manner until all the data in a subset belong to one category or
no attributes remain that can be used to partition the data further.
At each node, the attribute with the maximum information gain is chosen as the
test attribute, and the data are partitioned according to the values of this
attribute. Each partition should maximize the difference between the resulting
groups. This partitioning process is also called the purification of the data.
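
The attribute selection step described above can be sketched as follows. This is a minimal illustration assuming the usual entropy-based definition of information gain (as in ID3); the function names and the four toy records are hypothetical and are not the data behind Fig. 2.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the maximum information gain, used as the test at a node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records (not from the paper).
rows = [
    {"income": "high", "student": "no", "buys": "no"},
    {"income": "high", "student": "yes", "buys": "yes"},
    {"income": "low", "student": "no", "buys": "no"},
    {"income": "low", "student": "yes", "buys": "yes"},
]
root = best_attribute(rows, ["income", "student"], "buys")
print(root)
```

In this toy table the attribute "student" separates the two categories completely, so every branch below it is already pure and the recursion would stop after one split.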

2.2 Bayesian Classification

Bayesian classification is a statistical classification method based on Bayes'
theorem of posterior probability. It can predict class membership probabilities,
for example the probability that a given sample belongs to a specific category.
Bayes' formula can be written as follows:

$$P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(A)} \tag{1}$$

In the above formula, P(A) and P(B) stand for the probabilities of occurrence of
event A and event B respectively; P(B|A) is the probability that event B occurs
given that event A has occurred; P(A|B) is the probability that event A occurs
given that event B has occurred. For example, suppose A represents the event

"customers will buy computers" and B stands for the event "customers' earnings
are relatively high". Then P(B|A) is the probability that a customer who has
purchased a computer has relatively high earnings, while P(A|B) is the
probability that a customer with high earnings will purchase a computer.
Bayesian classification obtains the posterior probability from the known prior
probability and conditional probability. It can thus determine the probability
that a given sample belongs to a specific category.
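
Bayes' formula for the computer-purchase example can be evaluated numerically. The sketch below is only illustrative: the three input probabilities are made-up assumptions, not figures reported in the paper, and the function name is hypothetical.

```python
def posterior(prior_b, likelihood_a_given_b, evidence_a):
    """Bayes' formula: P(B|A) = P(B) * P(A|B) / P(A)."""
    if evidence_a <= 0.0:
        raise ValueError("P(A) must be positive")
    return prior_b * likelihood_a_given_b / evidence_a


# Assumed values for illustration only:
p_high_earnings = 0.30                 # P(B): customer has relatively high earnings
p_buy_given_high_earnings = 0.60       # P(A|B): high earner buys a computer
p_buy = 0.40                           # P(A): any customer buys a computer

p_high_earnings_given_buy = posterior(
    p_high_earnings, p_buy_given_high_earnings, p_buy
)
print(round(p_high_earnings_given_buy, 2))
```

A Bayesian classifier applies the same formula once per category and assigns the sample to the category with the largest posterior probability.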

2.3 Rough Set Classification Algorithm

Rough set theory is based on the establishment of equivalence classes within the
given training data. All the data samples that form an equivalence class are
indiscernible, that is to say, these samples are identical with respect to the
attributes describing the data. In real-world data, some categories cannot be
distinguished by the available attributes. Rough sets can be used to define such
categories approximately, or roughly. The rough set definition of a given
category C is approximated by two sets: the lower approximation and the upper
approximation of C. The lower approximation of C consists of the data samples
which, on the basis of knowledge about the attributes, undoubtedly belong to C.
The upper approximation of C consists of the data samples which, on the basis of
knowledge about the attributes, cannot be described as not belonging to C.
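
The two approximations can be computed directly from the equivalence classes. The following sketch assumes that samples are stored as dictionaries keyed by a sample name; the function names and the four samples are hypothetical illustrations, not data from the paper.

```python
from collections import defaultdict


def equivalence_classes(samples, attributes):
    """Group sample names that have identical values on the given attributes."""
    classes = defaultdict(set)
    for name, row in samples.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())


def approximations(samples, attributes, category):
    """Return the (lower, upper) approximation of a category, given as a set of names."""
    lower, upper = set(), set()
    for block in equivalence_classes(samples, attributes):
        if block <= category:   # every member belongs to the category
            lower |= block
        if block & category:    # at least one member belongs to the category
            upper |= block
    return lower, upper


# Hypothetical samples: s1 and s2 are indiscernible on the two attributes.
samples = {
    "s1": {"income": "high", "student": "no"},
    "s2": {"income": "high", "student": "no"},
    "s3": {"income": "low", "student": "yes"},
    "s4": {"income": "low", "student": "no"},
}
buyers = {"s1", "s3"}  # the category C
lower, upper = approximations(samples, ["income", "student"], buyers)
print(sorted(lower), sorted(upper))
```

Here s3 certainly belongs to C, while s1 and s2 cannot be told apart, so both fall into the upper approximation but neither into the lower one.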
The classification procedure based on rough sets is roughly as follows:
(1) Inspect the columns of the conditional attributes in the information table
one by one. If conflicting records arise after removing a column, retain the
original attribute value in the conflicting records. If no conflicting record
arises but duplicate records do, mark the attribute value of the duplicate
records with a first symbol and mark the attribute value of the other records
with a second symbol.
(2) Delete the possible duplicate records and inspect every record that has been
marked with the second symbol. If the decision can be determined from the
unmarked attribute values alone, change the second symbol to the first symbol;
otherwise, restore the original attribute value. If all the conditional
attributes of a record are marked, restore the attribute items marked with the
second symbol to their original values.
(3) Delete all the records whose conditional attributes are all marked with the
first symbol, as well as any possible duplicate records.
(4) If two records differ in only one conditional attribute and that attribute is
marked with the first symbol in one of them, delete the other record, provided
that the decision of the original record can be determined from the unmarked
attribute values.

Rough sets can also be used for feature reduction (recognizing and deleting the
attributes that do not help to classify the given training data) and relevance
analysis (evaluating the contribution and significance of every attribute with
respect to the classification task).

2.4 BP Neural Network Classification Algorithm

A neural network is a mathematical model with the ability to learn by itself. It
can be used to analyze large amounts of complicated data and to carry out pattern
extraction and trend analysis that are extremely complex for the human brain or
for other computer techniques. Its typical application is to build classification
models.
A neural network treats every node as a processing element (PE) and tries to
simulate the function of neurons in the human brain. A neural network learns from
experience and is widely used to discover unknown relationships between a
combination of input data and an outcome. Like other approaches, a neural network
first detects the patterns present in the data, then generalizes from the
relationships found in the data, and finally draws a predictive conclusion.
Neural networks attract special attention because of their ability to predict
complicated processes.
A processing element applies a series of mathematical functions to process data
by aggregating and transforming its inputs. The function of a single processing
element is limited, but a system formed by linking several processing elements
can build an intelligent model. Processing elements can be interconnected in many
different ways, and they can be trained repeatedly, several times, hundreds of
times or even thousands of times, in order to fit the data to be modeled more
accurately.
A processing element is connected to input units or output units. During network
training, the connection weights (also simply called weights) between units are
modified. A connection weight is increased or decreased according to the
importance of the outcome it generates, so its final value depends on the
adjustments made to it over the repeated training process. The weights are
adjusted during training by a mathematical method called the learning rule.
A neural network is trained repeatedly on historical sample data. In the training
process, the data are combined and transformed by the processing elements, and
the links between them are given different weights. That is to say, the network
tries different schemes to predict the outcome variable of every sample.
The training of the network stops when the output agrees with the known result
within the specified accuracy or when another stopping criterion is met.
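
The weight adjustment and the stopping criterion described above can be sketched with a single processing element. This is a deliberately reduced illustration, not a full multi-layer BP network: it trains one sigmoid unit with the delta rule, which is the update that back-propagation applies at an output unit. The learning rate, tolerance, epoch limit and the logical-OR training samples are assumptions chosen for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """Output of one processing element: weighted sum followed by a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_unit(samples, learning_rate=0.5, tolerance=0.1, max_epochs=20000):
    """Train one sigmoid unit with the delta rule.

    Training stops when every output is within `tolerance` of its known
    result, or when `max_epochs` is reached (the other stopping criterion).
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        worst_error = 0.0
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            error = target - output
            # Gradient of the squared error through the sigmoid.
            delta = error * output * (1.0 - output)
            weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias += learning_rate * delta
            worst_error = max(worst_error, abs(error))
        if worst_error < tolerance:
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical historical samples: the logical OR of two inputs.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, epochs_used = train_unit(or_samples)
print([round(predict(weights, bias, x)) for x, _ in or_samples])
```

A multi-layer network repeats the same kind of update for hidden units, with each hidden unit's delta computed from the deltas of the units it feeds.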
The most prominent advantage of neural networks is their accurate prediction for
complicated problems. In the book Neural Networks in the Capital Markets,
Apostolos-Paul Refenes expounds in detail how widely and deeply neural networks
have been applied in financial institutions. Nowadays, neural networks are used in

many applications. IBM, SAS, SPSS, HNC, Angoss, RightPoint, Thinking Machines
and NeoVista are some suppliers of neural network products. Falcon, the neural
network product of HNC, is used to detect fraud in the financial market. A
considerable share of Americans' credit transactions has been analyzed by Falcon.
In tests, the accuracy of neural networks is relatively high compared with other
methods.

3 The Comparison of Classification Methods

3.1 The Assessment Criteria of Classification Methods

(1) The accuracy rate
The ability of a classification model to correctly predict the class labels of
unseen data.
(2) The speed
The computational cost of building and using the model, i.e., time complexity.
(3) The robustness
The model's ability to predict correctly on data with noise and missing values.
(4) Scalability
The ability to build the model effectively for data sets of different sizes.
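
The first two criteria above can be measured with a few lines of code. The sketch below is an assumption about how such a measurement might be set up, not the evaluation procedure of the paper: the helper names, the one-attribute rule used as a stand-in classifier and the four held-out records are all hypothetical.

```python
import time


def accuracy(classifier, test_rows):
    """Fraction of unseen (features, label) pairs whose label is predicted correctly."""
    correct = sum(1 for features, label in test_rows if classifier(features) == label)
    return correct / len(test_rows)


def timed(function, *args):
    """Run function(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = function(*args)
    return result, time.perf_counter() - start


def income_rule(features):
    """Stand-in classifier: predicts a purchase for high-income customers."""
    return "yes" if features["income"] == "high" else "no"


# Hypothetical held-out records, not used for building the model.
held_out = [
    ({"income": "high"}, "yes"),
    ({"income": "low"}, "no"),
    ({"income": "high"}, "no"),
    ({"income": "low"}, "no"),
]
score, seconds = timed(accuracy, income_rule, held_out)
print(score)
```

Robustness and scalability can be probed with the same helpers by re-running the measurement on data with injected noise or missing values, and on data sets of increasing size.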

3.2 The Comparison of Classification Methods

In terms of accuracy, neural networks can approximate arbitrary nonlinear
functions, can learn and adapt by themselves, and can handle multiple inputs and
outputs. Decision trees, by contrast, are best suited to attributes with
qualitative (categorical) values.
In terms of computing speed, Bayesian classification is the fastest, followed by
decision trees.
In terms of robustness, neural networks can deal with noisy data, while rough
sets cannot do so well. Some modified decision tree classification algorithms can
deal with null values and noisy values, and Bayesian classification also has good
robustness.
In terms of scalability, neural networks are very sensitive to the amount of
input data. The more attributes a neural network has, the longer the network
takes to train. Decision trees need a pruning method to ensure the accuracy of
the model, while rough set classification, by its nature, can effectively reduce
redundant attributes.

According to the above comparison, neural networks can handle nonlinear functions
and can learn and adapt by themselves. However, a neural network model may take a
large number of input variables, which is bound to affect its training speed and
accuracy. Decision tree classification chooses the attribute with the highest
information gain as the test attribute of the current node and partitions the
data according to the values of that attribute, so that each partition yields the
most distinct groups. Rough sets can simplify the attributes and attribute values
in a decision table and effectively delete redundant attributes while the useful
information remains unchanged.
Accordingly, when analyzing a problem with a neural network, we can first filter
the collected data with rough set classification and exclude the redundant
attributes that contribute little or nothing to the classification result. This
simplifies the structure of the network and reduces the training time, so that
the neural network works more efficiently and classifies data more accurately.
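
The proposed preprocessing step can be sketched as a greedy attribute reduction over a decision table. This is one simple way to realize the idea, under assumptions that are not in the paper: an attribute is treated as redundant if the table stays consistent without it (identical remaining attribute values never lead to different decisions), attributes are tried in the order given, and the result is therefore one reduct, not necessarily the smallest. The function names and the table are hypothetical.

```python
def is_consistent(rows, attributes, decision):
    """True if rows with equal values on `attributes` always share the same decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def reduce_attributes(rows, attributes, decision):
    """Greedily drop attributes whose removal keeps the decision table consistent."""
    kept = list(attributes)
    for attribute in attributes:
        trial = [a for a in kept if a != attribute]
        if is_consistent(rows, trial, decision):
            kept = trial
    return kept


# Hypothetical decision table (not from the paper).
table = [
    {"income": "high", "student": "no", "region": "north", "buys": "no"},
    {"income": "high", "student": "yes", "region": "north", "buys": "yes"},
    {"income": "low", "student": "no", "region": "south", "buys": "no"},
    {"income": "low", "student": "yes", "region": "south", "buys": "yes"},
]
network_inputs = reduce_attributes(table, ["income", "student", "region"], "buys")
print(network_inputs)
```

Only the attributes that survive the reduction would then be fed to the neural network, which reduces the number of input units and connection weights to be trained.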

Acknowledgements This work was supported by the projects "Study on the employment
information of the West University Students Based on the data mining technology"
and "Application of Markov chain model in LAMS in the realization of teaching
evaluation system".

References

1. Chablo E (2000) The importance of marketing data intelligence in delivering
successful CRM. DM Rev 46–57
2. Data mining technology (2004). http://www.cs.sdu.edu.cn/info/static_news.
Accessed 20 May 2004
3. Heckerman D (1997) Bayesian networks for data mining. Data Min Knowl Disc
1:79–119
4. Imhoff C, Loftis L, Geiger JG (2001) Building the customer-centric enterprise:
data warehousing techniques for supporting customer relationship management.
John Wiley & Sons, Inc., New York, p 26
5. Ling R, Yen DC (2001) Customer relationship management: an analysis framework
and implementation strategies. J Comput Inf Syst 82–97
6. Quinlan JR (1986) Induction of decision trees. Mach Learn
7. Roberts-Phelps G (2001) Customer relationship management: how to turn a good
business into a great one. Hawksmere, London, p 158
8. The website of SAS Institute Inc. http://www.sas.com
9. Yang YH (2012) The connotation, driving factors and growth dimensions of
customer relationship management. Nankai Bus Rev 2
10. Zhang WY (2002) Research on classification association rules mining based on
rough set theory. J Xi'an Petrol Inst (Natural Science Edition)
