Zaki Brahmi
I. INTRODUCTION
most areas were touched by; thus, the importance of data-intensive computing has been increasing, and it has become the foremost research field in industry and academic communities. Correspondingly, applications based on data-intensive services have become the most challenging type of application, due to the impact of these enormous sources of data on the cost and response time of data-intensive service compositions.
Yet, data-intensive service composition faces many challenges. First, the large number of data sets and the growing number of functionally equivalent services make the composition more complex. Second, the size and number of distributed data sets increase communication and storage response times, and consequently costs, which affects the performance of the whole composition process. Third, the cost of transferring data to and from service endpoints increases as the number of data sets increases. Finally, the dynamic nature of cloud computing and data replication calls for a dynamic and adaptive mechanism to regulate the interaction between users and providers [9]. Few approaches have been proposed in the literature to optimize cost and response time for data-intensive service composition [7] [8] [9]. These approaches rely on the ant colony algorithm to select the optimal service and its data replicas. However, most of them treat the search for services with optimal cost and response time and the search for their data replicas as two separate sub-problems. Additionally, they assume that the composition of data-intensive services is already predefined and focus only on selecting the services that give the optimal cost and response time (the data-intensive service selection problem).
This paper addresses the data-intensive service composition problem, which combines the composition and selection of data-intensive services with the selection of their optimal data replicas. In this work, we propose a novel approach based on the artificial bee colony algorithm (ABC algorithm). Our approach consists of two main phases:
i) The pre-composition phase involves two steps: first, a data movement strategy is applied to all data sets, which reduces the transfer frequency between the services that require these data sets; then, the web service instances in the cloud are classified in order to organize the search areas before the selection phase and to ensure an optimal time to answer the client's request.
ii) The composition phase: in this phase, the artificial bee colony algorithm is used to select and compose the data-intensive services as well as their data replicas, since this algorithm has shown good performance on mathematical and computational problems compared with Particle Swarm Optimization (PSO) algorithms [2].
The remainder of this paper is organized as follows. In Section 2, we review the related work and its potential limitations. In Section 3, we present the basic concepts used and formulate the data-intensive web service composition problem. We describe our proposed approach in Section 4. Section 5 presents the evaluation of our approach. Finally, we conclude our work and outline our future prospects.
II. RELATED WORK
$$\sum_{dt \in DT_i} \frac{size(dt)}{bw(d_{dt}, y)} \qquad (1)$$
(2)
(3)
where $T_{rp}(AS_i)$ is the response time of the concrete data-intensive service. For each data-intensive service $CS_i$, the data access cost is given by $ac(dt)$. The cost of the service, $Cost(CS_i)$, can be described by (4):
$$Cost(cs_i) = C_{vi}(cs_i) + C_{tr}(cs_i) + C_{sr}(cs_i) \qquad (4)$$
where $C_{vi}(cs_i)$ is the access cost of all data sets required by $cs_i$:
$$C_{vi}(cs_i) = \sum_{dt \in DT_i} ac(dt)$$
and $C_{tr}(cs_i)$ is the transfer cost of all data sets, as follows:
$$C_{tr}(cs_i) = \sum_{dt \in DT_i} [\ldots]$$
where $tcost$ is the cost per unit of transferring data for a link and $C_{sr}(CS_i)$ is the cost of the concrete service $CS_i$.
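The cost model in (4) can be illustrated with a minimal Python sketch. The names `DataSet` and `service_cost` are hypothetical, and the per-data-set transfer cost is assumed here to be `size * tcost`, since the summand of the transfer-cost formula is not recoverable from the text; treat that term as an assumption rather than the paper's definition.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    size: float         # size of the data set (arbitrary unit)
    access_cost: float  # ac(dt), the access cost of the data set


def service_cost(data_sets, tcost, service_price):
    """Return Cost(cs_i) = C_vi + C_tr + C_sr for one concrete service."""
    # C_vi: access cost summed over all data sets required by the service
    c_vi = sum(dt.access_cost for dt in data_sets)
    # C_tr: ASSUMPTION, each data set contributes size * tcost,
    # where tcost is the cost per unit of transferred data on a link
    c_tr = sum(dt.size * tcost for dt in data_sets)
    # C_sr: price of the concrete service itself
    return c_vi + c_tr + service_price


example = service_cost(
    [DataSet(size=4.0, access_cost=2.0), DataSet(size=2.0, access_cost=1.0)],
    tcost=0.5,
    service_price=5.0,
)
print(example)
```

With these illustrative numbers, the access cost is 3.0, the assumed transfer cost is 3.0, and the service price is 5.0.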
D. Artificial bee colony algorithm
The honey bee (Apis mellifera) is one of the fascinating social insects that live together as a family. All members of this family are engaged in complex tasks. Each bee in a colony has individual and collective (social) behavior, which is very useful for communication, construction and responsibility. Tereshko and Loengarov [11] developed a minimal model of forage selection that leads to the emergence of collective intelligence. It consists of three main components (food sources, employed foragers, and unemployed foragers) and defines two modes of behavior: recruitment to a nectar source and abandonment of a source. Teodorovic suggested using bee swarm intelligence in the development of artificial systems aimed at solving complex problems in traffic and transportation [10].
In the ABC algorithm, the position of a food source represents a possible solution to the optimization problem, and the nectar amount of a food source corresponds to the quality (fitness) of the associated solution. The number of employed bees or onlooker bees is equal to the number of solutions in the population. In the first step, ABC generates a randomly distributed initial population (C = 0) of SN solutions (food source positions), where SN denotes the number of employed bees or onlooker bees. Each solution $x_i$ ($1 \le i \le SN$) is a D-dimensional vector, where D is the number of optimization parameters. After initialization, the population of positions (solutions) is subject to repeated cycles, $1 \le C \le MCN$, of the search processes of the employed bees, the onlooker bees and the scout bees. An employed bee produces a modification of the position (solution) in her memory depending on local information (visual information) and tests the nectar amount (fitness value) of the new source (new solution). If the nectar amount of the new source is higher than that of the previous one, the bee memorizes the new position and forgets the old one. Otherwise, she keeps the previous position in her memory. After all employed bees complete the search process, they share the nectar information of the food sources and their position information with the onlooker bees. An onlooker bee evaluates the nectar information taken from the employed bees and chooses a food source with a probability related to its nectar amount. As in the case of the employed bee, the onlooker bee produces a modification of the position in her memory and checks the nectar amount of the candidate source. If its nectar amount is higher than that of the previous one, the onlooker bee memorizes the new position and forgets the old one. The main steps of the algorithm are as follows:
1: Initialize Population
2: repeat
3: Place the employed bees on their food sources
4: Place the onlooker bees on the food sources depending on their nectar amounts
5: Send the scouts to the search area to discover new food sources
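The steps listed above can be sketched in Python for a generic continuous minimization problem. This is a minimal illustration, not the composition algorithm of this paper: the function name `abc_minimize`, the fitness mapping `1 / (1 + f)`, the neighbour rule `v = x + phi * (x - x_k)`, the abandonment threshold `limit`, and the bookkeeping of the best source are assumptions taken from the commonly used formulation of ABC, and the sphere function is only a toy objective.

```python
import random


def abc_minimize(f, dim, lower, upper, sn=10, mcn=200, limit=20, seed=0):
    """Minimal ABC sketch: sn food sources, mcn cycles, scout after `limit` failures."""
    rng = random.Random(seed)

    def fitness(value):
        # ASSUMPTION: usual ABC fitness mapping for a minimization objective
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    def random_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def neighbour(i):
        # Modify one coordinate of source i relative to a random other source k
        k = rng.choice([j for j in range(sn) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (cand[d] - sources[k][d])
        cand[d] = min(max(cand[d], lower), upper)
        return cand

    def greedy(i, cand):
        # Keep the candidate only if its nectar amount (fitness) is higher
        val = f(cand)
        if fitness(val) > fitness(values[i]):
            sources[i], values[i], trials[i] = cand, val, 0
        else:
            trials[i] += 1

    # Step 1: initialize the population of food sources
    sources = [random_source() for _ in range(sn)]
    values = [f(x) for x in sources]
    trials = [0] * sn
    b = min(range(sn), key=values.__getitem__)
    best, best_val = sources[b][:], values[b]

    for _ in range(mcn):  # Step 2: repeat
        for i in range(sn):  # Step 3: employed bees
            greedy(i, neighbour(i))
        fits = [fitness(v) for v in values]
        for _ in range(sn):  # Step 4: onlookers pick sources by nectar amount
            i = rng.choices(range(sn), weights=fits, k=1)[0]
            greedy(i, neighbour(i))
        worst = max(range(sn), key=trials.__getitem__)
        if trials[worst] > limit:  # Step 5: a scout replaces an exhausted source
            sources[worst] = random_source()
            values[worst] = f(sources[worst])
            trials[worst] = 0
        b = min(range(sn), key=values.__getitem__)
        if values[b] < best_val:
            best, best_val = sources[b][:], values[b]
    return best, best_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, best_val)
```

The greedy replacement in `greedy` mirrors the rule described for employed and onlooker bees: a new position is memorized only when its nectar amount is higher than that of the previous one.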
III.
Fig. 1.
(5)
its own service $CS_{ij}$; in return, bee k clones several new bees, as many as there are data replicas in each data set. Subsequently, the cloned bees leave the service endpoint, begin an exploitation process toward their respective data replicas (new food sources), and calculate their utility functions as follows:
$$U(dt_q^v) = \frac{Cost(dt_q^v) + Responsetime(dt_q^v)}{2} \qquad (6)$$
Where:
$Cost(dt_q^v)$ = cost of the data replica $dt_q^v$ of the data set $dt^v$
$Responsetime(dt_q^v)$ = response time of the data replica $dt_q^v$ of the data set $dt^v$
Then the cloned bees return to the endpoint service $CS_{ij}$ to share these values with the onlooker bees in the virtual hive (service $CS_{ij}$), which decide on the preferred data replica based on the utility functions and calculate the corresponding fitness functions and probabilities. They then compare the replicas, so that the data replica with the highest probability value is the selected one (the best data replica). The selection of each data replica of a given data set is modeled by a variable x: the variable x of each data replica is initially set to False; once the data replica is chosen by the onlooker bees, its variable x is set to True. Once the selection is done, every cloned bee dies and only the best bee k stores the best data replica set as well as the fitness function of the main web service $CS_{ij}$. At the end, bee k returns to the main hive to share all the information with the onlooker bees that are waiting for her return in the dance area of the main hive (start point).
iii. The selection phase: In this phase, the employed bees arrive at the dance area, where they find the onlookers. The employed bees start performing different dance movements to indicate the regions of the food sources they have found. The onlooker bees then perform several tasks: first, they pick the regions of web services shared by the employed bees; second, they determine the amount of nectar and store food in their submissions. Thereafter, they decide which web services are the most appropriate by using this formula:
$$p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n} \qquad (7)$$
Fig. 2. ABC Algorithm for selection of data-intensive web service and its data replica
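The selection probability in (7) normalizes each fitness value by the sum of all fitness values. A minimal Python sketch follows; the function names are hypothetical, and drawing the onlooker's choice with a weighted random pick is one common way to use these probabilities, assumed here for illustration.

```python
import random


def selection_probabilities(fitness_values):
    """Return p_i = fit_i / sum_n fit_n for every candidate service."""
    total = sum(fitness_values)
    return [fit / total for fit in fitness_values]


def onlooker_choice(fitness_values, rng):
    """Draw one candidate index with probability proportional to its fitness."""
    probs = selection_probabilities(fitness_values)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = selection_probabilities([1.0, 3.0])
choice = onlooker_choice([1.0, 3.0], random.Random(0))
print(probs, choice)
```

For fitness values 1.0 and 3.0, the probabilities are 0.25 and 0.75, so the second service is three times as likely to be chosen by an onlooker.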
Fig. 3. The effect of the number of services per class on the running time for finding the optimal composition.
Fig. 4. The effect of the number of data set per service on the running time for finding the optimal composition.
V.