
2013 IEEE 20th International Conference on Web Services

Selecting Top-k Composite Web Services using Preference-aware Dominance Relationship
Shaoqian Zhang, Wanchun Dou

Jinjun Chen

State Key Laboratory for Novel Software Technology


Nanjing University
Nanjing, 210046, P.R.China
Email: zsq0204@gmail.com, douwc@nju.edu.cn

School of Engineering and Information Technology


University of Technology, Sydney
Sydney, Australia
Email: jinjun.chen@gmail.com

skyline [10]). With the dominance relationship between services,
for a task, only skyline services are selected for further
composition, which reduces the number of candidate services.
Given a set of d-dimensional points, a point p is said to
dominate another point q, denoted as $p \succ q$, if it is better than or equal to q
in all dimensions and strictly better than q in at least one.
The skyline points are the subset of points that
are not dominated by any other point [10]. When
composing services with the skyline technique, given a predefined
task schema, skyline services are selected for each task and
composed into a composite service. Fig. 1 shows an example of
composing services using skyline.

Abstract: Web service composition lets users create value-added composite Web services from existing services, where top-k composite services help users find a satisfying
composite service efficiently. However, with an increasing number
of Web services and users' various composition preferences, computing top-k composite services dynamically for different users
is difficult. In view of this challenge, a top-k composite services
selection method is proposed, based on a preference-aware service
dominance relationship. Concretely, user preferences are first modeled with the preference-aware service dominance.
Then, in local service selection, a multi-index based algorithm,
named Multi-Index, is proposed for computing the candidate services
of each task dynamically. Next, in global optimization, combined
with a service lattice, top-k composite services are selected under
a dominant number-aware service ranking. Finally, an experiment
is presented to verify our method.
Keywords: Web service composition; top-k; preference-aware;
service dominance relationship

Example 1. Consider the example shown in Fig. 1. Here, $t_1$,
$t_2$ and $t_3$ denote three tasks in a task schema. There are
three candidate services for each task, and the tuples next to them
represent their QoS values, which comprise three dimensions.
The tuples in red rectangles are skyline services.

I. INTRODUCTION
A. Current Status of Related Research
Web services constitute a distributed computing infrastructure made up of many different interacting application modules
that communicate over private or public networks to virtually form a single logical system [1]. As reusable components,
Web services are meant to be combined to meet business
needs for enterprise applications [2]. With the growing number
of Web services, a significant number of functionally similar services are provided, which makes service composition more
challenging and demands efficient and personalized service
selection techniques for service requesters.
A number of service selection methods have been developed, aiming to achieve a composite service with the best
user-desired Quality of Service (QoS). They can be divided
into two categories. On one hand, in some works [3], [4], [5],
Web services are selected depending on a predefined objective
function. In this case, a weighting mechanism is leveraged,
where users express preferences over different quality parameters as numeric weights. The composite service gaining the
highest value is selected and returned to the user. On the other hand, as it is a
rather challenging task for users to transform their preferences
into numeric weights, some researchers have tried solving the
problem using the dominance relationship between services in
[6], [7], [8] (also called skyline [8], full skyline [9] or free
978-0-7695-5025-1/13 $26.00 2013 IEEE
DOI 10.1109/ICWS.2013.20


 

   



 
  

  


  
 




Fig. 1 An example of skyline-based service composition


In this example, through skyline (e.g., $s_{11} \succ s_{13}$, $s_{22} \succ s_{21}$),
we can reduce the number of candidate services. Nevertheless, with the increasing number of Web services and the
"curse of dimensionality" [9], full skyline is not practically
useful [8], [9]. Moreover, users may have various preferences,
which may also change constantly over time. In view of these
challenges, a top-k composite services selection method is
proposed in this paper, which combines a preference-aware
service dominance, a multi-index based skyline computation
algorithm and a dominant number-aware service ranking for
global optimization.
B. Motivation
In order to illustrate the motivation of our method, a
real service composition scenario in Small and Medium-sized
Enterprises (SMEs) is presented.

Scenario Example. Consider the scenario in Fig. 2. A manufacturer receives an order to deliver some merchandise to a
retailer. He plans to satisfy the order from his own inventory
and to request parts from a supplier. Thus, he needs to select
a satisfying supplier from numerous suppliers, as well as two transportation services for merchandise carriage.








proposed method, in Section IV, an experiment is conducted


on a public dataset as well as a synthetically generated dataset.
The performance of our method is analyzed in Section V. At
last, in Section VI, the conclusions are presented.
II. USER PREFERENCE-AWARE SERVICE DOMINANCE RELATIONSHIP
Given a task schema $T = \{t_1, t_2, \ldots, t_m\}$, for a task
$t_i$, the candidate service class for $t_i$ is denoted as $S_i$,
in which all services perform the same functionality but
differ in QoS values. Given a d-dimensional QoS space $Q = \{q_1, q_2, \ldots, q_d\}$, a set of Web services $WS =
\{ws_1, ws_2, \ldots, ws_n\}$ is said to be a service set on $Q$ if,
for every $q_i \in Q$ and every $ws_j \in WS$, the value $ws_j.q_i$ is defined. For each dimension $q_i$, we
assume that there exists a total order relationship, denoted
as $\succ_i$, on its domain values. Here, $\succ_i$ can be the < or >
relationship according to a user's preference. For simplicity,
in this paper, only preferences on non-functional requirements
(QoS) are considered. Without loss of generality, we assume
that each $\succ_i$ represents > in the rest of this paper, so that for
each QoS dimension an increase of its value benefits service
users. In other words, services with higher QoS values
are preferred.
In a service selection process, given a task schema, a user
expresses preferences on some QoS attributes. For example, in
the example scenario, suppose that the manufacturer pays more
attention to production cost and production delay. Then
these two attributes will be the evaluation criteria during supplier
selection. In this paper, the preference space of the user $u_i$
is modelled as $P_{u_i} = \{p_1, p_2, \ldots, p_l\}$, where $p_i \in Q$ and
$p_i$ denotes a QoS criterion affecting the decision of $u_i$. It follows
that $P \subseteq Q$. Moreover, for simplicity, we assume
that QoS criteria outside the preference space are not important for
the service selection process.

 

  
 

Fig. 2 A scenario of service selection in a SME


In this scenario, the manufacturer needs to select a supplier
and transportation services that meet the requirements to satisfy the order. Traditionally, facing a large number of candidate
services, full skyline and mathematical programming methods
are usually employed to select the best one for users. However,
these approaches have two major limitations. First,
it is a rather challenging task for users to transform their
preferences into numeric weights [7]. Second, as the number of
dimensions increases, the number of skyline services increases
exponentially, which is known as the "curse of dimensionality"
[9]. In reality, in order to earn more profits, a product or
service is usually assigned some unique features superior to
others. Millions of products and services that provide similar
functions make the "curse of dimensionality" more common.
To address this problem, a selection method for top-k
composite services is proposed in this paper. Our work can
be summarized as follows.

Definition 1. Service Dominance
A service $ws_i$ is said to dominate another service $ws_j$ on $Q$ if
and only if $\forall q_k \in Q$, $ws_i.q_k \ge ws_j.q_k$ and $\exists q_t \in Q$, $ws_i.q_t > ws_j.q_t$.

1) We address the problem of top-k composite services
selection, defining a user preference-aware service dominance to select candidate services for each task.
2) We propose a Multi-Index algorithm to compute skyline services dynamically for local service selection
using preference-aware service dominance, and present
a dominant number-aware service ranking for global
optimization.
3) We perform an experiment on a public collection of
services with QoS information as well as a synthetically
generated scenario.

Definition 2. Service Skyline
A service $ws_i$ is a skyline service if and only if there does not
exist a service $ws_j \ne ws_i$ dominating $ws_i$. Service skyline
is also called full skyline or free skyline.
Definition 3. User Preference-aware Service Dominance
Given a user preference space $P$, a service $ws_i$ is said to
dominate another service $ws_j$ on $P$ if and only if $\forall p_k \in P$,
$ws_i.p_k \ge ws_j.p_k$ and $\exists p_t \in P$, $ws_i.p_t > ws_j.p_t$.
Definition 4. User Preference-aware Service Skyline (UPS)
A service $ws_i$ is a skyline service on $P$ if and only if there
does not exist a service $ws_j \ne ws_i$ dominating $ws_i$ on $P$.
We use l-UPS(WS, P) to denote the set of skyline services in
$WS$ on $P$, where l represents the number of attributes in $P$.
l-UPS(WS, P) can be abbreviated as l-UPS or UPS.

C. Organization of the Paper


The rest of this paper is organized as follows. In Section II,
a user preference-aware service dominance is presented, based
on which, a selection method for top-k composite services is
proposed in Section III. The method consists of two steps:
multi-index based local service selection and global optimization for top-k composite services. In order to verify the

The first two definitions describe skyline services over
the full QoS dimensions, while Definitions 3 and 4 describe what a user
preference-aware skyline service is. Given an l-dimensional
preference space of user $u_i$ and a d-dimensional QoS space $Q$,
when $l = d$, the user preference-aware service dominance is
equivalent to the original dominance in full skyline. As the
number of dimensions considered in UPS decreases, the number of returned services is also reduced, which
can be concluded from Theorem 1. It is worth noting that,
compared with the k-dominant skyline, UPS avoids the
cyclic dominance relationship between services [10]. Using
UPS, top-k candidate services for each task are selected for
further composition.

Corollary 1. $|\mathrm{UPS}(WS, P)| \le |\text{full skyline on } WS|$.
In order to rank services in the same candidate service class,
a score is computed for each service under a user preference
space $P$, denoted as $score_i(ws)$ for the service $ws$. The score
is computed as follows: $score_i(ws) = \sum_{j=1}^{l} ws.q_j$, where
$\forall q_j$ ($q_j \in P$), $0 \le ws.q_j \le 1$ and $ws.q_j$ is the QoS value
after normalization. The generalized formula for computing
$score_i(ws)$ is as follows:

$$score_i(ws) = \sum_{j=1}^{l} \frac{q_j^{\max} - ws.q_j}{q_j^{\max} - q_j^{\min}} \qquad (1)$$
Lemma 1. Let $C = \{S_{1i_1}, S_{2i_2}, \ldots, S_{mi_m}\}$ be a composite
service and $topK.S_j$ be the top-k services of the class $S_j$. If
$S_{ji_j} \in C$ and $S_{ji_j} \notin topK.S_j$, then $C \notin topK.C$.
Proof by contradiction can be employed to prove Theorem 1
(proof excluded due to lack of space). Through Theorem 1, we
can get Corollary 1, which is the basis of employing UPS in
our method. Lemma 1 tells us that the top-k candidate services
for each task are sufficient to compute the top-k composite
services [6]. We will first compute UPS for each task, select the
top-k services according to the score of each service, and then
compute the top-k composite services. The two main problems
we want to solve are as follows:
Problem 1. Given a task schema $T$, a user preference space
$P$, and a candidate service class $WS_i$ for a task $t_i \in T$ ($1 \le i \le m$),
compute UPS($WS_i$, $P$) for $t_i$.
Problem 2. Given UPS($WS_i$, $P$) for each task $t_i \in T$ ($1 \le i \le m$),
compute the top-k composite services for the task
schema $T$.
To solve the first problem efficiently, an index-based skyline computation is required, which
is also necessary for managing a large number of services. Although previous index-based skyline computation algorithms
can be used to compute each UPS request individually,
different indexes are needed for different preference spaces,
so these algorithms are unlikely to respond to multiple
requests simultaneously. Therefore, a Multi-Index algorithm
is proposed in Section III-B for computing UPSs with
different preferences simultaneously. For the second problem,
a dominant number-aware ranking for composite services is
presented in Section III-C, combined with a service lattice
and a max-heap.

In formula (1), $q_j^{\max}$ and $q_j^{\min}$, whose difference forms the denominator, are the maximum and minimum values
of this service class on the QoS dimension $q_j$. If $q_j^{\max} = q_j^{\min}$,
$score_i(ws) = 1$. More details about normalization can
be found in [11]. Meanwhile, to select top-k composite
services comprehensively, a dominant number is defined for
ranking composite services. For a service $ws$, its dominant
number is the number of services dominated by
$ws$ in UPS.
Example 2. Consider the 6-dimensional service set $WS =
\{ws_1, ws_2, \ldots, ws_5\}$ in Table I. For simplicity, all QoS
values are normalized. All of these services are full skyline services. Given a user preference space
$P_{u_i} = \{q_2, q_4, q_5\}$, UPS($WS$, $P_{u_i}$) = $\{ws_1, ws_3, ws_5\}$.
However, for another user preference space $P_{u_j} = \{q_1, q_2, q_3\}$,
UPS($WS$, $P_{u_j}$) = $\{ws_1, ws_4, ws_5\}$.
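The sets in Example 2 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it applies Definitions 3 and 4 by brute force to the values of Table I and then ranks the UPS members by the sum of their normalized preferred values. The function names `dominates`, `ups` and `top_k_by_score` and the 0-based dimension indices are assumptions introduced for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# QoS values of Table I (already normalized; larger is better).
TABLE_I: Dict[str, Tuple[float, ...]] = {
    "ws1": (0.89, 0.71, 0.90, 0.57, 0.63, 0.54),
    "ws2": (0.72, 0.31, 0.45, 0.22, 0.10, 0.93),
    "ws3": (0.58, 0.66, 0.74, 0.92, 0.72, 0.43),
    "ws4": (0.90, 0.52, 0.46, 0.87, 0.67, 0.33),
    "ws5": (0.36, 0.76, 0.80, 0.28, 0.74, 0.25),
}


def dominates(a: Sequence[float], b: Sequence[float], dims: Sequence[int]) -> bool:
    """True if a >= b on every dimension in dims and a > b on at least one."""
    return all(a[d] >= b[d] for d in dims) and any(a[d] > b[d] for d in dims)


def ups(services: Dict[str, Tuple[float, ...]], dims: Sequence[int]) -> List[str]:
    """Names of the services that no other service dominates on dims."""
    return sorted(
        name
        for name, qos in services.items()
        if not any(
            dominates(other, qos, dims)
            for other_name, other in services.items()
            if other_name != name
        )
    )


def top_k_by_score(services: Dict[str, Tuple[float, ...]],
                   dims: Sequence[int], k: int) -> List[str]:
    """Rank UPS members by the sum of their normalized values on dims."""
    members = ups(services, dims)
    members.sort(key=lambda n: sum(services[n][d] for d in dims), reverse=True)
    return members[:k]


# Hypothetical 0-based indices: q1 -> 0, ..., q6 -> 5.
FULL = tuple(range(6))
P_UI = (1, 3, 4)  # {q2, q4, q5}
P_UJ = (0, 1, 2)  # {q1, q2, q3}

print(ups(TABLE_I, FULL))   # every service is a full skyline service
print(ups(TABLE_I, P_UI))   # ['ws1', 'ws3', 'ws5']
print(ups(TABLE_I, P_UJ))   # ['ws1', 'ws4', 'ws5']
print(top_k_by_score(TABLE_I, P_UI, 2))
```

The brute-force comparison is quadratic in the number of services, which is acceptable for a five-row table but is exactly the cost that an index-based computation is meant to avoid.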

III. SELECTING TOP-k COMPOSITE WEB SERVICES


A. Overall Algorithm
In this section, we present a running scenario of our method
and show the overall algorithm. Fig. 3 shows a scenario of our
method, where users first define a task schema and express
preferences for a composition. Then, the system generates
a preference space and sends it to service brokers that are located
across different clouds and organizations. A service broker
provides functions such as service registry, update, deletion and
other management functions. A broker selects services that
meet user preferences for a task, and returns them to the
system, which is responsible for collecting the candidate services of
all tasks and computing the top-k composite services.
Our method consists of two main steps: 1) Local Service
Selection, and 2) Global Optimization for Composite Services.
In the first step, each service broker is responsible for one task
and selects its top-k candidate services, ranking services according
to their scores. When all candidate services are
returned, a composite service lattice is constructed using a
max-heap to select top-k composite services for users, where
a dominant number-aware ranking is employed.

TABLE I
EXAMPLE OF WEB SERVICE SET
Service   q1     q2     q3     q4     q5     q6
ws1       0.89   0.71   0.90   0.57   0.63   0.54
ws2       0.72   0.31   0.45   0.22   0.10   0.93
ws3       0.58   0.66   0.74   0.92   0.72   0.43
ws4       0.90   0.52   0.46   0.87   0.67   0.33
ws5       0.36   0.76   0.80   0.28   0.74   0.25

In Example 2, for the same service set $WS$, the number
of full skyline services is larger than the number of services
in UPS and, for different user preferences, the UPSs are
also different. Therefore, in this paper, UPS is employed for
selecting the top-k candidate services for each task, which are
proven to be sufficient to compute the top-k composite services
in Lemma 1 [6].
Theorem 1. If a service $ws_i \in \mathrm{UPS}(WS, P)$, then $ws_i$ must be
a full skyline service on $WS$.


and, within each partition, services are indexed according to
$ws.q_{\max}$. Here, an example is presented to illustrate the
application of the algorithm for computing UPS.
Example 3. Consider the example shown in Table II. Suppose that all the services are in the same candidate service
class. We only show the criteria in the preference space and the
partitions after transformation, without the B+-tree structure.
































Fig. 3 A scenario of selecting top-k composite services

It is worth noting that, first, in order to select high-quality
candidate services for each task, a score is computed for
each service. Second, among all candidate composite services,
the top-k are selected according to a dominant number.
We believe that good composite services should not only
have satisfying QoS values, but also a stronger dominating
ability, which is shown by a larger dominant number [9].

TABLE II
AN EXAMPLE OF Multi-Index FOR COMPUTING UPS
criterion1               criterion2               criterion3
s3 (0.90, 0.73, 0.65)    s6 (0.81, 0.85, 0.34)    s13 (0.34, 0.27, 0.95)
s7 (0.90, 0.52, 0.71)    s12 (0.52, 0.80, 0.61)   s1 (0.74, 0.68, 0.95)
s11 (0.85, 0.33, 0.14)   s9 (0.62, 0.75, 0.23)    s4 (0.88, 0.90, 0.95)
s8 (0.52, 0.43, 0.25)    s5 (0.70, 0.75, 0.54)    s15 (0.82, 0.75, 0.90)
s10 (0.50, 0.32, 0.16)   s14 (0.68, 0.73, 0.28)   s2 (0.34, 0.18, 0.72)

In Example 3, the preference space includes criterion1,
criterion2 and criterion3. Services are organized into three
partitions and, in each partition, services are sorted in a non-ascending order of their maximum value. For
example, for s3 (0.90, 0.73, 0.65), as the maximum value, 0.90, belongs to
criterion1, the service is classified into the criterion1 partition.
Some observations can be made from Table II. First,
in each partition, by examining the top services with the
largest values, skyline services can be obtained efficiently.
For example, in criterion3, it is clear that s4 (0.88, 0.90,
0.95) is a skyline service. Second, using the non-ascending
sorting technique, we can prune away some services without
examining them. For example, as the minimum QoS value
of s4 is 0.88, the services whose maximum values are smaller
than 0.88 can be eliminated directly. In other words, the larger
the minimum value is, the more services we can prune away.
Furthermore, as some candidate services can be eliminated
without examination, the algorithm only needs a small
amount of memory [12].
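The two observations above can be reproduced on the data of Table II. The sketch below is an illustration under stated assumptions, not the Multi-Index algorithm itself: plain Python lists stand in for the B+-tree, the partition index plus the maximum value plays the role of the one-dimensional key of formula (2), and the names `build_partitions` and `skyline_with_pruning` are hypothetical.

```python
from typing import Dict, List, Tuple

QoS = Tuple[float, ...]

# Table II services as (criterion1, criterion2, criterion3); larger is better.
TABLE_II: Dict[str, QoS] = {
    "s3": (0.90, 0.73, 0.65), "s7": (0.90, 0.52, 0.71), "s11": (0.85, 0.33, 0.14),
    "s8": (0.52, 0.43, 0.25), "s10": (0.50, 0.32, 0.16),
    "s6": (0.81, 0.85, 0.34), "s12": (0.52, 0.80, 0.61), "s9": (0.62, 0.75, 0.23),
    "s5": (0.70, 0.75, 0.54), "s14": (0.68, 0.73, 0.28),
    "s13": (0.34, 0.27, 0.95), "s1": (0.74, 0.68, 0.95), "s4": (0.88, 0.90, 0.95),
    "s15": (0.82, 0.75, 0.90), "s2": (0.34, 0.18, 0.72),
}


def max_dim(qos: QoS) -> int:
    """0-based index of the dimension that holds the largest value."""
    return max(range(len(qos)), key=lambda d: qos[d])


def build_partitions(services: Dict[str, QoS]) -> Dict[int, List[str]]:
    """Group services by max_dim, each group in non-ascending order of its maximum."""
    parts: Dict[int, List[str]] = {}
    for name, qos in services.items():
        parts.setdefault(max_dim(qos), []).append(name)
    for names in parts.values():
        # The sort is stable, so equal maxima keep their input order.
        names.sort(key=lambda n: max(services[n]), reverse=True)
    return parts


def dominates(a: QoS, b: QoS) -> bool:
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_with_pruning(services: Dict[str, QoS]) -> List[str]:
    """Scan services by non-ascending maximum and stop once pruning applies."""
    order = sorted(services, key=lambda n: max(services[n]), reverse=True)
    skyline: List[str] = []
    threshold = 0.0  # largest minimum value among services accepted so far
    for name in order:
        qos = services[name]
        if max(qos) < threshold:
            break  # this service and all later ones are strictly dominated
        if any(dominates(services[s], qos) for s in skyline):
            continue
        # Drop earlier candidates that the new service dominates.
        skyline = [s for s in skyline if not dominates(qos, services[s])]
        skyline.append(name)
        threshold = max(threshold, min(qos))
    return sorted(skyline)


partitions = build_partitions(TABLE_II)
print(partitions[0])                   # criterion1 partition
print(skyline_with_pruning(TABLE_II))  # ['s3', 's4', 's7']
```

On this data the scan stops as soon as it reaches a service whose maximum is below 0.88, the minimum value of s4, so nine of the fifteen services are never compared with anything.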

B. Multi-Index based Local Service Selection


This subsection provides a detailed description of local
service selection in a service broker. In order to compute UPS dynamically for different users simultaneously,
a multi-index based algorithm is proposed, which improves on
the Index algorithm in [12]. Then, according to Lemma 1,
the top-k candidate services of each task are selected for global
optimization.
Index exploits a transformation mechanism that maps high-dimensional
QoS values into a single-dimensional space, and
employs the B+-tree structure to index the transformed services
[12]. Although Index computes efficiently
on an indexed dataset, its only index is pre-constructed
with all attributes, so it cannot respond to multiple requests
with different preferences simultaneously. To address this
problem, in Multi-Index, an index construction algorithm is
developed for dynamically generating indexes for different
preferences, where an index is only related to the attributes
in the preference space. For a new request, if the index of
its preference space has not been built in the system, Multi-Index
generates it dynamically. Using Multi-Index, the system
(brokers) can keep multiple indexes as well as construct new
ones to respond to new preferences.
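The idea of keeping one index per preference space and generating missing ones on demand can be sketched as a small cache. This is a minimal illustration, assuming a sorted Python list as a stand-in for the B+-tree (the paper uses MySQL) and a key of the form dimension index plus maximum value. The class name `MultiIndexSketch`, its methods and the `builds` counter are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


class MultiIndexSketch:
    """Keeps one index per preference space and builds missing ones on demand."""

    def __init__(self, services: Dict[str, Dict[str, float]]) -> None:
        self.services = services
        self.indexes: Dict[FrozenSet[str], List[Tuple[float, str]]] = {}
        self.builds = 0  # number of indexes generated so far

    def _generate_index(self, prefs: FrozenSet[str]) -> List[Tuple[float, str]]:
        attrs = sorted(prefs)
        entries = []
        for name, qos in self.services.items():
            values = [qos[a] for a in attrs]
            dim = values.index(max(values))            # partition of this service
            entries.append((dim + values[dim], name))  # one-dimensional key
        entries.sort()
        self.builds += 1
        return entries

    def get_index(self, prefs: Iterable[str]) -> List[Tuple[float, str]]:
        key = frozenset(prefs)
        if key not in self.indexes:  # index(P) == false in Algorithm 1
            self.indexes[key] = self._generate_index(key)
        return self.indexes[key]


broker = MultiIndexSketch({
    "ws1": {"q1": 0.89, "q2": 0.71, "q3": 0.90},
    "ws4": {"q1": 0.90, "q2": 0.52, "q3": 0.46},
})
first = broker.get_index({"q1", "q2"})
again = broker.get_index({"q2", "q1"})   # same preference space, reused
other = broker.get_index({"q2", "q3"})   # new preference space, built now
print(broker.builds)
```

Using a `frozenset` as the cache key makes the lookup independent of the order in which a user lists the preferred attributes.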
The algorithm works as follows. Suppose that the QoS
values of a service $ws$ are $(ws.q_1, ws.q_2, \ldots, ws.q_d)$, where
$q_j \in P$, $0 \le ws.q_j \le 1$, $1 \le j \le d$. The service is mapped
to a value over a single-dimensional space using the following
formula:
$$ws \mapsto ws.dim_{\max} + ws.q_{\max} \qquad (2)$$

Algorithm 1 Multi-Index Algorithm
Require:
a preference space P = {p_1, p_2, ..., p_l}, a service class WS =
{ws_1, ws_2, ..., ws_n} and its QoS values.
Ensure:
1: if index(P) == false then
2:   generateIndex(P);
3: end if
4: max_i ← maxValue(p_i), min_i ← minValue(p_i);
5: ma ← max_{i=1..l} max_i, mi ← max_{i=1..l} min_i;
6: for all i = 1 to l do
7:   if mi > max_i then
8:     delete(p_i);
9:   end if
10: end for
11: while num(WS) > 0 do
12:   for all i = 1 to l do
13:     P_A ← getMaxServices(p_i);
14:     SP ← SP ∪ ComputeSkyline(P_A);
15:     PartiExisted(p_i);
16:   end for
17: end while

where ws.qmax denotes the largest value among all QoS


dimensions of ws and ws.dimmax is the corresponding dimension for ws.qmax . With this formula, services in the
same service class could be organized into different partitions


The Multi-Index algorithm is shown in Algorithm 1. In Algorithm 1, if the index on $P$ does not exist, it is generated
by generateIndex(P) in step 2. Steps 4-9 perform the first
pruning of the partitions whose maximum values are smaller
than mi. In steps 11-17, we select the Web services whose maximum value equals ma using getMaxServices($p_i$),
store them in a separate partition $P_A$, and eliminate the non-skyline services. Finally, PartiExisted($p_i$) checks whether $p_i$ should be pruned
away, as in steps 7-8. The loop repeats
until all services are processed. Then, the top-k candidate
services can be selected according to formula (1). It is worth
noting that Multi-Index maintains the multiple indexes that
are most commonly used. As the preferences of users in the same
business area are usually similar, the maintenance cost is
low. Furthermore, as some commercial databases
employ the B+-tree structure as an indexing technique, we use
MySQL as the B+-tree infrastructure in this paper.

It is worth noting that, in the lattice of Fig. 4, the parent-child relationship indicates an order of computing skyline
services [7], which avoids false positive skylines. However, the relationship is not equivalent to the dominance
relationship. For example, {ws12, ws21} is a child node of
{ws11, ws21}, but the parent node may not dominate the
child node. In order to store a service lattice, a max-heap is
employed. For the lattice in Fig. 4, the heap is initialized
with the root node {ws11, ws21}. The enumerating process
begins with the root node and, in each step, repeats two substeps: 1) get the composite service $cp_i$ with the maximum
score; 2) generate the child nodes of $cp_i$ and insert them into
the heap. As different parent nodes may have the same child,
a parent table is employed to record the number of parents
that have not been processed yet, which avoids node duplication [7].
For example, {ws12, ws22} is a child node of both {ws12, ws21}
and {ws11, ws22}. Only when both parent nodes have been scanned
can {ws12, ws22} be inserted into the heap.
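The enumeration with a max-heap and a parent table can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's implementation: a node is represented as a tuple of 0-based ranks, each candidate list is assumed to be sorted best-first, a composite score is assumed to be the sum of its members' scores, and `heapq` (a min-heap) is used with negated scores. The name `enumerate_lattice` is hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, ...]


def enumerate_lattice(scores: Sequence[Sequence[float]]) -> List[Node]:
    """Visit every composite node best-first.

    scores[t][i] is the score of the i-th ranked candidate of task t.
    The root (0, ..., 0) combines the first-ranked candidate of every task.
    """
    m = len(scores)

    def total(node: Node) -> float:
        return sum(scores[t][i] for t, i in enumerate(node))

    root: Node = (0,) * m
    heap = [(-total(root), root)]   # negated score turns heapq into a max-heap
    pending: Dict[Node, int] = {}   # parent table: unprocessed parents per node
    order: List[Node] = []
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
        for t in range(m):
            if node[t] + 1 >= len(scores[t]):
                continue  # no successor for this task
            child = node[:t] + (node[t] + 1,) + node[t + 1:]
            if child not in pending:
                # One parent per task whose rank can be decreased by one.
                pending[child] = sum(1 for rank in child if rank > 0)
            pending[child] -= 1
            if pending[child] == 0:  # all parents processed: insert exactly once
                heapq.heappush(heap, (-total(child), child))
    return order


# Two tasks with three candidates each, as in Example 4 (scores are made up).
SCORES = [[1.0, 0.75, 0.5], [1.0, 0.5, 0.25]]
visited = enumerate_lattice(SCORES)
print(visited)
```

Because a child never scores higher than any of its parents under these assumptions, delaying its insertion until the last parent has been extracted keeps the extraction order non-increasing while guaranteeing that no node enters the heap twice.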

C. Global Optimization for Top-k Composite Services
In this section, we present an algorithm for selecting top-k
composite services, combined with a service lattice and a max-heap. We illustrate the algorithm through an example.
Example 4. Consider the example shown in Fig. 4. Suppose that there are two tasks $t_1$, $t_2$ in a task schema, and
$S_1$, $S_2$ denote their candidate service classes respectively, with $S_1 =
\{ws_{11}, ws_{12}, ws_{13}\}$ and $S_2 = \{ws_{21}, ws_{22}, ws_{23}\}$. For each task,
the top-3 candidates are sorted according to the score in an
ascending order. Then, a service lattice is constructed by
enumerating all composite services progressively.

Algorithm 2 Global Optimization Algorithm
Require:
a user preference space P = {p_1, p_2, ..., p_l}, and top-k services
for each task, e.g., WS_i = {ws_1, ws_2, ..., ws_k} for t_i.
Ensure:
1: Initialize a parent table PT and a heap T;
2: while T ≠ ∅ do
3:   n_i ← extractNode(T);
4:   computeDomNumber(n_i);
5:   updateDomNumber();
6:   N ← generateNodes(n_i, T);
7:   for all n_j ∈ N do
8:     PT(n_j) ← PT(n_j) - 1;
9:     if PT(n_j) == 0 then
10:      T.add(n_j);
11:    end if
12:  end for
13: end while


 

 

The service lattice employed in this paper is an extension
of the Expansion Lattice [7]. We present two main extensions
here. First, in this paper, when calculating the score of a
service, QoS values are normalized as in formula (1), which is
different from [7]. Second, in our method, during the scanning
of a service lattice, a dominant number-aware service ranking
is used, which regards a composite service with a larger
dominant number as a better service. In the phase of local
service selection, the score of a service is employed as the
criterion for selecting the top-k candidate services, which guarantees
the QoS of services. Then, in global optimization, for
selecting comprehensive composite services, the dominant
number is employed to rank services. The algorithm of the
global optimization is shown in Algorithm 2.
In Algorithm 2, T is a max-heap and is initialized with the
root node of a lattice. PT is a parent table keeping the number of
parents that have not been processed for a node. Steps 3-4 begin with
a node $n_i$ extracted from the heap, compare it with the other processed services and compute its dominant number. Meanwhile,
dominant numbers of services having been processed are up-

 

 "

Fig. 4 An example of a service lattice for computing top-k composite services

Global optimization begins with the top-k candidate services
of each task, which are sorted in an ascending order. In order
to avoid scanning the candidate services multiple times, a service
lattice is constructed, as illustrated in Example 4. In this
lattice, each node corresponds to a composite service, and the
number of nodes is $k \times k$. The lattice begins with a root node
{ws11, ws21}, which is composed of the services ranking first
in each service class. Child nodes are generated by replacing
one service with its successor, e.g., {ws11, ws21} generates
{ws12, ws21} and {ws11, ws22}. The process repeats until all
composite services are enumerated.


Fig. 5 Performance of our method with two datasets: (a) Time with UPS-Num; (b) Time with Task-Num; (c) Time with Service-Num


performance of our method. We use m to denote the number
of tasks, n the number of candidate services for each task, and
l the number of attributes in the preference space.
Profile 1: The number of services in UPS with the number
of preference attributes. Fig. 6 shows the number of skyline
services in UPS with respect to the number of attributes in the user
preference space. In the QWS dataset, the full skyline contains
more than 200 services while 3-UPS and 4-UPS each contain fewer than 50, which can save much time in multi-task service composition. The statistical result on the random
dataset is shown in Fig. 6(b), where the number of full
skyline services is much larger than the numbers of services in 3-UPS and
4-UPS. Table III shows that 4772 services are in the full
skyline among 10,000 services, nearly 50 percent,
while the 3-UPS (3d) and 4-UPS (4d) contain only 18 and
91 services respectively. Therefore, by emphasizing the
attributes in a preference space, we can make the composition
process consume less time.

dated in the step 5 using updateDomN umber(). Then, child


nodes of $n_i$ are generated through generateNodes($n_i$, T). In
steps 7-10, if all the parent nodes of $n_j$ have been processed, the
corresponding service is inserted into T. The process repeats
until T is empty. Finally, the k composite services with the largest
dominant numbers are returned to the user.
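The final ranking step can be sketched in a few lines. The example below is a simplified illustration, not Algorithm 2 itself: it assumes the QoS vector of every composite service on the preference space is already available, and it counts dominated composites in one batch pass instead of updating the counts incrementally while the lattice is scanned. The names `dominates` and `top_k_by_dominant_number` and the sample vectors are made up for illustration.

```python
from typing import Dict, List, Tuple

QoS = Tuple[float, ...]


def dominates(a: QoS, b: QoS) -> bool:
    """a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_by_dominant_number(composites: Dict[str, QoS],
                             k: int) -> List[Tuple[str, int]]:
    """Return the k composites that dominate the most other composites."""
    dom_number = {
        name: sum(
            1
            for other, other_qos in composites.items()
            if other != name and dominates(qos, other_qos)
        )
        for name, qos in composites.items()
    }
    # Stable sort: composites with equal dominant numbers keep their input order.
    ranked = sorted(dom_number, key=lambda n: dom_number[n], reverse=True)
    return [(name, dom_number[name]) for name in ranked[:k]]


# Hypothetical composite services with two preferred QoS attributes each.
COMPOSITES: Dict[str, QoS] = {
    "c1": (0.90, 0.80),
    "c2": (0.70, 0.70),
    "c3": (0.95, 0.20),
    "c4": (0.50, 0.60),
    "c5": (0.60, 0.10),
}

print(top_k_by_dominant_number(COMPOSITES, 2))  # [('c1', 3), ('c2', 2)]
```

Note that c3 is itself undominated but dominates only one other composite, so a ranking by dominant number places it below c1 and c2.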
IV. EXPERIMENT
In this section, we present an experimental evaluation of our
method, measuring efficiency in terms of the execution time
from a user submitting a request to the top-k composite services
being returned. Different profiles are analyzed to find the main
factors affecting efficiency.
A. Experiment Setup
The method was evaluated on two datasets. The first is
a public Web service dataset, QWS¹, which comprises 9 QoS
attributes for 2507 real-world Web services. These services
were collected from public sources on the Web, including
UDDI registries, search engines and service portals. More
details about this dataset can be found in [4]. The other
one is a synthetic dataset, generated randomly
for testing our method with a large number of
services. The dataset contains 25,000 QoS tuples, and each
tuple represents the 10 QoS attributes of a Web service, where
different attributes are independent.
The method presented in Section III was implemented in
Java. To support the B+-tree based Multi-Index
algorithm, a MySQL database was employed, in which we assumed
that all service information (e.g., QoS, URLs, WSDL files) for
a service broker is stored. The experiment was conducted on a
machine with a Pentium Dual-Core CPU E5800, 3.20GHz,
running Windows 7 (64-bit).

B. Experiment Result and Analysis
In this subsection, four profiles are presented for analyzing
our method. First, in Profile 1, the motivation is verified
by counting the number of services in UPS with respect to
different numbers of preference attributes on the two datasets.
Then, the other three profiles are shown for analyzing the

TABLE III
THE NUMBER OF SKYLINE SERVICES WITH TWO DATASETS
Dataset   3d   4d   5d    6d    7d     8d     9d     10d
Real      8    29   64    108   170    173    205    /
Random    18   91   253   595   1391   2309   3251   4772

Fig. 6 The number of skyline services with two datasets: (a) Real Dataset; (b) Random Dataset

Profile 2: Performance with the number of preference attributes. In this profile, n = 1000 with the random dataset
and n = 500 with QWS, where m = 5 in both cases
and l varies from 3 to 7. The result is shown in Fig. 5(a),

1 http://www.uoguelph.ca/~qmahmoud/qws/index.html

then, a task schema is generated and each task is assigned
to a service broker, which is responsible for selecting the top-k
candidate services of a task for further composition. Finally,
the system collects all returned services from the brokers and
computes the top-k composite services. The execution time of
our method consists of two parts: 1) selecting top-k candidate
services for each task, and 2) collecting services from brokers and
computing top-k composite services.
In the first part, as the B+-tree structure is employed and
indexes are constructed ahead of time, the time complexity is $O(l \times n)$,
where l is the number of attributes in the user preference space
and n denotes the number of candidate services for a task.
As service brokers work in parallel, the total time of local
service selection is $O(l \times n)$. For the global optimization, the
execution time is mainly affected by three factors: the number
of tasks m, the number of preference attributes l and the value
of k, as indicated by Fig. 5(a), Fig. 5(b) and
Fig. 7. The time complexity is $O(m \times l \times k)$. In particular,
in global optimization, only the top-k services for each task are
returned, so the process is not affected by the number of
candidate services for each task.

where the execution time grows with the increasing number
of attributes. Therefore, using UPS in our method, we can
reduce the processing time for service selection.
Profile 3: Performance with the number of tasks. In this
profile, m varies from 3 to 8, while n = 1000 for the random
dataset and n = 300 for QWS. Fig. 5(b) shows the result,
where the execution time of our method grows with the
increasing number of tasks. In particular, as shown in Fig.
5(b), when the task number exceeds 6, the execution time
grows rapidly with the task number.

Fig. 7 UPS with Service-Num and Time with k: (a) UPS with Service-Num; (b) Time with k

B. Limitation Analysis
In Profile 2 and Profile 4 of Section IV-B, although the execution time grows with the increasing number of candidate
services and the number of attributes in the preference space, the
growth rate is low. However, in Profile 3, when the number of
tasks exceeds 6, the execution time grows rapidly, which is
caused by the low performance of the service lattice employed
in global optimization, whose complexity is $O(m \times l \times k)$.
In the future, we will try to improve the algorithm for global
optimization.

Profile 4: Performance with the number of candidate services. Fig. 5(c) shows the running time with respect to the
number of candidate services for each task. In this profile,
m = 5 and l = 4, while n varies from 300 to 1000. As the
number of candidate services increases, the running time
grows. However, the growth rate is low: the running time
does not increase monotonically and sometimes even decreases.
Two more profiles were also verified. The first is the number
of skyline services in UPS with respect to the number of all services,
shown in Fig. 7(a). In this profile, the number of tasks is 5, and
the number of attributes in the preference space is 3, 5, 7,
or 10, each denoted by a curve of a different color.
Two observations can be made: 1) as
the number of services grows, the number of services in
UPS increases; 2) the larger the number of attributes, the
more rapidly the size of UPS grows. The other profile is the
running time with respect to the value of k, shown
in Fig. 7(b). The larger k is, the more composite services the
system returns. In this profile, for the two datasets, the number
of candidate services for each task is 100 and the number of tasks is
5, while k varies from 5 to 10. The running time grows
rapidly as k increases.
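The first observation above, that the skyline grows faster when more attributes are considered, can be reproduced with a small experiment. The sketch below uses the plain Pareto skyline as a stand-in for UPS (the preference-aware variant is not reproduced here) and assumes that smaller attribute values are better, as for latency or price. The helper names `dominates`, `skyline`, and `skyline_size`, the uniform random data, and the seed are illustrative assumptions.

```python
import random
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse in every dimension and strictly better in one.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_size(n: int, dims: int, seed: int = 0) -> int:
    """Size of the skyline of n uniformly random points with `dims` attributes."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dims)) for _ in range(n)]
    return len(skyline(points))


if __name__ == "__main__":
    # Same attribute counts as in the profile; n is an arbitrary choice.
    for dims in (3, 5, 7, 10):
        print(dims, skyline_size(300, dims))
```

With independent random attributes, fewer points are dominated as the dimensionality increases, so the printed sizes are expected to rise with the number of attributes. The exact numbers depend on the data and the seed.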

C. Related Work and Comparison Analysis


Considering the expected large number of services competing to offer similar functionalities, quality-aware service
composition has received considerable attention [7]. It
is regarded as a problem of Multiple Criteria
Decision Making (MCDM). On one hand, some researchers
transform the problem into single-objective optimization
with the Simple Additive Weighting (SAW) technique, e.g.,
[3], [13], [5] and [14]. In [3], the authors presented two methods to evaluate services and select the optimal Web service:
local optimization and global optimization. In local optimization, services satisfying the local constraints were evaluated
and selected for each task, while in global optimization the
optimal execution plan satisfying the global constraints was
evaluated and selected. A hybrid approach was presented
in [13]: first, the global constraints were decomposed
into local constraints; then, local service evaluation and
selection for each task were performed in distributed brokers.
In [5], the service selection problem was modeled as
a multi-dimension, multi-choice knapsack problem (MMKP)
and a multi-constraint optimal path (MCOP) problem, and
efficient heuristic algorithms were proposed to find a near-optimal solution. Nevertheless, with these approaches, it is

V. PERFORMANCE EVALUATION
In this section, we present the complexity and limitations of
our method, discuss related work on service selection
and skyline, and give a comparison analysis.
A. Complexity Analysis
The distributed system prototype, shown in Fig. 3, was
implemented for our method. Using this prototype, first, a user
submits functional requirements and preference attributes. And


ACKNOWLEDGEMENT

a rather challenging task for a user to transform preferences
into numeric weights, which may limit the popularity of these
approaches.
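The SAW idea mentioned above can be illustrated with a minimal sketch. It assumes min-max scaling of each QoS attribute to [0, 1], which is one common choice and not necessarily the one used in the cited works; the function name `saw_scores`, the attribute names, and the weights in the usage example are hypothetical.

```python
from typing import Dict, List


def saw_scores(
    services: List[Dict[str, float]],
    weights: Dict[str, float],
    benefit: Dict[str, bool],
) -> List[float]:
    """Score each service as a weighted sum of min-max scaled attributes.

    benefit[attr] is True when larger values are better (e.g. availability)
    and False when smaller values are better (e.g. latency).
    """
    scores = [0.0] * len(services)
    for attr, weight in weights.items():
        values = [s[attr] for s in services]
        lo, hi = min(values), max(values)
        for i, v in enumerate(values):
            if hi == lo:
                scaled = 1.0  # all services are equal on this attribute
            elif benefit[attr]:
                scaled = (v - lo) / (hi - lo)
            else:
                scaled = (hi - v) / (hi - lo)
            scores[i] += weight * scaled
    return scores


# Usage: two hypothetical services; the ranking flips with the weights.
candidates = [
    {"latency": 100.0, "availability": 0.90},
    {"latency": 200.0, "availability": 0.99},
]
direction = {"latency": False, "availability": True}
print(saw_scores(candidates, {"latency": 0.7, "availability": 0.3}, direction))
print(saw_scores(candidates, {"latency": 0.3, "availability": 0.7}, direction))
```

The two calls rank the services in opposite orders, which illustrates why asking a user for precise numeric weights is demanding: the result is sensitive to values that users can rarely state with confidence.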
On the other hand, some researchers try to solve the problem
using Skyline (Pareto), which returns a set of incomparable services that do not dominate each other, e.g., [6], [7], [8]. In [6], the
authors presented an approach to automatically compose data
Web services, taking into account user preferences modelled
by fuzzy sets, and the services were ranked by a fuzzification
of Pareto dominance. The concept of composite service skyline
was presented in [7], and a novel bottom-up computation
framework was proposed, enabling the skyline algorithm to
scale well with the number of candidate services. In [8], a
hybrid approach combining mixed-integer programming and
skyline techniques was proposed: first, skyline was
used to filter out dominated services, and then mixed-integer
programming was employed to select the best composite
service for a user.
Skyline queries originate from older topics such as the contour
problem, maximum vectors, and convex hull [10], and were
introduced into the database community in [15]. Several skyline
computation methods and variants have been proposed recently,
e.g., [16], [10], [17] and [18]. In [16], the authors abstracted
personalized skyline ranking as a dynamic search over skyline
subspaces guided by user-specific preferences; however, the
cost of maintaining the skyline subspaces is high, which makes
the approach unsuitable for service selection. In [10], the k-dominant
skyline was presented to alleviate the curse of dimensionality
in skyline queries over high-dimensional spaces. Although
the k-dominant skyline could reduce the number of candidate
services, the dimensions used in the computation vary,
so it is likewise not suitable for service composition. In order
to generate an online response for any preference issued
by a user, a semi-materialization method was presented in
[17], which stores some useful partial results corresponding
to certain implicit preferences. Although these techniques were
proposed to address the curse of dimensionality in the database
community [10], [16], they are not directly applicable to
service composition. Therefore, combining QoS-aware service
composition and employing the definition of UPS, we propose
a method for selecting top-k composite Web services.
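The remark above that the dimensions used by the k-dominant skyline vary can be seen from its definition in [10]: a point p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them, and those k dimensions may differ for every pair of points. The sketch below is a naive illustration of that definition, assuming smaller values are better; the function names are hypothetical, and the k here is the dominance parameter of [10], not the k of top-k selection.

```python
from typing import List, Sequence


def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """p k-dominates q: no worse in at least k dimensions, strictly better in one.

    Assumption: smaller values are better. Any strictly better dimension is
    also a "no worse" dimension, so it can always be part of the chosen k.
    """
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Sequence[float]], k: int) -> List[Sequence[float]]:
    """Naive O(n^2) filter: keep the points that no other point k-dominates."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


# Usage: with k equal to the number of dimensions this is the ordinary skyline.
services = [(1, 5, 5), (5, 1, 5), (5, 5, 1), (2, 2, 2)]
print(k_dominant_skyline(services, 3))
print(k_dominant_skyline(services, 2))
```

In the usage example, lowering k from 3 to 2 shrinks the result from four services to one, but each elimination relies on a different pair of dimensions. Because k-dominance can be cyclic, the k-dominant skyline may even be empty for small k.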

This paper is partly supported by the National Science
Foundation of China under Grants No. 61021062, 61073032
and 60736015, and the Jiangsu Provincial NSF Project under
Grant BE2011171.
REFERENCES
[1] Michael P. Papazoglou. Web Services: Principles and Technology, Prentice Hall, 2008.
[2] Michael Rosen, Boris Lublinsky, Kevin T. Smith, Marc J. Balcer. Applied
SOA: Service-Oriented Architecture and Design Strategies, Wiley Publishing, 2008.
[3] Liangzhao Zeng, Boualem Benatallah, Anne H.H. Ngu, Marlon Dumas,
Jayant Kalagnanam, Henry Chang. QoS-Aware Middleware for Web
Services Composition, IEEE Transactions on Software Engineering, vol.
30, no. 5, pp. 311-327, 2004.
[4] Mohammad Alrifai, Thomas Risse. Investigating Web Services on the
World Wide Web, Proceedings of the 17th International Conference on
World Wide Web, pp. 795-804, 2008.
[5] Tao Yu, Yue Zhang, Kwei-Jay Lin. Efficient Algorithms for Web Services
Selection with End-to-end QoS Constraints, ACM Transactions on the
Web, vol. 1, no. 1, 2007.
[6] Karim Benouaret, Djamal Benslimane, Allel Hadjali, Mahmoud
Barhamgi. Top-k Web Service Compositions using Fuzzy Dominance
Relationship, Proceedings of the 2011 IEEE International Conference
on Services Computing, pp. 144-151, 2011.
[7] Qi Yu, Athman Bouguettaya. Efficient Service Skyline Computation for
Composite Service Selection, IEEE Transactions on Knowledge and Data
Engineering, 2011.
[8] Mohammad Alrifai, Dimitrios Skoutas, Thomas Risse. Selecting Skyline
Services for QoS-based Web Service Composition, Proceedings of the
19th International Conference on World Wide Web, pp. 11-20, 2010.
[9] Xuemin Lin, Yidong Yuan, Qing Zhang, Ying Zhang. Selecting Stars:
The k Most Representative Skyline Operator, Proceedings of the 23rd
International Conference on Data Engineering, pp. 86-95, 2007.
[10] Chee-Yong Chan, H.V. Jagadish, Kian-Lee Tan, Anthony K.H. Tung,
Zhenjie Zhang. Finding k-Dominant Skylines in High Dimensional Space, Proceedings of the 2006 ACM SIGMOD International Conference
on Management of Data, pp. 503-514, 2006.
[11] Yutu Liu, Anne H.H. Ngu, Liangzhao Zeng. QoS Computation and
Policing in Dynamic Web Service Selection, Proceedings of the 13th
International World Wide Web Conference on Alternate Track Papers &
Posters, pp. 66-73, 2004.
[12] Kian-Lee Tan, Pin-Kwang Eng, Beng Chin Ooi. Efficient Progressive
Skyline Computation, Proceedings of the 27th International Conference
on Very Large Data Bases, pp. 301-310, 2001.
[13] Mohammad Alrifai, Thomas Risse. Combining Global Optimization
with Local Selection for Efficient QoS-aware Service Composition,
Proceedings of the 18th International Conference on World Wide Web,
pp. 881-890, 2009.
[14] Anja Strunk. QoS-Aware Service Composition: A Survey, Proceedings
of the 2010 IEEE European Conference on Web Services, pp. 67-74,
2010.
[15] Stephan Börzsönyi, Donald Kossmann, Konrad Stocker. The Skyline
Operator, Proceedings of the 17th International Conference on Data
Engineering, pp. 421-430, 2001.
[16] Jongwuk Lee, Gae-won You, Seung-won Hwang. Personalized Top-k
Skyline Queries in High-dimensional Space, Information Systems, vol.
34, no. 1, pp. 45-61, 2009.
[17] Raymond Chi-Wing Wong, Ada Wai-Chee Fu, Jian Pei. Efficient Skyline Querying with Variable User Preferences on Nominal Attributes,
Proceedings of the VLDB Endowment, pp. 1032-1043, 2008.
[18] Jan Chomicki, Parke Godfrey, Jarek Gryz, Dongming Liang. Skyline
with Presorting, Proceedings of the 19th International Conference on
Data Engineering, pp. 717-719, 2003.

VI. CONCLUSION
In this paper, we propose a method to select top-k composite
services under various user preferences. First, a preference-aware service dominance is presented and, based on this definition, we present the concept of User Preference-aware service
Skyline (UPS). In order to compute the UPS dynamically in a
service broker, a Multi-Index algorithm is proposed. We also
propose a dominant number-aware service ranking, which is
combined with a service lattice and a max-heap for the global
optimization. Finally, experiments were conducted with two
datasets to verify our method. An interesting future direction
is to investigate ways to improve the overall performance of
the global optimization process.
