Super-peer Networks
Rozlina Mohamed 1,2 and Christopher D. Buckingham 1
1 Aston University, Computer Science Department, B4 7ET, UK
2 University Malaysia Pahang, Software Engineering Department, 15150 Malaysia
In this paper, we present the construction of location-based schema caching for query routing at the client-peer in super-peer
networks. This cached information is used for directly routing subsequent repeated queries towards their actual resource locations
without going via the super peer. Instead of caching the previous query and its result, our proposed approach caches the query with
the routing direction, which is the resource location of the previous queries. This means outdated results are avoided. The paper
describes the main processes, including details of the algorithm required for constructing the cached information.
Index Terms— query caching, query rewriting, query routing, super-peer network, peer-to-peer systems
I. INTRODUCTION
Peer-to-peer (P2P) networks are most popularly used as file sharing applications, with Gnutella, Napster and Kazaa among the most famous. Alternative P2P systems include instant messaging, collaborative computing, distributed computing, and platform applications. In P2P systems, a huge number of computers are typically interconnected using Internet Protocol (IP) network numbering, lying on top of physical computer networks. Networking between these computers determines the structure (also known as the network topology) of the whole system. In a broad sense, there are three types of P2P systems: pure, centralized and super-peer. This classification is based on the degree of decentralization in processing tasks and sharing resources among participants in the network. We focus on the super-peer network model, which is classified as an optimized decentralized P2P system.
The super-peer network consists of super-peer and client-peer nodes, with nodes decomposed into clusters. For each cluster, one or more nodes are selected to be the super-peer that manages a group of client-peers. Our discussion in this paper focuses on the responsibility of super-peers for query routing, although we are aware that they have additional roles.
Typically, if a new client-peer, p_i, wants to join the network, it first has to send its information to a super-peer sp_i. Then, the super-peer inserts the peer's information into its peer information for routing management and query routing. Suppose the client-peer p_i wants to find some target data. It sends a query to the super-peer, sp_i, to request the target data. The super-peer then forwards this query message to another client-peer p_j if it knows that the target data is held and shared by p_j. Otherwise, the super-peer forwards the query request to the other super-peers connected to it, which contributes to the potential for flooding the system with query messages that consume the CPU resources and bandwidth of peers.
The query forwarding process is widely known as query routing, which is reduced in the super-peer network model such as in , because the super-peer can manage its client-peers and index information; the query is routed to super-peers that have indexed the target data even though the target data are actually obtained from the client-peers. In this case, the number of routing messages transmitted in the network is reduced.
The authors have proposed additional pre-processing for query routing in order to further improve query routing in super-peer networks. The pre-processing uses a schema cached list (SCL) at the client-peer that enables it to associate the query with data locations and thus route the query directly to the peers without going through the super-peer. In this paper, we discuss the maintenance of our SCL as part of the pre-processing mechanism for query routing.
In addition, we identify several key elements that are required for query caching.
II. BACKGROUND & RELATED WORKS
The super-peer network is considered an efficient P2P network model for searching query results . One of the key reasons is the routing index and query routing facilities provided by the super-peer for queries posted by its client-peers. In super-peer network systems, client-peers are connected to their local super-peer to upload their shared data and request query routing directives. Figure 1 illustrates the scenario, where the client-peer p2 queries for data ‘a’ and ‘b’. Data ‘a’ is located at client-peers p1, p3 and p4 while data ‘b’ is located at client-peer p5. Client-peers p3 and p4 are connected to super-peer sp2 while client-peer p5 is connected to super-peer sp3.
[Figures 1 and 2 (captions not recovered). Recoverable labels: nodes p3, p4, p5, sp1, sp2, sp3; data ‘a’ and ‘b’. Legend: request message for ‘a’; request message for ‘b’; consult super-peers’ index.]
Suppose a query is given to client-peer p2 for information about ‘a’ and ‘b’. In a conventional super-peer network system, the routing process is started by a request message sent by p2 to its super-peer sp1, as illustrated in Figure 2. Then sp1 consults its index to find peers with the required information and, in this example, sends p2 the routing directions for ‘a’ and ‘b’. Based on these, the message is routed to p1 and sp2, then rerouted to p3, p4 and sp3. The message request for ‘b’ is then rerouted by sp3 to its client-peer p5. This sequence of routing requests is shown in Figure 2. In Figure 3, we illustrate the acknowledgement messages sent by the owners of the data to p2. These are followed by query messages sent by p2 to the data owners. Remember that the original ‘ab’ query has been decomposed into sub-queries ‘a’ and ‘b’. Once a data owner has processed its sub-query, the results are returned to p2 for further query result manipulation.
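The conventional routing sequence described for this scenario can be sketched in a few lines. This is only an illustrative model, not the authors' implementation: the dictionary names, the sp1 -> sp2 -> sp3 forwarding chain, the hop limit `ttl`, and the way messages are counted are all assumptions made for the example.

```python
# Scenario data from the text: 'a' is held by p1, p3, p4 and 'b' by p5.
CLIENT_DATA = {"p1": {"a"}, "p3": {"a"}, "p4": {"a"}, "p5": {"b"}}
# Assumed cluster membership; p1 and p2 are taken to be clients of sp1.
CLUSTERS = {"sp1": ["p1", "p2"], "sp2": ["p3", "p4"], "sp3": ["p5"]}
# Assumed forwarding chain, read off the routing order in the text.
NEIGHBOURS = {"sp1": ["sp2"], "sp2": ["sp3"], "sp3": []}


def route_request(origin, wanted, ttl):
    """Forward a request through super-peers, one hop per TTL unit.

    Returns (owners, messages): the client-peers found for each item
    and the number of request messages sent, including the first one
    from the querying client-peer to its own super-peer.
    """
    owners = {item: [] for item in wanted}
    messages = 1                      # client-peer -> its super-peer
    current, visited = [origin], set()
    while current and ttl > 0:
        following = []
        for sp in current:
            if sp in visited:
                continue
            visited.add(sp)
            # The super-peer consults its index of client-peer data.
            for peer in CLUSTERS[sp]:
                hits = CLIENT_DATA.get(peer, set()) & set(wanted)
                if hits:
                    messages += 1     # request rerouted to the owner
                    for item in sorted(hits):
                        owners[item].append(peer)
            # The request is also passed on to neighbouring super-peers.
            for neighbour in NEIGHBOURS[sp]:
                if neighbour not in visited:
                    messages += 1
                    following.append(neighbour)
        current = following
        ttl -= 1                      # a request with ttl 0 is dropped
    return owners, messages


owners, messages = route_request("sp1", ["a", "b"], ttl=3)
print(owners, messages)
```

With a smaller hop limit, the request expires before sp3 processes it, so the owner of ‘b’ is never reached; this is the kind of trade-off a client-side cache of resource locations is meant to avoid.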
Figure 3. Acknowledge message, query and result retrieval in conventional super-peer network (legend: acknowledge for ‘a’; query & retrieval for ‘a’; acknowledge for ‘b’)
From the super-peer network scenario presented above, the client-peer sends its query to the locations that have been determined by the super-peer. The routing request message sent by p2 is restricted by the time-to-live (TTL) value that has been set. If the TTL expires before the message reaches sp3, its descendant peer that possesses data ‘b’ will be excluded from the message routing and propagation. However, setting a high TTL value is not worthwhile because it increases query routing and risks causing network congestion . This risk is reduced if the amount of message routing in the network is decreased, with a concomitant reduction in the number of querying peers and transmitted messages . Furthermore, query mistakes can be avoided.
Assisted query routing has been widely accepted since the introduction of super-peer networks and, more recently, caching strategies have received significant attention . We separate caching strategies into two approaches: (i) caching the actual data, and (ii) caching the routing direction towards the location of that data.
Caching the actual data is similar to the materialized views approach that has been implemented in federated database systems . In P2P, this concept is adapted in and Brunkhorst & Dhraief have proposed a model of semantic caching . Semantic caching is used to cache the query and its result based on a schema. This schema-based cached information is located at the super-peer. Thus, the client-peers are able to retrieve the query result just by sending a query to the super-peer.
Instead of implementing schema-based caching for the previous query and its result at the super-peer, He et al. and Fegaras et al. have proposed an actual data-based indexed model in . Rather than having a routing index, the super-peer in not only obtains the schema but also the data that belongs to its client-peers.
In brief, the super-peer in the above-mentioned approaches has the ability to answer the query based on existing query results that have been cached. Furthermore, the super-peer is responsible not only for obtaining the indexes of data locations but also for processing the query result. Meanwhile, the super-peer node itself is not very scalable and is likely to be a single point of failure for the client-peers within its cluster .
A fault tolerance module for super-peer failure has been considered in the P2P-DIET project as presented in . In P2P-DIET, a fast encoding peer profile is indexed by the client-peer for ad-hoc query processing. Therefore, client-peers are able to get their query routing directives without being fully dependent on the super-peer. However, we believe that replicating the routing index from the super-peer is not worthwhile, due to the limited capabilities of client-peers. Additionally, the probability of the whole index being used is not reported by the authors.
If the resources of the database are updated, the use of cached query results may lead to retrieval of outdated information for subsequent queries . Thus, results produced by the super-peer may be obsolete. Therefore, with each peer being an autonomous data provider in the P2P environment, the most up-to-date results should come from the actual peers’ resources. These resources can be retrieved by requesting them from their owners directly. Thus, instead of caching the query and its result, Quan et al. have proposed a query hit message caching approach . The query hit message is cached for subsequent query requests. Therefore, a subsequent query request for the same schema is redirected to the replicated data instead of every query request being routed to a single resource. On the other hand, schema-based query caching is proposed by Doulkeridis et al. for assisting query forwarding . In , the schema of content located at remote peers is
cached. Thus, the subsequent query can be directly routed to the resource location. However, these approaches are for pure P2P networks, which route queries differently from super-peer networks. In the context of super-peer networks, query caching at the client-peer has so far not been instigated. There is some work on caching in super-peer networks, but it is mainly associated with query caching on super-peers, not on client-peers.
A. SCL Locality
Figure 4. Illustration of our proposed routing request
In order to appreciate our location-based SCL as the pre-processing for query routing, it is expedient to look at the different peer types and their characteristics. Here, we distinguish two types of peer nodes in super-peer networks.
1) Super-peer
The super-peer node is a hub for several peer nodes in a cluster. In this research we assume that only one node is assigned as a super-peer for a cluster of peers, even though some research work has been done on having multiple super-peers . The super-peer is used to maintain the backbone of the P2P network and to facilitate the network of P2P connections. In addition, the super-peer is a dedicated server in the cluster for processing messages. Thus, the index of resource locations, which is used for assisting the routing of messages, is maintained by the super-peer, and the query message is routed according to this index. Hence, the super-peer is also responsible for query processing on behalf of its respective client-peers.
2) Client-peer
The client-peer can be seen as either a consumer that searches for resources or a data provider. The client-peer is fully dependent on the super-peer for query routing. The fact that the super-peer may fail or suddenly leave the network means that a fault tolerance module for super-peer failure must be available at the client-peer . In our adapted super-peer network, we propose that the client-peer obtains the capability of locally deciding the query routing direction, instead of being fully dependent on the super-peer. Moreover, local query routing facilities would reduce the super-peer workload, as well as greatly reducing the number of messages being routed in the network and the number of queried peers needed for retrieving the query result .
B. SCL Data Structures
The role of the SCL is to provide the routing direction for every query request from a local client-peer. In order to achieve correct and efficient execution of requests, each SCL manages two data structures. One stores the schemas that it supports and the other stores the resource location information for each particular schema; they are called schemaTable and resourceTable respectively. A view of these data structures is shown in Figure 5.
[Figure 5: schemaTable and resourceTable. Recoverable labels: "k", "List of", "k", "Resource location".]
... cache policies are based on access order, recency, and frequency. LRU is based on the temporal locality principle, where items which have not been used for the longest time are replaced when the space that has been specified is exceeded. LRU is chosen because of its good and stable performance in file access times .
C. Answering Queries Using SCL
Location-based schema caching simply stores resource locations that have been extracted from previous queries, for routing subsequent local queries. The stored resource location is decomposed into several attributes, which are the IP address, port number, filename and path, and these are linked to the particular schema supported by the local SCL. For our approach, we assume that the query is only conjunctive, that is, it uses For-Let-Where-Return (FLWR) clauses in the XQuery grammar. In order to answer the incoming query, the client-peer has to do some routing pre-processing.
The local routing pre-processing consists of decomposing a conjunctive query that consists of several expressions into single expressions. Here, we assume that each expression is required for one schema or a sub-goal of the query. Then, each expression is matched against the schemas cached in the SCL. If a matching schema is found, the resource location is identified. Based on the resource location, a single sub-query is then generated. Thus, this rewritten sub-query is routed to a specified location.
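The two SCL data structures, the LRU replacement policy and the matching step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the class and function names, the `capacity` parameter, the representation of a query as already-decomposed (schema, expression) pairs, and the handling of unmatched expressions are all assumptions made for the example. No XQuery parsing is attempted.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLocation:
    """The attributes the text lists for a stored resource location."""
    ip: str
    port: int
    path: str
    filename: str


class SchemaCachedList:
    """Hypothetical SCL: schema_table maps a schema to a key k and
    resource_table maps k to the locations cached for that schema."""

    def __init__(self, capacity):
        self.capacity = capacity            # assumed: max cached schemas
        self.schema_table = OrderedDict()   # schema -> k, in LRU order
        self.resource_table = {}            # k -> list of locations
        self._next_key = 0

    def store(self, schema, location):
        """Cache a location extracted from a previous query."""
        if schema not in self.schema_table:
            if len(self.schema_table) >= self.capacity:
                # LRU replacement: drop the least recently used schema.
                _, old_key = self.schema_table.popitem(last=False)
                del self.resource_table[old_key]
            self.schema_table[schema] = self._next_key
            self.resource_table[self._next_key] = []
            self._next_key += 1
        self.schema_table.move_to_end(schema)   # mark as recently used
        locations = self.resource_table[self.schema_table[schema]]
        if location not in locations:
            locations.append(location)

    def lookup(self, schema):
        """Return the cached locations for a schema, or [] on a miss."""
        key = self.schema_table.get(schema)
        if key is None:
            return []
        self.schema_table.move_to_end(schema)   # mark as recently used
        return list(self.resource_table[key])


def preprocess(scl, expressions):
    """Match each single expression against the cached schemas.

    Returns (direct, unmatched): sub-queries paired with the location
    they can be routed to, and the expressions with no cached schema.
    """
    direct, unmatched = [], []
    for schema, expression in expressions:
        locations = scl.lookup(schema)
        if locations:
            direct.extend((loc, expression) for loc in locations)
        else:
            unmatched.append((schema, expression))
    return direct, unmatched
```

In this sketch, unmatched expressions are simply returned to the caller; sending them to the super-peer, as in the conventional routing process, would be one plausible way to handle them, but that is an assumption rather than something stated in the text.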
REFERENCES