
Slides for Chapter 10:

Peer-to-Peer Systems

From Coulouris, Dollimore, Kindberg and Blair


Distributed Systems:
Concepts and Design
Edition 5, Addison-Wesley 2012
Figure 10.1: Distinctions between IP and overlay routing for peer-to-peer applications

Columns: IP | Overlay (application-level routing overlay)

Scale
  IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 namespace is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is preallocated according to administrative requirements.
  Overlay: Peer-to-peer systems can address more objects. The GUID namespace is very large and flat (>2^128), allowing it to be much more fully occupied.
Load balancing
  IP: Loads on routers are determined by network topology and associated traffic patterns.
  Overlay: Object locations can be randomized and hence traffic patterns are divorced from the network topology.
Network dynamics (addition/deletion of objects/nodes)
  IP: IP routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour.
  Overlay: Routing tables can be updated synchronously or asynchronously with delays of fractions of a second.
Fault tolerance
  IP: Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly.
  Overlay: Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections.
Target identification
  IP: Each IP address maps to exactly one target node.
  Overlay: Messages can be routed to the nearest replica of a target object.
Security and anonymity
  IP: Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable.
  Overlay: Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided.

Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5
Pearson Education 2012
Figure 10.2: Napster: peer-to-peer file sharing with a centralized,
replicated index

[Diagram: peers and two Napster servers, each holding an index. Message sequence: 1. File location request; 2. List of peers offering the file; 3. File request; 4. File delivered; 5. Index update.]
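The five numbered steps can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not Napster's actual protocol: `NapsterIndex` and `fetch` are hypothetical names, peers are plain strings, a single index stands in for the replicated servers, and the first listed peer is always chosen as the download source.

```python
from __future__ import annotations

from collections import defaultdict


class NapsterIndex:
    """Hypothetical central index: file name -> peers offering that file."""

    def __init__(self) -> None:
        self._peers: defaultdict[str, set[str]] = defaultdict(set)

    def locate(self, filename: str) -> list[str]:
        # Steps 1-2: a file location request returns the peers offering the file.
        return sorted(self._peers.get(filename, set()))

    def update(self, peer: str, filename: str) -> None:
        # Step 5: index update, recording that `peer` now offers the file.
        self._peers[filename].add(peer)


def fetch(index: NapsterIndex, requester: str, filename: str,
          shared: dict[str, dict[str, bytes]]) -> bytes | None:
    """Download `filename` for `requester`; `shared` maps each peer to its files."""
    peers = index.locate(filename)
    if not peers:
        return None
    source = peers[0]                         # step 3: file request sent to one peer
    data = shared[source][filename]           # step 4: file delivered peer to peer
    shared.setdefault(requester, {})[filename] = data
    index.update(requester, filename)         # step 5: the requester now offers it too
    return data
```

Only steps 1, 2 and 5 touch the server; the file itself moves directly between peers, which is what keeps the servers' load low.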

Figure 10.3: Distribution of information in a routing overlay

[Diagram: routing knowledge held by nodes A, B, C and D; legend: Object, Node.]

Figure 10.4: Basic programming interface for a distributed hash table
(DHT) as implemented by the PAST API over Pastry

put(GUID, data)
    The data is stored in replicas at all nodes responsible for the object identified by GUID.
remove(GUID)
    Deletes all references to GUID and the associated data.
value = get(GUID)
    The data associated with GUID is retrieved from one of the nodes responsible for it.
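A toy version of this interface helps show what "the nodes responsible for the object" means. As in PAST, the sketch assumes the responsible nodes are the k nodes whose own GUIDs are numerically closest to the object's GUID on a circular 128-bit space. `ToyDHT` and `guid` are hypothetical names, deriving GUIDs from a truncated SHA-256 digest is an assumption, and overlay routing is skipped: all node stores live in one process.

```python
from __future__ import annotations

import hashlib

RING = 2 ** 128  # size of the circular 128-bit GUID space


def guid(name: bytes) -> int:
    """Assumption: a 128-bit GUID taken from the first 16 bytes of a SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(name).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, RING - d)


class ToyDHT:
    def __init__(self, node_ids: list[int], replicas: int = 3) -> None:
        self.stores: dict[int, dict[int, bytes]] = {n: {} for n in node_ids}
        self.replicas = replicas

    def responsible(self, key: int) -> list[int]:
        """The `replicas` nodes whose GUIDs are numerically closest to key."""
        return sorted(self.stores, key=lambda n: ring_distance(n, key))[: self.replicas]

    def put(self, key: int, data: bytes) -> None:
        for node in self.responsible(key):
            self.stores[node][key] = data

    def get(self, key: int) -> bytes | None:
        for node in self.responsible(key):
            if key in self.stores[node]:
                return self.stores[node][key]  # any one replica is enough
        return None

    def remove(self, key: int) -> None:
        for store in self.stores.values():
            store.pop(key, None)
```

In a real DHT each call would be routed through the overlay to the responsible nodes rather than scanning local dictionaries.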

Figure 10.5: Basic programming interface for distributed object location
and routing (DOLR) as implemented by Tapestry

publish(GUID)
    GUID can be computed from the object (or some part of it, e.g. its name). This function makes the node performing a publish operation the host for the object corresponding to GUID.
unpublish(GUID)
    Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
    Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object's state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
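Unlike the DHT interface, a DOLR layer only records where an object lives; the object itself stays at its host. The sketch below is a hypothetical single-process model: `ToyDOLR` and `send_to_obj` (mirroring `sendToObj`) are illustrative names, the host is passed explicitly because one object stands in for every node, and "invoking" an object here just looks up a named field of its state.

```python
from __future__ import annotations


class ToyDOLR:
    """Records which hosts hold each GUID; objects themselves stay at their hosts."""

    def __init__(self) -> None:
        self.locations: dict[str, set[str]] = {}        # GUID -> hosts
        self.objects: dict[tuple[str, str], dict] = {}  # (host, GUID) -> object state

    def publish(self, host: str, guid: str, state: dict) -> None:
        # The publishing node becomes a host for the object.
        self.objects[(host, guid)] = state
        self.locations.setdefault(guid, set()).add(host)

    def unpublish(self, host: str, guid: str) -> None:
        self.objects.pop((host, guid), None)
        hosts = self.locations.get(guid, set())
        hosts.discard(host)
        if not hosts:
            self.locations.pop(guid, None)

    def send_to_obj(self, msg: str, guid: str, n: int = 1) -> list[tuple[str, object]]:
        """Deliver msg to up to n replicas; here a message just names a state field."""
        hosts = sorted(self.locations.get(guid, set()))[:n]
        return [(h, self.objects[(h, guid)].get(msg)) for h in hosts]
```

Passing `n=2` returns one reply per replica, matching the optional `[n]` parameter; once every host has unpublished, the GUID resolves to nothing.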

Figure 10.6: Circular routing alone is correct but inefficient
Based on Rowstron and Druschel [2001]

[Diagram: circular GUID space from 0 to FFFFF....F (2^128 - 1); live nodes shown include 65A1FC, D13DA3, D467C4, D46A1C and D471F1.]

The dots depict live nodes. The space is considered as circular: node 0 is adjacent to node (2^128 - 1). The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l = 4). This is a degenerate type of routing that would scale very poorly; it is not used in practice.
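To see why leaf-set-only routing scales poorly, the sketch below routes on a small ring using only each node's leaf set (l nodes on each side). The ring size, the evenly spaced node GUIDs and the rule of jumping to the farthest leaf on the shorter side of the ring are assumptions chosen for illustration; `leaf_set_route` is a hypothetical name.

```python
from __future__ import annotations


def cw(a: int, b: int, ring: int) -> int:
    """Clockwise distance from a to b on a ring of the given size."""
    return (b - a) % ring


def ring_dist(a: int, b: int, ring: int) -> int:
    return min(cw(a, b, ring), cw(b, a, ring))


def leaf_set_route(nodes: list[int], src: int, key: int, ring: int, l: int = 4) -> list[int]:
    """Route from src towards key using only leaf sets of l nodes on each side."""
    nodes = sorted(nodes)
    n = len(nodes)
    assert n > 2 * l, "sketch assumes more live nodes than leaf-set entries"
    pos = nodes.index(src)
    path = [src]
    while True:
        lo, hi = nodes[(pos - l) % n], nodes[(pos + l) % n]
        if cw(lo, key, ring) <= cw(lo, hi, ring):  # key lies within the leaf set range
            leaves = [nodes[(pos + k) % n] for k in range(-l, l + 1)]
            best = min(leaves, key=lambda g: ring_dist(g, key, ring))
            if best != path[-1]:
                path.append(best)
            return path
        # Otherwise jump to the farthest leaf on the shorter side of the ring.
        step = l if cw(nodes[pos], key, ring) <= cw(key, nodes[pos], ring) else -l
        pos = (pos + step) % n
        path.append(nodes[pos])


nodes = list(range(0, 1024, 16))  # 64 evenly spaced live nodes
print(leaf_set_route(nodes, 0, 512, ring=1024))
```

With 64 nodes and l = 4, reaching the node halfway round the ring takes 8 hops, one per block of l nodes, so the hop count grows linearly with N rather than logarithmically.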

Figure 10.7: First four rows of a Pastry routing table

The routing table is located at a node whose GUID begins 65A1. Digits are in hexadecimal. The n's represent [GUID, IP address] pairs specifying the next hop to be taken by messages addressed to GUIDs that match each given prefix. Grey-shaded entries indicate that the prefix matches the current GUID up to the given value of p: the next row down or the leaf set should be examined to find a route. Although the table can have up to 32 rows (one per hexadecimal digit of a 128-bit GUID), only log_16 N rows will be populated on average in a network with N active nodes.
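The placement rule behind the table can be computed directly: an entry for another node's GUID belongs in row p, the length of its common prefix with the local GUID, and column i, its next hexadecimal digit. `table_slot` and `expected_populated_rows` are hypothetical helper names, and GUIDs are assumed to be equal-length hexadecimal strings.

```python
from __future__ import annotations

import math


def table_slot(local: str, other: str) -> tuple[int, int] | None:
    """Row p and column i at which `other` could appear in `local`'s routing table.

    p is the length of the common hex prefix; i is other's (p+1)th digit.
    Returns None when the GUIDs are identical.
    """
    p = 0
    while p < len(local) and local[p] == other[p]:
        p += 1
    if p == len(local):
        return None
    return p, int(other[p], 16)


def expected_populated_rows(n_nodes: int) -> float:
    """Average number of populated rows, log_16 N, for N active nodes."""
    return math.log(n_nodes, 16)


print(table_slot("65A1FC", "D46A1C"))  # (0, 13): row 0, column D
print(table_slot("65A1FC", "65A13B"))  # (4, 3): row 4, column 3
```

For example, with N = 10^6 active nodes, log_16 N is about 5, so only about five rows would be populated on average.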

Figure 10.8: Pastry routing example
Based on Rowstron and Druschel [2001]

[Diagram: circular GUID space from 0 to FFFFF....F (2^128 - 1); nodes shown include 65A1FC, D13DA3, D4213F, D462BA, D467C4, D46A1C and D471F1.]

Routing a message from node 65A1FC to D46A1C. With the aid of a well-populated routing table the message can be delivered in ~log_16(N) hops.
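Reading the hop sequence 65A1FC, D13DA3, D4213F, D462BA, D46A1C from the node labels in the figure (an assumed ordering, since the arrows do not survive in this text), each hop shares a longer prefix with the destination. `shared_prefix_len` is a hypothetical helper.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hexadecimal digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Assumed hop order, read from the node labels in Figure 10.8.
PATH = ["65A1FC", "D13DA3", "D4213F", "D462BA", "D46A1C"]
prefixes = [shared_prefix_len(node, PATH[-1]) for node in PATH]
print(prefixes)  # [0, 1, 2, 3, 6]
```

Because every routing-table hop fixes at least one more digit, the number of such hops is bounded by the number of populated rows, which is why the route takes about log_16(N) hops.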

Figure 10.9: Pastry's routing algorithm

To handle a message M addressed to a node D (where R[p,i] is the element at column i, row p of the routing table):
1. If (L_{-l} < D < L_{l}) { // the destination is within the leaf set or is the current node.
2.     Forward M to the element L_i of the leaf set with GUID closest to D or the current node A.
3. } else { // use the routing table to despatch M to a node with a closer GUID
4.     Find p, the length of the longest common prefix of D and A, and i, the (p+1)th hexadecimal digit of D.
5.     If (R[p,i] ≠ null) forward M to R[p,i] // route M to a node with a longer common prefix.
6.     else { // there is no entry in the routing table
7.         Forward M to any node in L or R with a common prefix of length p, but a GUID that is numerically closer.
       }
   }
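The algorithm can be turned into a runnable sketch. Assumptions: GUIDs are 6-digit hexadecimal strings rather than 32-digit ones, the leaf-set range test ignores wrap-around at the ends of the circular space and uses inclusive bounds (so a leaf whose GUID equals D is delivered to), the routing table R is a dict keyed by (p, i), and the leaf sets and table entries in the hypothetical `NODES` map are invented so that the route follows the hop sequence suggested by the node labels of Figure 10.8.

```python
from __future__ import annotations


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hexadecimal digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route_step(A: str, D: str, leaf_set: list[str],
               table: dict[tuple[int, int], str]) -> str:
    """Return the next hop for a message addressed to D, as handled by node A."""
    d = int(D, 16)
    leaves = sorted(leaf_set + [A], key=lambda g: int(g, 16))
    # Lines 1-2: D lies within the leaf set range, so deliver to the closest leaf or A.
    if int(leaves[0], 16) <= d <= int(leaves[-1], 16):
        return min(leaves, key=lambda g: abs(int(g, 16) - d))
    # Line 4: p = common prefix length of D and A, i = (p+1)th hex digit of D.
    p = shared_prefix_len(A, D)
    i = int(D[p], 16)
    # Line 5: forward to R[p, i] when that entry exists.
    if (p, i) in table:
        return table[(p, i)]
    # Line 7: any known node with a prefix at least p long that is numerically closer.
    closer = [t for t in leaf_set + list(table.values())
              if shared_prefix_len(t, D) >= p and abs(int(t, 16) - d) < abs(int(A, 16) - d)]
    return closer[0] if closer else A


def route(src: str, dest: str, nodes: dict) -> list[str]:
    path = [src]
    while path[-1] != dest:
        leaf_set, table = nodes[path[-1]]
        nxt = route_step(path[-1], dest, leaf_set, table)
        if nxt == path[-1]:
            break  # no closer node is known: the message is delivered here
        path.append(nxt)
    return path


# Hypothetical leaf sets and routing-table entries (only the ones this route needs).
NODES = {
    "65A1FC": (["65A123", "65A2B0"], {(0, 0xD): "D13DA3"}),
    "D13DA3": (["D13D00", "D13E11"], {(1, 0x4): "D4213F"}),
    "D4213F": (["D42000", "D42F00"], {(2, 0x6): "D462BA"}),
    "D462BA": (["D4213F", "D467C4", "D46A1C", "D471F1"], {}),
}

print(route("65A1FC", "D46A1C", NODES))
# prints ['65A1FC', 'D13DA3', 'D4213F', 'D462BA', 'D46A1C']
```

The first three hops use the routing table (line 5) and the last uses the leaf set of D462BA (lines 1-2); the line 7 fallback only runs when the needed table entry is empty.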

Figure 10.10: Tapestry routing
From [Zhao et al. 2004]

[Diagram: nodes 4377 (root for 4378), 43FE, 437A, 4228, 4361, 4378, 4664, 4B4F, 4A6D, E791, 57EC and AA93. Legend: Tapestry routings for 4377; publish path; location mapping for 4378; routes actually taken by send(4378).]

Replicas of the file Phil's Books (G=4378) are hosted at nodes 4228 and AA93. Node 4377 is the root node for object 4378. The Tapestry routings shown are some of the entries in routing tables. The publish paths show routes followed by the publish messages laying down cached location mappings for object 4378. The location mappings are subsequently used to route messages sent to 4378.
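The publish-and-locate behaviour described in the caption can be sketched as follows. The routes in `ROUTES_TO_ROOT` are invented for illustration and are not read off the figure (in Tapestry they follow from each node's routing table); only the root 4377, the object 4378 and the hosts 4228 and AA93 come from the caption. `publish` and `locate` are hypothetical names. When several hosts are known, the sketch simply takes the lowest identifier, whereas Tapestry aims to reach a nearby replica.

```python
from __future__ import annotations

# Hypothetical routes from each node towards the root (4377) of object 4378.
ROUTES_TO_ROOT = {
    "4228": ["4228", "4361", "437A", "4377"],
    "AA93": ["AA93", "4A6D", "4361", "4377"],
    "E791": ["E791", "4361", "4377"],
    "57EC": ["57EC", "4B4F", "4377"],
}


def publish(guid: str, host: str, mappings: dict[str, dict[str, set[str]]]) -> None:
    """Send a publish message from host to the root, caching a mapping at each hop."""
    for node in ROUTES_TO_ROOT[host]:
        mappings.setdefault(node, {}).setdefault(guid, set()).add(host)


def locate(guid: str, client: str,
           mappings: dict[str, dict[str, set[str]]]) -> tuple[str, str] | None:
    """Route towards the root; stop at the first node holding a mapping for guid.

    Returns (node where the mapping was found, chosen host).
    """
    for node in ROUTES_TO_ROOT[client]:
        hosts = mappings.get(node, {}).get(guid)
        if hosts:
            return node, min(hosts)
    return None


mappings: dict[str, dict[str, set[str]]] = {}
publish("4378", "4228", mappings)
publish("4378", "AA93", mappings)
print(locate("4378", "E791", mappings))  # ('4361', '4228')
```

A message from E791 is redirected at 4361 without reaching the root, because both publish paths pass through that node and left location mappings there.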

Figure 10.11: Structured versus unstructured peer-to-peer systems

Figure 10.12: Key elements in the Gnutella protocol

Figure 10.13: Storage organization of OceanStore objects

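[Diagram: the AGUID leads to a certificate holding the VGUID of the current version. Each version consists of a root block, indirection blocks and data blocks (d1 to d5). Version i+1 is created copy on write: it has new blocks d1, d2 and d3, shares the remaining blocks with version i, and records the VGUID of version i, which in turn records the VGUID of version i-1.]

Version i+1 has been updated in blocks d1, d2 and d3. The certificate and the root blocks include some metadata not shown. All unlabelled arrows are BGUIDs.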
Figure 10.14: Types of identifier used in OceanStore

Name    Meaning        Description
BGUID   block GUID     Secure hash of a data block
VGUID   version GUID   BGUID of the root block of a version
AGUID   active GUID    Uniquely identifies all the versions of an object
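The copy-on-write organization of Figure 10.13 and the identifiers of Figure 10.14 can be sketched together, under simplifying assumptions: SHA-256 stands in for OceanStore's secure hash, a version's root block lists the previous VGUID and the BGUIDs of its data blocks directly (indirection blocks are omitted), and the certificate is a plain dict from a hypothetical AGUID string to the current VGUID, without signatures or metadata. `BlockStore`, `write_version` and `read_version` are illustrative names.

```python
from __future__ import annotations

import hashlib


class BlockStore:
    """Content-addressed block store: each block is named by its BGUID."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        bguid = hashlib.sha256(data).hexdigest()  # assumption: SHA-256 as the secure hash
        self.blocks[bguid] = data                 # identical blocks are stored only once
        return bguid


def write_version(store: BlockStore, data: list[bytes], prev_vguid: str | None) -> str:
    """Store a version and return its VGUID, i.e. the BGUID of its root block."""
    bguids = [store.put(block) for block in data]
    root = "\n".join([prev_vguid or ""] + bguids).encode()  # previous VGUID + block BGUIDs
    return store.put(root)


def read_version(store: BlockStore, vguid: str) -> tuple[str | None, list[bytes]]:
    prev, *bguids = store.blocks[vguid].decode().split("\n")
    return prev or None, [store.blocks[b] for b in bguids]


store = BlockStore()
certificate: dict[str, str] = {}  # hypothetical AGUID -> VGUID of the current version
v1 = write_version(store, [b"d1", b"d2", b"d3", b"d4", b"d5"], None)
certificate["AGUID-example"] = v1
v2 = write_version(store, [b"d1 new", b"d2 new", b"d3 new", b"d4", b"d5"], v1)
certificate["AGUID-example"] = v2
print(len(store.blocks))  # 10
```

Writing the new version with only d1, d2 and d3 changed adds four blocks (three data blocks and a new root), while d4 and d5 are shared with the previous version because they hash to the same BGUIDs.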

Figure 10.15: Performance evaluation of the Pond prototype emulating NFS

         LAN                  WAN
Phase    Linux NFS   Pond     Linux NFS   Pond     Predominant operations in benchmark
1        0.0         1.9      0.9         2.8      Read and write
2        0.3         11.0     9.4         16.8     Read and write
3        1.1         1.8      8.3         1.8      Read
4        0.5         1.5      6.9         1.5      Read
5        2.6         21.0     21.5        32.0     Read and write
Total    4.5         37.2     47.0        54.9
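The totals and the relative cost of Pond can be checked with a little arithmetic over the table. The numbers are copied from Figure 10.15; the tuple layout and variable names are ours.

```python
# Per phase: (LAN Linux NFS, LAN Pond, WAN Linux NFS, WAN Pond), from Figure 10.15.
PHASES = {
    1: (0.0, 1.9, 0.9, 2.8),
    2: (0.3, 11.0, 9.4, 16.8),
    3: (1.1, 1.8, 8.3, 1.8),
    4: (0.5, 1.5, 6.9, 1.5),
    5: (2.6, 21.0, 21.5, 32.0),
}

totals = [round(sum(row[col] for row in PHASES.values()), 1) for col in range(4)]
lan_nfs, lan_pond, wan_nfs, wan_pond = totals

print(totals)                        # [4.5, 37.2, 47.0, 54.9]
print(round(lan_pond / lan_nfs, 1))  # 8.3
print(round(wan_pond / wan_nfs, 2))  # 1.17

# Phases in which Pond beats Linux NFS over the WAN.
print([p for p, (_, _, nfs, pond) in PHASES.items() if pond < nfs])  # [3, 4]
```

On the LAN, Pond takes about 8.3 times as long as Linux NFS in total, but over the WAN only about 1.17 times as long, and it is faster than Linux NFS in the read-dominated phases 3 and 4.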

Figure 10.16: Ivy system architecture

[Diagram: an Ivy node containing applications, a modified NFS client module in the kernel and an Ivy server, connected to several DHash servers.]
